The Shadow Drive — Decentralized Storage Optimized for Solana
GenesysGo’s Shadow Drive has been designed as the solution to Solana’s growing need for an ecosystem-native, on-chain storage solution. The following pages are intended to outline what Shadow Drive is, how it works, and why it works. They do so in practical terms, as our goal is for a wide range of people to fully understand what is being built, rather than a select few. The hard-core academic types with a deep love for algorithms will likely find these pages to be largely disappointing.
Please note, this paper assumes some basic knowledge of GenesysGo, $SHDW, and the Solana architecture as a whole. This paper is, by design, not meant as an introductory resource to GenesysGo or Solana. Instead, it should be viewed as a resource for those considering GenesysGo’s Shadow Drive as a potential storage solution or for those attempting to vet Shadow Drive’s long-term viability.
If you aren’t familiar with Solana’s architecture, it is highly recommended that you spend some time learning about how Solana validators store “Account State”, what “AccountsDB” is, and what goes into the creation of “on-chain accounts.” Please see the Solana Discord (discord.gg/solana) and check out the dev-resources channel if none of these things are familiar to you.
- What is Shadow Drive?
- Under the hood of Shadow Drive
- Solana On-Chain Events as Proof of Storage
- The Shadow Drive Smart Contract(s)
- The Shadow Drive Enhanced GenesysGo Network Architecture
The Shadow Drive is a new approach to on-chain storage and is built specifically for the Solana blockchain, while retaining the ability to become a polychain service in the future. Other web3 storage providers have attempted to integrate with Solana in the past and have had only marginal success. Shadow Drive solves this by integrating directly with the Proof of History consensus mechanism, passing on-chain events to the Solana validator network for consensus approval; these events prove the continued existence and integrity of the stored data.
The idea for Shadow Drive was founded on the premise that the Solana network is largely misunderstood in terms of what it was built to achieve. By understanding what it truly means to be the world’s most performant state machine you realize that Solana can be used to reach consensus and maintain state on nearly anything.
What is Shadow Drive?
Shadow Drive combines open source software-defined object storage, integrates it with the Solana blockchain’s Proof of History consensus mechanism, and then decentralizes it. This allows us to achieve extremely fast I/O speeds, massive scalability, and data integrity. At the same time, we retain the benefits of trustlessness, enhanced security, full transparency, and decentralization that blockchain technology brings to the table.
Ultimately, Shadow Drive’s final iteration will be a permissionless, trustless, decentralized distributed storage network that exists in perpetuity without needing to rely on the direct efforts of a centralized team to grow and expand. It will take time, as the path for open-sourcing hardware infrastructure is different from that of software, but Shadow Drive’s final form will be one in which the community drives the enhancements, direction, and future of Shadow Drive.
Under the hood of Shadow Drive
At the heart of Shadow Drive lives the open source software-defined storage system called Ceph. https://en.wikipedia.org/wiki/Ceph_(software)
Ceph was chosen for a number of reasons…
- It is VERY open source. Ceph was first presented in 2006 and merged directly into the Linux kernel in 2010. Since then the Ceph GitHub has grown to 179 different repositories. These repositories have been collectively forked over 10,000 times, have had thousands of PRs submitted, and have seen a community of tens of thousands emerge to provide support. https://github.com/ceph
- It is extremely resilient and adaptable. Ceph is designed to have no singular point of failure that could lead to data loss. As Shadow Drive is being designed to run in a permissionless, trustless, decentralized environment, having no singular point of failure is very attractive. The resiliency of how Ceph stores data and its open source design mean that Ceph can be forked and modified into a trustless, permissionless, decentralized storage layer that can be integrated with smart contracts to protect the stored data against bad actors.
- Ceph is very performant and scales exceptionally well both horizontally and vertically. Our decentralized cluster consistently handled 2,000 concurrent connections, each uploading 10,000 individual objects measuring 2 MB in size, and sustained an upload speed of 2.7 Gbps with zero packet loss for extended periods of time. This means the cluster is so fast that when Solana validators finish block #130188099, we can ingest it, store it, and serve live requests against it before block #130188100 is finished and propagated.
- Ceph’s CRUSH map algorithm is amazing! CRUSH is a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. The CRUSH whitepaper (https://ceph.com/assets/pdfs/weil-crush-sc06.pdf) dives deep into the algorithm but the TL;DR is that CRUSH allows for the decentralization of location for data on an individual byte level. Ceph utilizes CRUSH to literally break stored objects down into component bytes, shards/erasure codes those bytes, and then decentralizes their location in triplicate across any particular Ceph cluster.
- Speaking of decentralization of data… Ceph runs its own consensus mechanism internally to ensure the integrity of your data. Monitor daemons are the custodians of the pieces of the CRUSH map and are responsible for verifying its accuracy and approving/recording changes to the stored data. Ceph monitors use a Paxos consensus mechanism to maintain a quorum and verify the authenticity of the data stored in the cluster. We will revisit the importance of this consensus mechanism later when we discuss Solana integrations.
- Finally, Ceph is (hypothetically) infinitely scalable without any notable decreases in performance. There is no theoretical maximum to how large a Ceph cluster can become, thanks to the different software daemons Ceph employs and how well the CRUSH algorithm scales. The largest Ceph cluster ever tested successfully stored 10,000,000,000 unique objects. For comparison, Solana block numbers are currently in the 120 millions (as of the time of this writing). Therefore, Ceph is uniquely positioned to be the best possible solution for a blockchain that produces a new block every 400 milliseconds.
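To make the directory-free placement idea concrete, here is a minimal Python sketch using rendezvous (highest-random-weight) hashing. This is not CRUSH itself, which additionally models failure domains, device weights, and placement rules; it only illustrates how a deterministic pseudorandom function can map an object to storage devices with no central lookup table. The `osd.N` device names are hypothetical.

```python
import hashlib

def placement(object_id: str, devices: list[str], replicas: int = 3) -> list[str]:
    # Rank every device by a deterministic pseudorandom weight derived from
    # (object id, device id); the top-ranked devices hold the replicas.
    def weight(device: str) -> int:
        digest = hashlib.sha256(f"{object_id}:{device}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(devices, key=weight, reverse=True)[:replicas]

# Twelve hypothetical storage daemons; any client with the same device
# list computes the same mapping, so no central directory is needed.
devices = [f"osd.{i}" for i in range(12)]
shard_homes = placement("block-130188099/shard-0", devices)
```

Because the ranking depends only on the object id and device list, adding or removing a device reshuffles placement for only a fraction of objects, which is the property that lets this family of algorithms scale.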
As a fun side note…
…the creator of the Paxos Consensus Mechanism, Leslie Lamport, is also honored as one of Solana’s biggest technical influences…
If nothing else…
it’s kinda cool to know that we’re using the same storage software as the CERN team!
Of course, none of this is to suggest that Ceph is some kind of a perfect solution that has no flaws and can do no wrong. However, for our use case Ceph checks all boxes of performance, reliability, durability, scalability, and its functionality can be adapted to provide the decentralized trustless data storage that Solana needs.
Solana On-Chain Events as Proof of Storage
First, please allow me to set some context about Solana itself… At the most basic, fundamental level, blockchains are nothing more than ledgers. Ledgers record the history of how things have changed over time, and all of those changes combined give you the current “state”. Solana is all about its “accounts” and the “state” of those accounts.
What do I mean by “state”? Let’s use computers as an example… Everything in a computer is 0’s and 1’s. The bit stores just a 0 or a 1: it’s the smallest building block of storage. If a bit is a 1, then its current state is 1. If a bit is a 0, then its current state is 0. The history of that bit is not relevant to whether its state is currently a 0 or a 1.
The next thing we need to know is that one of the unique things about Solana is the way it structures everything as an “account.” This is important terminology, so here’s the definition: on Solana, an account is a record in the ledger that holds data and a balance of lamports; wallets, tokens, and deployed programs are all accounts.
Thinking back to state, the history of the accounts on Solana is completely irrelevant to their current state. Why? Because all the events that took place in order to arrive at the current state were reviewed by the Proof of History consensus mechanism (i.e. the Solana Validator network) and deemed to be valid.
Tying everything together, what Solana is really built to achieve and maintain consensus on is the state of all accounts on the network. It is a state machine. In fact, “the world’s most performant global state machine” is exactly how Solana is described in its Twitter bio.
The current state of Solana is already hashed but then it is shredded and decentralized across all the validators of the network. Effectively what this means is that Solana can recreate the current state of every piece of data on the blockchain… every wallet address, every program that’s been deployed, the location of every single token… at a moment’s notice… without needing to know anything about the historical transactions that led up to the current state.
Now, historical transactions are irrelevant to the current “in the moment” state of all accounts on Solana (again, because every event leading up to the present passed through the Solana validator network and was therefore part of the consensus that led to the current state). That doesn’t mean historical transactions aren’t important to the developers and users performing actions on Solana.
This is where we circle back to Shadow Drive. The Shadow Drive utilizes the world’s most performant state machine to ensure the validity and integrity of the state of its storage network via on-chain change events.
It’s easiest to explain in real terms, so here’s an example… Let’s say you upload NFT metadata to Shadow Drive and want that data to be stored permanently and immutably.
- An NFT project uploads their NFT metadata to Shadow Drive. The NFT project wishes this metadata to be stored forever and wants it to be immutable.
- These instructions from the NFT project creators are passed to the Shadow Drive smart contract, and the smart contract sends a transaction request, signed by the NFT project’s wallet, to the Solana validators.
- An account hash is created on-chain that indicates which NFT project the metadata is associated with, that this metadata is immutable, and that it is to be stored permanently.
- Then the hash plus the actual NFT metadata itself are hashed again, sharded, and uploaded to Shadow Drive by the smart contract in the appropriate storage format and location to ensure that the data can never be edited (not even by the NFT founders who uploaded it) and never deleted.
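The steps above can be sketched roughly as follows. This is an illustrative Python sketch under our own assumptions (SHA-256, JSON serialization, a fixed shard size, and the hypothetical `prepare_immutable_upload` helper), not the actual smart contract logic.

```python
import hashlib, json

def chunk(data: bytes, size: int) -> list[bytes]:
    # Split raw bytes into fixed-size shards (the last shard may be shorter).
    return [data[i:i + size] for i in range(0, len(data), size)]

def prepare_immutable_upload(project_pubkey: str, metadata: dict,
                             shard_size: int = 64) -> dict:
    # Step 1: an on-chain record binding the data to the project and its flags.
    record = json.dumps({"owner": project_pubkey,
                         "immutable": True,
                         "permanent": True}, sort_keys=True).encode()
    account_hash = hashlib.sha256(record).hexdigest()

    # Step 2: hash the account hash together with the metadata itself.
    payload = json.dumps(metadata, sort_keys=True).encode()
    content_hash = hashlib.sha256(account_hash.encode() + payload).hexdigest()

    # Step 3: shard the payload for decentralized placement.
    return {"account_hash": account_hash,
            "content_hash": content_hash,
            "shards": chunk(payload, shard_size)}

meta = {"name": "Shadow #1", "image": "shdw://example"}
result = prepare_immutable_upload("ExampleProjectWallet", meta)
```

The key property is that the whole pipeline is deterministic: anyone holding the project wallet, the flags, and the metadata can recompute both hashes and verify that the stored shards have not been tampered with.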
Under this design, the Shadow Drive smart contract only recognizes signatures from the wallet associated with the stored data. In this example, any transaction request to change data flagged as immutable in Shadow Drive would fail to be validated by the Solana validators, because the smart contract will not pass through any instructions to edit or delete immutable data.
In the “Shadow Drive Smart Contract” section, we will go deeper into how the Shadow Drive smart contract interacts with the Solana validator network to validate both the data stored in Shadow Drive and the internal Paxos consensus mechanism, ensuring that their current state (and thus the integrity of the data) remains intact.
The Shadow Drive Smart Contract
The Shadow Drive smart contract(s) are designed to fit multiple use cases. The beauty of smart contracts is the flexibility of their design, which is one of the great things about building a storage layer as opposed to building a whole new blockchain (again, why build a whole new blockchain when we already have the world’s most performant state machine at our disposal?).
The Shadow Drive Smart Contract and Paxos
The Paxos consensus mechanism is pretty amazing and is, arguably, the forerunner to the Proof of History consensus mechanism. As written in the Solana docs, Leslie Lamport (the creator of Paxos) is credited as the greatest technical influence on Solana. Another great resource explaining Paxos can be found here: https://understandingpaxos.wordpress.com/
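Since the monitor quorum keeps coming up, here is a minimal single-decree Paxos sketch in Python. It is a teaching toy, not Ceph’s monitor code: real deployments add leader election, persistence, and multi-decree log replication, but the promise/accept quorum logic below is the core idea Lamport described.

```python
from dataclasses import dataclass

@dataclass
class Acceptor:
    promised: int = -1
    accepted_n: int = -1
    accepted_v: object = None

    def prepare(self, n: int):
        # Phase 1b: promise to ignore proposals numbered below n,
        # reporting any value already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n: int, v) -> bool:
        # Phase 2b: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised, self.accepted_n, self.accepted_v = n, n, v
            return True
        return False

def propose(acceptors, n: int, value):
    # Phase 1a: ask every acceptor for a promise; need a majority.
    granted = [(an, av) for ok, an, av in
               (a.prepare(n) for a in acceptors) if ok]
    if len(granted) <= len(acceptors) // 2:
        return None
    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prior = [(an, av) for an, av in granted if an >= 0]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask acceptors to accept; the value is chosen on a majority.
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

cluster = [Acceptor() for _ in range(5)]
chosen = propose(cluster, n=1, value="osdmap-epoch-42")
```

Note how a later proposer with a higher proposal number is forced to re-propose the already-chosen value rather than its own; that is what makes the decision durable once a majority has accepted it.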
1,000,000,000 Shades = 1 SHDW
This is in line with 1b lamports equaling 1 SOL as a measure of the cost of computational units
1,000,000,000 bytes = 1 gigabyte
1 byte stored = 1 Shade
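Taken at face value, these three equivalences give the following unit arithmetic. This is an illustrative sketch of the accounting units only; the per-gigabyte prices charged by the smart contract are set separately.

```python
SHADES_PER_SHDW = 1_000_000_000   # mirrors 1B lamports per SOL
BYTES_PER_GB = 1_000_000_000      # decimal gigabyte

def bytes_to_shades(num_bytes: int) -> int:
    # 1 byte stored = 1 Shade
    return num_bytes

def shades_to_shdw(shades: int) -> float:
    return shades / SHADES_PER_SHDW

# Storing 1 GB is accounted as 1,000,000,000 Shades, i.e. 1.0 SHDW of Shades.
one_gb_in_shdw = shades_to_shdw(bytes_to_shades(BYTES_PER_GB))
```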
Immutable storage is relatively simple… If you upload 1gb of data to Shadow Drive, then a cost of 0.25 SHDW is applied, and the smart contract sends that SHDW to the Shadow Operator smart contract to be paid out to Shadow Operators as emissions for operating Shadow Nodes. The data is flagged as “immutable” by the Shadow Drive smart contract and can therefore never be edited or deleted.
Mutable storage is a little more complex (but not overly so) and relies on staking and rent mechanics very similar to the ones used by Solana in their on-chain accounts design. If you upload 1gb of data to Shadow Drive, then you are required to stake 1 SHDW. On a per-epoch basis, rent is assessed against mutable storage at the rate of 1 SHDW per GB per year. Therefore, uploading 1gb of data to Shadow Drive would look something like this…
- A user uploads 1gb of data into Shadow Drive. In order to do so, the user is required to stake 1 SHDW with the Shadow Drive smart contract.
- On a per-epoch basis, rent is assessed and removed from the staked SHDW (unless the user has flagged the data as immutable) in the form of Shades. By the end of a 1-year period, the storage user would need to add more SHDW to their account or risk the data being deleted.
- The rent assessed each epoch is sent to the Shadow Operator Smart Contract and adds to the pool the smart contract pays out to Shadow Operators as emissions.
- If, after six months, the storage user unstakes their SHDW (signaling that they no longer need this data to be stored), then they would have paid .5 SHDW in rent and would receive .5 SHDW back.
- If, after six months, the storage user decided to flag their data as immutable, then they would need to add the difference between what is currently staked in the smart contract and what it would cost to immutably store their data. So, a user with 1gb of data stored and .5 SHDW staked in the smart contract (after six months of staking) would need to add .5 SHDW in order to make their data immutable.
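The rent arithmetic in the steps above can be sketched in a few lines of Python. This is an illustration of the stated mechanics, not contract code; the immutable rate is passed in as a parameter, and the function names are our own.

```python
STAKE_SHDW_PER_GB = 1.0        # stake required per GB of mutable storage
RENT_SHDW_PER_GB_YEAR = 1.0    # rent assessed against mutable storage

def rent_accrued(gb: float, years: float) -> float:
    return gb * RENT_SHDW_PER_GB_YEAR * years

def refund_on_unstake(gb: float, years: float) -> float:
    # Unused stake is returned when the user unstakes.
    return max(gb * STAKE_SHDW_PER_GB - rent_accrued(gb, years), 0.0)

def top_up_for_immutable(gb: float, years: float, immutable_rate: float) -> float:
    # Difference between the immutable cost and what is still staked.
    remaining = gb * STAKE_SHDW_PER_GB - rent_accrued(gb, years)
    return max(gb * immutable_rate - remaining, 0.0)

# 1gb held mutably for six months: 0.5 SHDW rent paid, 0.5 SHDW refunded.
refund = refund_on_unstake(1.0, 0.5)
```

Running the six-month example from the bullets through these functions reproduces the stated numbers: 0.5 SHDW of rent accrued and 0.5 SHDW returned on unstake.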
The rent mechanic is designed to take into account the increased bandwidth usage that typically comes along with mutable storage as it has an increased amount of reads and writes relative to immutable storage.
The benefit to users of mutable storage is that their short-term storage needs are met; and once those needs have been met, the mutable storage user can unstake their SHDW and be returned whatever amount of SHDW they did not use. The benefit to those who use immutable storage is that they can rest assured that their account data will be retained for continual access and usage into perpetuity. This also means that as the price of $SHDW fluctuates, users of immutable storage will likely have paid a cheaper price for storage as years pass.
In the end, we believe immutable storage to be the best overall value, but we also recognize that short-term data storage needs should not be forced into a singular regime.
Please note, the team reserves the right to change any and all aspects of storage design and costs in order to ensure the long term viability of Shadow Drive and Shadow Operators. We would rather make tweaks to one protocol along the way instead of building a new protocol each time to address any unforeseen issues.
The Shadow Drive Enhanced GenesysGo Network and Upcoming Endeavors
Data and data storage are the ultimate foundation for many new and exciting directions for GenesysGo and our network. Our RPC network will be undergoing an extreme revamp with the creation of a new kind of node by which we serve Solana data.
As written earlier, Shadow Drive is capable of ingesting the most recent block, storing it, replicating it, and then serving RPC requests against it faster than the Solana validators can serve the next block. This new capacity allows us to maximize our relationship with Solana Labs by answering RPC requests using machines which are not affected by turbulence in the Solana validator network. The current design of the RPC server is also deeply tied to the state of congestion on Solana at any given moment. What we have done with Shadow Nodes keeps them as deeply tied to Solana as an RPC node; however, the tie is attached in a different way and place, allowing Shadow Nodes to truly maximize stability and performance.
Additionally, the vast data set we are ingesting (in direct partnership with Solana) will allow us to serve network snapshots faster and more accurately than is currently possible… which will allow new validators to come online faster after they perform software rollups, restart their machines after maintenance, or recover after a hardware failure. This will also have the added benefit of saving existing validators from needing to use their own resources to serve snapshots to other validators. Anything that maximizes a validator’s ability to focus on building blocks and building blocks alone is a huge win for the network.
The Shadow Net — Solana’s Shadow Drive powered canary chain
The most exciting new endeavor that Shadow Drive will power is the creation of a Solana canary network with future multichain capability. We are working directly with Solana Labs on the creation of the “Shadow Net”. The Shadow Net will create interoperability for developers across all chains to deploy and test their smart contract designs by spawning “Shadow Realms.” A Shadow Realm takes a snapshot of Solana’s current account state and provides builders with an exact replica of the Solana blockchain (powered by Shadow Nodes).

The Shadow Net will be directly integrated with the Solana Wormhole, and GenesysGo will be working closely with the team responsible for maintaining the Wormhole to ensure the safety of the integration. This direct Wormhole integration will allow mainnet token assets to be moved in and out of Shadow Realms. Effectively, Shadow Drive will power the creation of private “instanced” versions of the Solana blockchain (eventually expanded to all chains) which can be shut down at any time and yet still fully integrate with Solana while they exist. The gas fee token of these Shadow Realms will, of course, be $SHDW.

Our exciting new partnership with Solana will cement $SHDW not only as a utility token focused on storage… but, arguably, as a shadow L1 token tied directly to Solana. More will be written in the coming days regarding this use case. Additionally, in the coming week a “Wait List” will be published for developers who have expressed interest in this to Toly and the Solana Labs team.
The GenesysGo team and I are very excited about the work that’s been done over the past four months. We have faced some unexpected hurdles along the way which have delayed our original timeframe but the team did what it always does and dug in to overcome. We spent time learning new things and expanding our skillsets in order to ensure that we could deliver on our promises to the ecosystem. Our position continues to be that focused BUIDL and grinding can break through any barriers and, so far, we have yet to find a barrier that didn’t yield.
We hope that this more in-depth understanding of Shadow Drive leaves all of you as excited for the future as we are. I’m sure there will be several things which come up post-release that we will need to address and I have no doubt in my mind that we will address them appropriately. This confidence is due to the team’s unwavering focus on the ecosystem and the community that comprises it. We will continue looking for ways to support the Solana ecosystem of BUIDLers in order to help support the amazing innovation Solana has seen so far.
Please know that everyone in the Solana ecosystem, from Toly to the person who just set up their first Solana wallet, has the heartfelt appreciation and thanks of the GenesysGo team. What our project has turned into has grown beyond anything we thought possible when we set out on this journey. We have learned much along the way and look forward to the continued learnings ahead. It’s still so early fam… #wagmi