One of the problems with Ethereum, or any blockchain, is that it grows in size over time. This means increased code complexity and storage requirements.
The blockchain keeps a complete history of all data, which every client must store and every new client must download. This continually increases client load and synchronization times.
Moreover, Vitalik Buterin writes on his blog that code complexity increases over time because “it is easier to add new features than to remove old ones.”
Buterin therefore believes that developers must actively work to stem this trend while preserving the permanence of Ethereum. To that end, he presented The Purge, a three-part plan aimed at simplifying the protocol and reducing its data load.
Part 1: History Expiration
A fully synchronized Ethereum node currently requires approximately 1.1 TB of storage for the execution client, and the consensus client requires several hundred more gigabytes. According to Buterin, much of this data is historical: past blocks, transactions, and receipts, many of them several years old. The disk space required to store all these records continues to grow by hundreds of gigabytes every year.
Buterin believes that this problem can be solved with a method called History Expiry.
Each block in the blockchain points to the previous block through a hash link. This means that agreement on the current block implies agreement on the past.
According to Buterin, as long as the network has consensus on the current block, all relevant historical data can be provided by a single actor through a Merkle proof, which allows anyone to verify its integrity. This means that instead of every node storing all the data, each node can store a small percentage of the data, reducing storage requirements.
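As a minimal sketch of how a single untrusted party can serve historical data that anyone can check, the following builds a binary Merkle tree and verifies a proof against its root. The use of SHA-256, the duplicate-last-node padding rule, and the function names are assumptions made for illustration; Ethereum itself uses Keccak-256 and different tree structures.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 is a stand-in here; it is not the hash Ethereum uses.
    return hashlib.sha256(data).digest()

def _next_level(level):
    # Hash adjacent pairs to build the parent level.
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    # Collect the sibling hash at every level, from the leaf up to the root.
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof

def verify(leaf, proof, root):
    # Recompute the path to the root using only the leaf and the siblings.
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

receipts = [b"receipt-0", b"receipt-1", b"receipt-2", b"receipt-3", b"receipt-4"]
root = merkle_root(receipts)            # the only value a verifier must already trust
proof = merkle_proof(receipts, 3)       # supplied by whoever still stores the data
print(verify(b"receipt-3", proof, root))
```

The verifier needs only the root, which it already agrees on through consensus, so it does not matter who supplies the leaf and the proof.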
Buterin proposes adopting the operating model of a torrent network, where each participant stores and serves only a small portion of the total data.
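A rough way to see why this can still keep the history available: if each node independently stores a random fraction of the data, the chance that nobody holds a given piece shrinks exponentially with the number of nodes. The sketch below assumes independent, uniformly random selection, which real networks only approximate, and the node counts and fractions are illustrative numbers, not figures from Buterin's post.

```python
import math

def prob_chunk_lost(num_nodes: int, fraction_stored: float) -> float:
    # Probability that none of the nodes stores a given chunk,
    # assuming each node keeps it independently with the given probability.
    return (1.0 - fraction_stored) ** num_nodes

def nodes_needed(fraction_stored: float, target_loss: float) -> int:
    # Smallest node count for which the loss probability is at most target_loss.
    return math.ceil(math.log(target_loss) / math.log(1.0 - fraction_stored))

# Illustrative numbers: 1,000 nodes, each storing 1% of the history.
print(prob_chunk_lost(1000, 0.01))
# Nodes required so a given chunk is missing with probability below one in a billion.
print(nodes_needed(0.01, 1e-9))
```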
Ethereum has already taken steps to reduce storage requirements. Certain information now has an expiration date. For example, consensus blocks are stored for 6 months and blobs are stored for 18 days.
EIP-4444 is another step in that direction. It aims to limit the storage period of historical blocks and receipts to one year. The long-term goal, however, is a single fixed period, possibly around 18 days, during which every node is required to store everything. Older data would then be stored in a distributed manner on a peer-to-peer network.
Part 2: State Expiration
According to Buterin, eliminating the need for clients to store the entire history would not completely solve the problem of increasing storage requirements. This is because the state itself keeps growing: account balances and nonces, contract code, and contract storage. That growth requires clients to add approximately 50 GB of storage each year.
New state objects can be created in three ways: by creating a new account, by sending ETH to a new account, or by writing to a previously untouched storage slot. Once a state object is created, it remains in the state forever.
Buterin believes that a solution to automatically expire state objects over time should be efficient, user-friendly, and developer-friendly. This means that the solution should not require large amounts of computation, users should not lose access to their tokens if they leave them for years, and developers should not be significantly inconvenienced in the process.
Buterin proposes two types of “least bad known solutions.”
- Partial state expiration
- Address-period-based state expiration
Partial state expiration
The partial state expiration proposal divides the state into “chunks”. Everyone permanently stores a “top-level map” that records which chunks are empty and which are non-empty. The data within a chunk is stored only if it has been accessed recently. If a chunk is no longer stored, a “resurrection” mechanism allows anyone to bring its data back by providing a proof of what that data was.
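The mechanism can be sketched as a small in-memory model. Everything here is a simplification under stated assumptions: the class and method names are hypothetical, a per-chunk SHA-256 commitment kept in the top-level map stands in for a real proof system, and time is an abstract counter.

```python
import hashlib
import json

def commit(chunk: dict) -> bytes:
    # Assumed commitment scheme: hash of the chunk's canonical JSON encoding.
    return hashlib.sha256(json.dumps(chunk, sort_keys=True).encode()).digest()

class PartialState:
    def __init__(self, max_idle: int):
        self.max_idle = max_idle
        self.top_map = {}      # chunk_id -> commitment; kept permanently for non-empty chunks
        self.data = {}         # chunk_id -> contents; only recently accessed chunks
        self.last_access = {}  # chunk_id -> time of last read or write

    def write(self, chunk_id, key, value, now):
        if chunk_id in self.top_map and chunk_id not in self.data:
            raise KeyError("chunk expired; resurrect it first")
        chunk = self.data.setdefault(chunk_id, {})
        chunk[key] = value
        self.top_map[chunk_id] = commit(chunk)
        self.last_access[chunk_id] = now

    def read(self, chunk_id, key, now):
        if chunk_id not in self.top_map:
            return None  # the top-level map says this chunk is empty
        if chunk_id not in self.data:
            raise KeyError("chunk expired; resurrect it first")
        self.last_access[chunk_id] = now
        return self.data[chunk_id].get(key)

    def expire(self, now):
        # Drop the contents of idle chunks but keep their top-level map entries.
        idle = [c for c, t in self.last_access.items() if now - t > self.max_idle]
        for chunk_id in idle:
            del self.data[chunk_id]
            del self.last_access[chunk_id]

    def resurrect(self, chunk_id, chunk, now):
        # Anyone may supply the data; it is accepted only if it matches the commitment.
        if self.top_map.get(chunk_id) != commit(chunk):
            raise ValueError("supplied data does not match the commitment")
        self.data[chunk_id] = dict(chunk)
        self.last_access[chunk_id] = now

state = PartialState(max_idle=10)
state.write("chunk-7", "balance", 42, now=0)
state.expire(now=100)                               # contents dropped, map entry kept
state.resurrect("chunk-7", {"balance": 42}, now=100)
print(state.read("chunk-7", "balance", now=101))
```

The point of the design is that the permanently stored part (the map) stays small, while the bulk of the data can be dropped and later restored by whoever cares about it.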
Address-period-based state expiration
Address-period-based state expiration replaces the single tree that holds the entire state with a growing list of state trees. Any state that is read or written is saved in the most recent state tree. A new empty state tree is added once per period (a period could be a year).
In this scenario, older state trees are frozen, and full nodes only need to store the two most recent trees. A state object that falls into an expired tree can still be read or written to, but the transaction must supply a Merkle proof for it. Once it does, a copy of the object is saved in the latest tree again.
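The period mechanics can be sketched as follows. This is a toy model, not the proposed design: the class and method names are hypothetical, plain dictionaries replace state trees, and a set of leaf hashes per frozen period stands in for a Merkle root and proof check.

```python
import hashlib

def leaf_hash(key: str, value: int) -> bytes:
    return hashlib.sha256(f"{key}={value}".encode()).digest()

class PeriodState:
    def __init__(self):
        self.period = 0
        self.trees = {0: {}}  # period -> {key: value}; only the two most recent are kept
        self.frozen = {}      # period -> set of leaf hashes (stand-in for a Merkle root)

    def new_period(self):
        # Add a new empty tree and freeze the tree that is now two periods old.
        self.period += 1
        self.trees[self.period] = {}
        old = self.period - 2
        if old in self.trees:
            tree = self.trees.pop(old)
            self.frozen[old] = {leaf_hash(k, v) for k, v in tree.items()}

    def write(self, key, value):
        self.trees[self.period][key] = value

    def read(self, key):
        for p in (self.period, self.period - 1):
            tree = self.trees.get(p, {})
            if key in tree:
                value = tree[key]
                self.trees[self.period][key] = value  # touched state goes to the latest tree
                return value
        raise KeyError("not in the two most recent trees; a proof is required")

    def read_with_proof(self, key, value, period):
        # The caller supplies the expired object; accept it only if it matches the frozen tree.
        if leaf_hash(key, value) not in self.frozen.get(period, set()):
            raise ValueError("invalid proof")
        self.trees[self.period][key] = value  # a copy is saved in the latest tree again
        return value

s = PeriodState()
s.write("bob", 7)   # written in period 0
s.new_period()      # period 1
s.new_period()      # period 2: the period-0 tree is frozen and dropped
print(s.read_with_proof("bob", 7, period=0))
print(s.read("bob"))  # now served from the latest tree without a proof
```

One limitation of this sketch is that it does not stop someone from reviving a stale copy of an object from an older frozen tree when a newer copy exists; a real design has to rule that out.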
Part 3: Feature Cleanup
Over time, all protocols, no matter how simple they start out, become complex.
Buterin wrote:
“If we don’t want Ethereum to fall into an increasingly complex black hole, we need to do one of two things: (i) stop making changes and solidify the protocol, or (ii) be able to actually remove features and reduce complexity.”
According to Buterin, cleaning up Ethereum’s complexity requires several small fixes, such as removing the SELFDESTRUCT opcode, removing outdated transaction types and beacon chain committees, and LOG reform. Buterin also proposed simplifying gas dynamics, eliminating gas observability, and improving static analysis.