BNB Chain has released a new multi-database storage solution for BNB Smart Chain (BSC) Geth nodes, which aims to address performance inefficiencies caused by rapid data growth. According to the BNB Chain Blog, the new approach targets mixed data storage patterns, degraded query efficiency, and conflicting optimizations within a single key-value database.
Current challenges
Currently, BSC node data is stored in a single key-value database instance, categorized by different prefixes. This setup has caused several problems.
- Storing mixed data with different patterns is inefficient.
- As the database grows, query efficiency deteriorates, especially during block execution.
- Because read and write optimizations often conflict, there is little room to tune database parameters for different data patterns.
The existing storage pattern includes a single KV store and two ancient stores, which handle different types of data access patterns.
Proposed solution
Multi-database approach
The new solution involves separating blockchain data into three separate databases: a block database, a trie database, and a snapshot database, each designed with specific data schemas and access behaviors.
- Block Database: Stores block-related data such as headers, bodies, receipts, difficulty, and historical block data.
- Trie Database: Contains all trie nodes of the current state and historical state data for approximately 90,000 blocks.
- Snapshot Database: Houses snapshot data, the transaction index, contract code, and other metadata. This database is read-intensive and frequently accessed during block execution.
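The split described above can be pictured as a routing table from data category to database. The following is a minimal Python sketch, not BSC's actual implementation: the `Category` names, the `ROUTES` table, and the `MultiDatabase` class are hypothetical, and plain dictionaries stand in for real key-value databases.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical data categories, named after the items listed above."""
    HEADER = "header"
    BODY = "body"
    RECEIPT = "receipt"
    DIFFICULTY = "difficulty"
    TRIE_NODE = "trie_node"
    SNAPSHOT = "snapshot"
    TX_INDEX = "tx_index"
    CONTRACT_CODE = "contract_code"


# Assumed routing table: category -> name of the database that owns it.
ROUTES = {
    Category.HEADER: "block",
    Category.BODY: "block",
    Category.RECEIPT: "block",
    Category.DIFFICULTY: "block",
    Category.TRIE_NODE: "trie",
    Category.SNAPSHOT: "snapshot",
    Category.TX_INDEX: "snapshot",
    Category.CONTRACT_CODE: "snapshot",
}


class MultiDatabase:
    """Three independent stores; plain dicts stand in for key-value databases."""

    def __init__(self):
        self.stores = {"block": {}, "trie": {}, "snapshot": {}}

    def put(self, category, key, value):
        # Each write lands only in the store that owns its category.
        self.stores[ROUTES[category]][key] = value

    def get(self, category, key):
        # Each read touches only one store, never the other two.
        return self.stores[ROUTES[category]].get(key)
```

Because every category maps to exactly one store, each store could in principle be tuned, or placed on its own disk, without affecting the others.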
Folder structure
The new folder structure includes the original database within the chaindata/ folder, and introduces new block/ and state/ folders to store block and trie data respectively. Each directory also includes an ancient folder to store historical data.
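As a rough illustration of this layout, the Python sketch below builds the directory paths for each database. The `database_dirs` helper is hypothetical, and nesting block/ and state/ under chaindata/ is an assumption; the description above does not spell out where the new folders sit.

```python
from pathlib import PurePosixPath


def database_dirs(datadir):
    """Return the key-value and ancient directories for each database."""
    chaindata = PurePosixPath(datadir) / "chaindata"
    roots = {
        "chaindata": chaindata,        # original database
        "block": chaindata / "block",  # block data (nesting is an assumption)
        "trie": chaindata / "state",   # trie data (nesting is an assumption)
    }
    # Every database directory carries its own ancient/ folder for historical data.
    return {
        name: {"kv": root, "ancient": root / "ancient"}
        for name, root in roots.items()
    }
```

Calling `database_dirs("/data/bsc")` yields one entry per database, each with a `kv` path and an `ancient` path beneath it.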
Impact and Achievements
The multi-database approach is expected to improve the performance, scalability, and maintainability of BSC nodes. By separating databases based on data schema and access behavior, the solution aims to reduce read/write latency and improve overall blockchain performance.
Block Database
The block database keeps recent blocks in the key-value database and then migrates them to the ancient store to reduce disk bandwidth usage. BNB Chain plans to keep only the 20-30 most recent blocks in the key-value database, rather than the previous 90,000, to match the proof-of-stake consensus mechanism.
Trie Database
The trie database handles the fast-growing trie nodes of the Merkle Patricia Trie (MPT). This separation reduces the cost of database compaction and speeds up reads and writes, improving block execution and verification performance.
Snapshot Database
BNB Chain aims to improve read/write performance by isolating snapshot data in its own database and reducing the depth of the Log-Structured Merge (LSM) tree. Because snapshot data is accessed frequently during block execution, it benefits directly from the reduced latency.
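The retention window can be sketched as follows. This is a simplified Python model under stated assumptions, not the node's actual code: the `BlockStore` class and its `retain` parameter are hypothetical, and dictionaries stand in for both the key-value database and the ancient store.

```python
class BlockStore:
    """Keeps the newest `retain` blocks in a hot store and moves the rest out."""

    def __init__(self, retain=30):
        self.retain = retain
        self.recent = {}   # block number -> block, stands in for the key-value database
        self.ancient = {}  # block number -> block, stands in for the ancient store

    def insert(self, number, block):
        self.recent[number] = block
        self._migrate(head=number)

    def _migrate(self, head):
        # Blocks at or below this number fall outside the retention window.
        threshold = head - self.retain
        for n in sorted(k for k in self.recent if k <= threshold):
            self.ancient[n] = self.recent.pop(n)

    def get(self, number):
        # Look in the hot store first, then fall back to the ancient store.
        if number in self.recent:
            return self.recent[number]
        return self.ancient.get(number)
```

With `retain=30`, inserting blocks 0 through 99 leaves blocks 70 through 99 in the hot store, while older blocks remain readable from the ancient store.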
Test Results
Tests on an EC2 m6i.4xlarge machine running Geth v1.3.10 showed significant performance improvements. The multi-database setup outperformed the single-database model, especially when the databases were spread across multiple disks.
ETH Adoption
This multi-database solution is also being contributed to the Ethereum Geth client. Discussions with Geth developers are ongoing, and the feature is expected to become part of the Ethereum Geth client once the pull request is merged.
Looking Ahead
As blockchain data continues to grow, BNB Chain emphasizes the importance of building an efficient storage model for various data types. Multi-database support helps store state data independently and paves the way for a high-performance state data engine. This initiative aims to make the BSC network more robust and efficient.