Advanced
Concurrent Merkle Trees
Introduction
A Merkle Tree is a tree data structure in which each leaf node is labeled with a hash representing some data. Adjacent leaves are hashed together, and the resulting hash becomes the label for the node that is the parent of those leaves. Nodes at the same level are hashed together again, and the resulting hash becomes the label for the node that is the parent of those nodes. This process continues until a single hash is created for the root node. This single hash cryptographically represents the data integrity of the entire tree, and is called the Merkle root.
Most Merkle trees are binary trees, but they do not have to be. The Merkle tree used for Bubblegum compressed NFTs (cNFTs) is a binary tree as shown in our diagram.
When we talk about storing the state of data on the blockchain, if we store this Merkle root, we can effectively store a single value that represents the data integrity of everything that was previously hashed in order to create the root. If any leaf value were to change on the tree, the existing Merkle root would become invalid and need to be recomputed.
For Bubblegum compressed NFTs, the leaf node hashes are the hash of a leaf schema. The leaf schema contains a leaf ID, owner/delegate information, a creator_hash
representing the cNFT's creators, and a data_hash
representing the compressed NFT's metadata in general (it again includes the creator array). So all the information we need to cryptographically verify a single compressed NFT is stored in the hashed leaf schema.
Leaf Path
As we learned in the previous section, in a Merkle tree only the leaf nodes represent end-user data. The inner nodes leading up to the hash are all just intermediate values in service to the Merkle root. When we refer to a leaf node's Path, we mean the leaf node hash itself and the inner nodes directly leading to the Merkle root. For example, the Path for leaf 2 is highlighted in the diagram below.
Leaf Proof
If we want to prove whether a compressed NFT exists in a Merkle tree, we don't need to rehash all the leaf nodes. As you can see in the diagram below we only need to have certain values to hash together until we calculate our Merkle root. These values are known as the Proof for the leaf. Specifically, the Proof for a leaf node is the adjacent leaf node's hash, and the adjacent inner node hashes that can be used to calculate the Merkle root. The Proof for leaf 2 is highlighted in the diagram below.
Leaf Validation
The process for using the leaf node and its Proof to calculate the Merkle root is as follows:
- Start with our raw leaf schema, hash it.
- Hash the value from step 1 with the sibling leaf node's hash to create the next value up of the leaf's Path.
- Hash the path value from step 2 with the next sibling inner node, which is the next value of the Proof.
- Continue this process of hashing values with sibling inner node values, up the tree until we calculate the Merkle root.
If the Merkle root we calculate matches the Merkle root we were given for that tree, then we know that our exact leaf node exists in the Merkle tree. Also any time a leaf node is updated (i.e. when the cNFT is transferred to a new owner), a new Merkle root must be calculated and updated onchain.
Concurrency
The onchain Merkle tree used for cNFTs must be able to handle multiple writes occurring in the same block. This is because there can be multiple transactions to mint new cNFTs to the tree, transfer cNFTs, delegate cNFTs, burn cNFTs, etc. The problem is that the first write to the onchain tree invalidates the proofs sent for other writes within the same block.
The solution for this is that the Merkle tree used by spl-account-compression doesn't only store one Merkle root, but also stores a ChangeLog
of previous roots and the paths for previously modified leaves. Even if the root and proof sent by the new transaction have been invalidated by a previous update, the program will fast-forward the proof. Note the number of ChangeLog
s available is set by the Max Buffer Size used when creating the tree.
Also note that the rightmost proof for the Merkle tree is stored onchain. This allows for appends to the tree to occur without needing a proof to be sent. This is exactly how Bubblegum is able to mint new cNFTs without needing a proof.