Merkle

Systems in Rust

Announcements

  • Trees, ish
    • Heaps
    • Merkle Trees

Today

  • Trees, ish
    • Heaps
    • Merkle Trees

Trees

\(n\)-ary Trees

  • A tree is a type of graph.
  • A graph is a pair of
    • A set of “nodes” or “vertices” (or “vertexes”)
    • A set of “edges” which are pairs of nodes
      • They may be ordered pairs or not.

mocha 2-tree

  • A decision tree is a 2-ary tree.

finite_automata 0 hot? 11 small? 0->11 y 12 small? 0->12 n 21 togo? 11->21 y 22 togo? 11->22 n 23 togo? 12->23 y 24 togo? 12->24 n

Common Case

  • Usually in computing, I use trees to store “stuff”
    • In a SQL database, I used a b-tree to store… data, I guess
    • In a binary search tree, I sort elements of sets with total orders, like integers or words
    • In OS page table, I store memory locations.

Our Case

  • Motivated by source control.
  • We will store differences between versions.
  • We will hash these differences to ensure the integrity of the code base.

LCS Review

  • Recall:
    • We can see how close to strings are too each other.
    • Many changes to code bases are small changes to one character or line.
    • Think of these differences as the data in the leaves of a tree.

How do ensure?

  • We want to have confidence in these changes.
  • Easy - hash them, and save the hashes.
  • But wait - that’s a lot of hashes to hold onto…
  • And it is unreasonable to store potentially thousands of hashes for tiny changes!

Hash tree

finite_automata 0 Merkle Root 11 SHA digest 0->11 12 SHA digest 0->12 21 SHA digest 11->21 22 SHA digest 11->22 23 SHA digest 12->23 24 SHA digest 12->24 31 commit_0 21->31 32 commit_1 22->32 33 commit_2 23->33 34 commit_3 24->34

In practice

  1. Place on commit in linear data structure, like a stack or array list or Vec
  2. Hash each commit, store the results (!!! non-trivial)
  3. For each pair of hashes, compute the hash of the concatenation of the two hashes and store the results
  4. Repeat until there is a single hash, the “Merkle root”

Tree Terms

  • Complete

  • Definition: A tree in which every level, except possibly the deepest, is entirely filled. At depth n, the height of the tree, all nodes are as far left as possible.

Tree Terms

  • Perfect

  • Definition: A k-ary tree with all leaf nodes at same depth. All internal nodes have degree k.

  • Note: Authors differ in their definitions. This is called “complete” by [CLR90, page 95].

  • I take CLR as ground truth (more often 2nd+ edition as “CLRS”)

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6

Complete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8 1 1 2->1

Incomplete

finite_automata 5 5 3 3 5->3 7 7 5->7 2 2 3->2 6 6 7->6 8 8 7->8

Takeaway

  • In a complete tree, we can always place all leafves in a linear data structure, like an array.
  • In a complete binary (2-ary) tree, there will be (roughly) the same number of non-leaf and leaf nodes

Naive

  • We may be tempted to do this
struct HashTree {
    data: String; // Is it a string?
    left: Option<Box<HashTree>>;
    rite: Option<Box<HashTree>>;
}
  • This could in fact work, mostly, but leads to some problems.
    • How do you tell if a pointer points to a leaf or not?

Today

  • ✓ Trees, ish
    • Heaps
    • Merkle Trees

Heaps

Heaps

  • To motivate our technique for Merkle trees, we are going to make a digression to the heap abstract data type, also called a priority queue.
  • Vs. other graph representations, heaps can be implemented by store nodes only and deriving edges from a e.g. node’s index in an array.

Why “Priority”

  • A heap is a nonlinear (but linearizable) data structure which sorts* data… kinda.
  • A heap always has some minimal or maximal value, within the the collection, in its initial position, or root.

Empty Heap

  • Heaps are usually based on binary trees so we’ll make an example.
  • Start with an empty heap it looks like this:
  • There’s nothing here

Initial Value

  • Insert 128 as an initial value
  • 128 is the maximium, and also the root/initial value

finite_automata 128 128

Insert 64

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the first “child” of 128

finite_automata empty 128 128 128->empty

Insert 64

  • Place 64 here
  • 64 is \(\leq\) 128 so this is “A-OK”
    • “A-OK” is short for “Ada, Oklahoma” in honor of Ada Lovelace, the first computer scientist
    • (This isn’t true)

finite_automata 128 128 064 064 128->064

Insert 192

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the second “child” of 128

finite_automata 128 128 064 064 128->064 empty 128->empty

Insert 192

  • Place 192 here.
  • Big “uh oh”

finite_automata 128 128 064 064 128->064 192 192 128->192

Heapify

  • We do an operation often called “heapify”
  • In a loop
    • Swap new value with its parent
    • Terminate when (1) parent is greater or (2) no parent.
  • So swap 192 and 128

finite_automata 192 192 064 064 192->064 128 128 192->128

Insert 96

  • To Insert a value, we Insert an unpopulated node that would make a complete tree of appropriate size.
  • In this case, in a binary tree, the first “child” of 064

finite_automata 192 192 064 064 192->064 128 128 192->128 empty 064->empty

Insert 96

  • Place 96 here.

finite_automata 192 192 064 064 192->064 128 128 192->128 096 096 064->096

Heapify

  • 96 \(\gt\) 64 so swap
  • 96 \(\leq\) 192 so terminate

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064

Aside: Structs

  • It is possible to implement a heap like so:
struct Heap {
    data: String; // Is it a string?
    left: Option<Box<Heap>>;
    rite: Option<Box<Heap>>;
}
  • In this case, we have to store around one box for every data entry!
  • This is very vastful!

Aside: Arrays

  • Instead, realize we always insert new elements to the… next available position.
  • This is a stack push, usually implemented on top of a list or array or Vec.
typedef heap = Vec<String>; // Are we for sure that's a string?
  • Overhead of zero. 0. \(0\).

Use a vector

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064

  • In practice

finite_automata heap 192 096 128 064

Insert 160

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064 empty 096->empty

  • In practice

finite_automata heap 192 096 128 064

Insert 160

  • In theory

finite_automata 192 192 096 096 192->096 128 128 192->128 064 064 096->064 160 160 096->160

  • In practice

finite_automata heap 192 096 128 064 160

Heapify

  • In theory

finite_automata 160 160 064 064 160->064 096 096 160->096 192 192 192->160 128 128 192->128

  • In practice

finite_automata heap 192 160 128 064 096 heap:old->heap:new

On Indices

  • Consider what happens here with respect to indices
  • 160 is Inserted as the element at zero-index location 4

finite_automata heap 0 192 1 096 2 128 3 064 4 160

On Indices

  • After swap, 160 is at swaps at zero-index location 1
    • No particularly clear pattern here, to me

finite_automata heap 0 192 1 160 2 128 3 064 4 096

1-Index

  • 3 also would’ve swapped to 1
    • Floor/int div by 2
    • Wait, 4 -> 1 is an off-by-one int div
  • Index by one (1), not zero (0).

finite_automata heap 1 192 2 160 3 128 4 064 5 096

Insert 224

  1. Next index is 6.

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6

Insert 224

  1. Next index is 6
  2. Insert 224 there

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6 224

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2

finite_automata heap 1 192 2 160 3 128 4 064 5 096 6 224 heap:six->heap:three

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:six->heap:three

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:three->heap:one

Heapify

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2

finite_automata heap 1 192 2 160 3 224 4 064 5 096 6 128 heap:three->heap:one

Heapified

  1. Next index is 6
  2. Insert 224 there
  3. Check 3 = 6/2
  4. Swap
  5. Check 1 = 3/2
  6. Swap

finite_automata heap 1 224 2 160 3 192 4 064 5 096 6 128

Heapified

finite_automata 1 1 224 2 2 160 1->2 3 3 192 1->3 4 4 064 2->4 5 5 096 2->5 6 6 128 3->6

finite_automata heap 1 224 2 160 3 192 4 064 5 096 6 128

Today

  • ✓ Trees, ish
    • ✓ Heaps
    • Merkle Trees

Merkle Trees

Merkle

  • To create a Merkle tree, or hash tree, all non-leaf nodes are hashes and all leaf nodes are what the data type is.
  • Store the data type in array of length \(n\)
  • The tree requires \(n-1\) non-leaf nodes

Example

  • Suppose these are your commits
    • Stored in an array (or a Vec!)

finite_automata heap id user before after A cd x+=1 x=x+1 B profc if True if 1 C calvin i = 0 val = 0 D cd x=x+1 i=i+1 E profc if 1 if x F calvin val = 0 i=0

Example

  • In a complete binary tree, the last row is a power of 2 in size.
    • We have 6 entries, so we will need part of a row of size 8 and part of a row of size 4

finite_automata heap id A B C D E F

Example

  • Fill out the tree to get up to the 4-row.

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7

finite_automata heap id A B C D E F

Example

  • Add additional leaves until there’s enough…

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9

finite_automata heap id A B C D E F

Example

  • More…

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9 A 5->A B 5->B

finite_automata heap id A B C D E F

Example

  1. Relate commits to leaves.

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 3->6 7 3->7 8 4->8 9 4->9 A 5->A B 5->B heap A B C D E F 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Hash Commits

finite_automata 1 2 1->2 3 1->3 4 2->4 5 2->5 6 h(E) 3->6 7 h(F) 3->7 8 h(A) 4->8 9 h(B) 4->9 A h(C) 5->A B h(D) 5->B heap A B C D E F 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Hash Hashes

finite_automata 1 0x1 h(*0x2,*0x3) 2 0x2 h(*0x4,*0x5) 1->2 3 0x3 h(*0x6,*0x7) 1->3 4 0x4 h(*0xA,*0xB) 2->4 5 0x5 h(*0x8,*0x9) 2->5 6 0x6 h(E) 3->6 7 0x7 h(F) 3->7 8 0x8 h(A) 4->8 9 0x9 h(B) 4->9 A 0xA h(C) 5->A B 0xB h(D) 5->B heap A B C D E F 6->heap:e 7->heap:f 8->heap:a 9->heap:b A->heap:c B->heap:d

Merkle Root

  • The root of the tree contains a hash known as the “Merkle root”.
  • It is sufficient to check only this hash to know you have an untampered repository.

Today

  • ✓ Trees, ish
    • ✓ Heaps
    • ✓ Merkle Trees