Graphs
Systems in Rust
Announcements
- Welcome to Systems in Rust
- Action Items:
- Keep at
traits - Graph theory.
- Keep at
Today
- Graphs
- DAGs
- Trees
Graphs
\(n\)-ary Trees
- A tree is a type of graph.
- A graph is a pair of
- A set of “nodes” or “vertices” (or “vertexes”)
- A set of “edges” which are pairs of nodes
- They may be ordered pairs or not.
Graphs
- There are many ways to understand graphs, but I actually think graph theory is quite accessible.
- I think it’s also easier when using a running example.
- I will also use a running example, Amtrak 🚄
- Or perhaps… a training example?
- I am running… a train??????
- I am training… to run?????
Amtrak 🚄
Graph: Definition
- A graph is an ordered pair.
That is: two things, in a fixed order.
Can be thought of as a sequence of length 2
Not Sets
- A graph is not a set:
Aside: Pairs
- Take a number plane (click):
y=2
x=-2 x=2
y=2
Aside: Pairs
- Denote ordered pair (.5,1.5) in red.
y=2
x=-2 x=2
y=2
Pair: Definition
- We construct pairs from sets.
- An ordered pair is a set of cardinality two. (2 elements)
- One element of the set is a set of cardinality one. (1 element)
- The other element of the set is a set of cardinality two (2 elements)
- The element of the set of cardinality one is one of the two elements of the set of cardinality two.
Pair: Definition
- The element of both sets is regarded as the first element.
- Python fails - Python sets aren’t formal sets:
>>> [a,b] ## ordered pair
['one thing', 'another thing']
>>> {{a},{a,b}} ## ordered pair with sets
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>- But this is an implementation hurdle, and not a logical one.
Pair: Definition
An ordered pair is a set of two elements, a set containing the head element of the pair, and a set containing both elements of the pair.
“Head”
- Extract the head item by taking the intersection of all of the elements of p
- I denote this ‘Head(p)’
\[ \begin{align*} \text{Head}(p) &:= x : x \in \bigcap p \\ \text{Head}(p) &:= x : \forall Y \in p : x \in Y \end{align*} \]
“Tail”
- The tail is the sole element of the union that is not an element of the intersection, with the caveat that we do not consider the case in which the head and tail differ
- I denote this ‘Tail(p)’
\[ \begin{align*} \text{Tail}(p) &:= x : x \in \bigcup p \setminus \bigcap p \\ \text{Tail}(p) &:= x : \exists Y_1, Y_2 \in p : x \in Y_1 \land x \notin Y_2 \end{align*} \]
Graph: Definition
- A graph is an ordered pair \(G\)
- We denote the two elements of the pair as:
- \(V\) for vertices
- \(E\) for edges \[ G = (V, E) := \{ \{V\}, \{V,E\}\} \]
In Amtrak
- \(V\), the vertices or nodes, stations or stops or cities,
- like Portland and Seattle, or
- like Kings St Station and Central Station
- \(E\), the edges, are the connections to adjacent stations
- like Portland and Oregon City, or Portland and Vancouver, WA.
Graph: Definition
- An
Amtrakis an ordered pair - We denote the two elements of the pair as:
- Stations for train stations used by Amtrak passengers, and
- Trains which Amtrak passengers ride between stations.
Amtrak = (Stations, Trains)
Nodes, Vertices, or \(V\)
- Let us consider the “Amtrak Cascades”
- The set of vertices is the set of stations:
| ALBANY | EVERETT | PORTLAND | TUKWILA | EUGENE |
| BELLINGHAM | KELSO/LONGVIEW | SALEM | VANCOUVER BC | OREGON CITY |
| CENTRALIA | MOUNT VERNON | SEATTLE | VANCOUVER WA | TACOMA |
| EDMONDS | OLYMPIA/LACEY | STANWOOD |
Nodes, Vertices, or \(V\)
- Restrict our example to the PDX<->SEA 6x daily trains:
| CENTRALIA | OLYMPIA/LACEY | SEATTLE | TUKWILA |
| KELSO/LONGVIEW | PORTLAND | TACOMA | VANCOUVER WA |
Edges or \(E\)
- \(E\) is a set of elements termed ‘edges’, ‘links’, or ‘lines’
- The edges are pairs of vertices
- In an undirected graph, edges are unordered pairs (sets of cardinality two)
- In an directed graph, edges are ordered pairs (not subsets then)
- Amtrak is undirected, trains are directed.
Directionality
- The internet is directed.
- We may use ‘curl’ to download files from a url, but going the other way (creating files at a url) is highly nontrivial.
Boxis directed.- Recall,
*brefers to the value inb = Box::new(x). **bdoes not refer to tob, rather, we would useBox::new()to go in the reverse direction.
- Recall,
Route
Order north-to-south:
Centralia and Keslo/Longview is an edge.
The others, like Seattle and Tacoma, are not edges.
Edges or \(E\)
- Our edges are pairs of *adjacent- stations
- There are 8 stations so there are 7 edges
| (SEATTLE, TUKWILA) | (CENTRALIA, KELSO) |
| (TUKWILA, TACOMA) | (KELSO, VANCOUVER) |
| (TACOMA, OLYMPIA) | (VANCOUVER, PORTLAND) |
| (OLYMPIA, CENTRALIA) |
Notation
- I use tuple notion here, with parens, but it would be equally proper to use set notatation for an undirected graph.
- Also I maintained geographic ordering (as a convenience) but it would be more proper to have no particular order since E is a set.
Graph: Definition
- We can express G using only sets over elements of stations (this is basically JSON):
{
"G": {
"V": [
"SEATTLE",
"TUKWILA",
"TACOMA",
"OLYMPIA/LACEY",
"CENTRALIA",
"KELSO/LONGVIEW",
"VANCOUVER WA",
"PORTLAND"
],
"E": [
[
"TUKWILA",
"TACOMA"
],
[
"TACOMA",
"OLYMPIA/LACEY"
],
[
"OLYMPIA/LACEY",
"CENTRALIA"
],
[
"CENTRALIA",
"KELSO/LONGVIEW"
],
[
"KELSO/LONGVIEW",
"VANCOUVER WA"
],
[
"VANCOUVER WA",
"PORTLAND"
]
]
}
}https://askubuntu.com/a/1473153Graph: Exercise
Amtrak recently added lines, including the Hiawatha, with service from Milwaukee to the world’s greatest city, Chicago.
From the grandeur of Grant Park’s Buckingham Fountain to iconic museums and skyscrapers, see for yourself why Chicago was once dubbed “Paris on the Prairie.” Engage in retail therapy on the Magnificent Mile or root for the home team within the friendly confines of famed Wrigley Field.
Graph: Exercise
Today
- ✓ Graphs
- DAGs
- Trees
DAGs
Definition
Cycles
- \(E\) is a set of pairs of elements of \(V\).
- We can think of edges in a graph as a homogenous binary relation.
- Homogenous: From a set (the set of vertices) to itself.
- Binary: Over two elements (just like edges)
- Relation: A set (of pairs)
Closures
- “Transitive closures” are defined over homogenous binary relations, so we can take a “transitive closure” over edges.
Cartesian product
- A relation \(R\) over a set \(S\) is a set of pairs of elements of \(S\)
- Or: a subset of the set of all pairs of elements
Use \(\times\)
We denote all pairs using a “Cartesian product”:
\(S \times S\) in LaTeX
- S × S in HTML
Colors
- For example, over the set of traffic light colors \(C\): \[ C = \{ G, Y, R \} \]
- The Cartesian product is all pairs:
\[ C \times C = \{ (G, Y), (G, R), (Y, R), (Y, G),(R, G), (R, Y) \} \]
- Order matters.
Relation
- A relation \(R\) over a set \(S\) is a set of pairs of elements of \(S\)
- Or: a subset of the set of all pairs of elements in is transitive
- We begin with the Cartesian product:
\[ C \times C = \{ (G, Y), (G, R), (Y, R), (Y, G),(R, G), (R, Y) \} \]
- Define traffic light \(L \subset C \times C\)
Properties
- Construct \(L\), the traffic light relation
- No red without prior yellow \[ (Y,x) \in L \implies x = R \]
- All colors must appear in both positions \[ \forall x \in C, \exists y, z \in C : \{ (x,y), (z,x) \} \subset L \]
This gives
- The relation then is \[ L = \{ (G, Y), (Y, R), (R, G)\} \]
- We not this is a subset of \(C \times C\)
Notation
- Variety of ways to express this.
- Function application
L(G) = Y
- Infix notation
gLy
- Function application
Matrix
- Starting state on top, ending state on side
| \(G\) | \(Y\) | \(R\) | |
|---|---|---|---|
| \(G\) | x | ||
| \(Y\) | x | ||
| \(R\) | x |
- Astute blah blah its 9 bits
Rainbow
- Vs. traffic light, we can take “wavelength” where colors are sorted by something physics something optics help I don’t know this stuff.
Rainbow
- We note red is higher than yellow which is higher than green.
- Green is higher than nothing or \(\varnothing\)
| \(G\) | \(Y\) | \(R\) | |
|---|---|---|---|
| \(G\) | x | ||
| \(Y\) | x | ||
| \(R\) |
Rainbow
We note that Rainbow \(B\) (\(R\) is taken) is also a binary relation.
We will explore its properties. \[ B \subset C \times C = \{ (R,Y), (Y,G) \} \]
We additionally note that while \(R\) is immediately above \(Y\), it is *also- above \(G\)
Transitivitity
- Transitivity is over relation \(R\) as follows.
\[ ((a, b) \in R \land (b, c) \in R) \implies (a, c) \in R \]
- For us, that would be, \[ ((R, Y) \in B \land (Y,G) \in B) \implies (R, G) \in B \]
- In the rainbow case, this is true, \(R\) appears above \(G\) in the rainbow because \(R\) is above \(Y\) and \(Y\) is above \(G\).
Closure
- Our initial relation was not transitive.
| \(G\) | \(Y\) | \(R\) | |
|---|---|---|---|
| \(G\) | x | ||
| \(Y\) | x | ||
| \(R\) |
- As in the Amtrak case, we considered only adjacent elements as a convenience, but that doesn’t tell us if e.g. we can get from Seattle to Eugene.
Closure
- To do that, we need to to preserve transitivity - and we do so using the notion of transitive closure
- A closure is an operation over sets, and in the case of binary relations, which are sets of pairs, is an operation over the relation itself, or the sets of pairs themselves (which are equivalent).
Closure
- We denote the transitive closure of some relation \(R\) as \(R^+\)
- First, the transitive closure \(R^+\) of some relation \(R\) must contain all elements of \(R\).
\[ R^+ \supseteq R \]
- We recall we can use set operations like superset on \(R\) as it is a set of pairs, in this case ordered.
R⁺- ⊃ R*
Superset review
- That means that for all pairs (of colors, for example) in the binary relation \(R\), each one of them is also in \(R^+\)
\[ \forall p \in R, p \in R^+ \]
\[ \forall x,y \in S, (x,y) \in R \subset S \times S \implies (x,y) \in R^+ \]
Closure
- The transitive closure \(R^+\) of some relation \(R\) must be transitive.
- Recall:
\[ ((a, b) \in R \land (b, c) \in R) \implies (a, c) \in R \]
Discussion
- Note: since this is a “for all”, it means as we are required to add new elements to maintain transitivity, transitivity must apply to those new elements.
- So, if we add SeattleAmtrakTacoma because we have SeattleAmtrakTukwila and TukwilaAmtrakTacoma…
- We will then have to add SeattleAmtrakOlympia because Amtrak also contains TukwilaAmtrakOlympia
Color Example
- If in rainbow relation \(B\) we find \(R\) is higher in the rainbow than \(Y\) and \(Y\) is higher in the rainbow than \(G\) then \[ \{ (R,Y), (Y,G) \} \subseteq B \]
- The transitive closure adds at least one additional element. \[ \{ (R,Y), (Y,G), (R,G) \} \subseteq B^+ \]
Closure
- \(R^+\) must be the smallest possible relation for that is a transitive superset of \(R\).
- Cartesian Product is always transitive and a superset, but not always the smallest satisfying set.
A note on graphs
- By the way those binary relations are graph edges between graph nodes (the elements of the original set).
- And by the way those edges are pointers.
- Okay keep going.
Cycles
- We use transitive closure to define cycles
- We consider a graph \(G\)
\[ G = (V, E) := \{ \{V\}, \{V,E\}\} \]
- We note that \(E\) is a homogenous binary relation over \(V\)
- Take \(E^+\) the transitive closure of \(E\)
Cycles
- \(G\) contains a cycle (has the property of being cyclic) if \(E^+\) contains an edge from a node to itself.
\(\text{Cyclic}(G = (V, E)) := \exists v \in V: \{v, v\} \in E^+\)
Note
- Cycles are defined to be non-trivial, which means they don’t contain “loops”, so an edge from a node to itself.
- We don’t think about this with Amtrak because we don’t take a train from Portland to Portland
Other Terms
- Circuits, walks, paths, and trails are also defined in graph theory and are related.
- Direct paths and undirected paths are, of course, distinct but intuitively so.
- Cascades 503, a train which runs Seattle to Portland, has a Seattle to Tukwila edge but not a Tukwila to Seattle edge.
- The Cascades route, which runs service between Vancouver, BC and Eugene, has both edges.
DAGs
DAGs
- Cascades 503 is a directed acyclic graph.
- It has edges from all relative northern stations to all relative southern stations for all stations between Seattle and Portland inclusive.
- Amtrak Cascades is not a DAG
- It is not directed - SLM->PDX or PDX->SLM
- It contains cycles - PDX->SLM->PDX
DAGs
- Color wavelength is a directed acyclic graph.
DAGs
- A traffic light is a directed graph but not an acyclic graph.
DAGs
- Friendship, via mutual enthusiastic consent, is a neither directed nor, necessarily, acyclic.
- Facebook is not a DAG.
BS CS DAG
Today
- ✓ Graphs
- ✓ DAGs
- Trees
Trees
Definition
- A tree is DAG where every node has at most one incoming edge.
- \(\exists\) undirected trees but we do not consider them.
- A graph \(G = (V, E)\) is a tree if \[ \forall v \in V : |\{(v',v) \in E\}| \leq 1 \]
- For all vertexes, the cardinality of the set of incoming edges is less than or equal to one.
Decision Trees
Decision Trees
River Systems
Data Structure
Why trees?
- If trying to store data, really only need one unique path to the data.
- If trying to assure integrity of data with hashing, really only need to hash once.
By The Way
- It would have been fairly straightforward to implement stacks and queues using trees.
- Just why bother…
- Minimally, can use just the right or left side and you have a linked list.
BST
- In 100-level CS classes, students often implement a binary search tree.
- These trees store lower values on lesser/left side.
- It basically implements quicksort via structures.
Merkle
- In this class, we will do instead a Merkle tree or hash tree.
- Merkle trees are unsorted but they are verified.
- Building toward a verified version control system.
- In a block, we then store the root, the ultimate ancestor of all nodes, and store the changes elsewhere.
- This verifies the entire change tree.
Merkle
Tree Terms
- Connected
- Root
- Leaf
- Parent
- Child
- Height/depth
- Size
Connected
- Trees are connected
- That is, if we take the transitive closure of edges
- There is an edge from one node to every other node.
- The transitive closure of the edges is the set of paths.
- Connectedness is usually defined over undirected graphs.
Root
- The root is the node which has a path to every other edge.
- It is the “ancestor” of all nodes.
- Often in computing, we treat a descriptor or pointer to the root as a descriptor or pointer to the graph.
- A Bitcoin block contains a “Merkle root” of a transaction tree.
Leaf
- Not every node has outgoing edges.
- We refer to nodes without such edges as leafs (leaves?)
Diagram
Parent/Child
Height/Depth
Height
- The height of a node is what the height of the subtree of which it is a root would be.
- The node labelled
3is of height 2.
Depth
- The depth of a node is what the height of the subtree of which it is the lowest leaf would be.
- The node labelled
4is of depth 2.
Size
- The size is generally taken to be the number of nodes.
- It is separately, useful, sometimes, to count only lea[fs/ves].
Today
- ✓ Graphs
- ✓ DAGs
- ✓ Trees


