Bits
Systems in Rust
Announcements
- Action Items:
- Wordle for Friday
- “Macros” assignment ready after class.
Today
- Encoding
- Bytes
- Bitwise operators
- Endianness
Bytes
“Good at”
- Computers “are good at” storing numerical data.
- At a high level numerical computing
- NumPy / Numba
- Julia
- Fortran
- At a high level numerical computing
- These languages excel at wide ranges of mathematical operations, ranging from addition to solving systems of differential equations.
“Good at”
- Electronics are good at propagating signals, which we often view as ordered collections of zeros and ones.
- At a lower level processing
- C and to a lesser degree C++
- Assembly, like AMD/Intel x86 or IBM PowerPC
- Rust
- At a lower level processing
- These languages often have weaker support for say, trigonometry and better support for reading files of arbitrary types, including not just .csv files but e.g. readings from radio telescopes.
Low-level languages
- Rust is among the group of languages good at working close to the “hardware-software interface”
- Many design in decisions in Rust, such as in how Rust read file data
/dev/random
, reflect this:
- Direct references to bytes (or bits) are uncommon but present in higher level languages, and increase in popularity in languages that give more control to programs over how the device operates.
Bit Manipulation
- One common application of lower-level programming is to precisely control the values of individual bits within a byte.
Note
A byte is a hardware idea - an ordered collection of 8 bits, which themselves are hardware units either containing, or lacking, some electrical charge. We view them as storing zeros or ones.
- Importantly, while we can refer to bits or bytes in software, versus types like integers or strings, we are describing actual physical phenomena rather than mathematical abstractions.
Support
- Low level support is provided most commonly through two features:
- A type to represent a byte or bytes.
- Bitwise operators - operators like the arithmetic infix operators
+
or-
but that perform operations on pairs of bits rather than pairs of numeric values.
- We explore each in Rust.
Byte(s)
- Bytes are referenced by name within in Rust
std
in a few places, though there is no formalbyte
type. - Instead,
u8
is used.- This is common - there is little meaningful difference between 8 bits pretending to be a number and 8 bits that aren’t doing anything in particular.
Bytes
- Bytes are referenced, both as the function
bytes
and the data structureBytes
- The function produces a Rust iterator the produces
u8
values.
- The function produces a Rust iterator the produces
- In other cases, such as
as_bytes
, an array ofu8
is often used (expressed as[u8]
). - In any case, collections of
u8
s.
Example
- Really, there is no better example than just grabbing some random bits from
/dev/random
and taking a peek. - Rust helpfully supports printing in binary, using the
b
format code, to see the individual zeroes and ones of a byte or bytes.
Test it
- You can run a few times to see what a random bitstring (ordered collection of bits) looks like when using
:b
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
1000111011010011001001000110100010011010101101010111100001010101
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
1010101000100001011000111011111111011110010001010010101000111010
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
1001110000001001110011100111110010010101110000010111010111110011
Simplify
- To introduce operators, we’ll use only single
u8
s, just for simplicity.
let mut devrnd = std::fs::File::open("/dev/urandom").unwrap();
let mut buffer = [0u8; 2];
std::io::Read::read_exact(&mut devrnd, &mut buffer).unwrap();
let x = buffer[0];
let y = buffer[1];
println!("x = {x:08b}, y = {y:08b}");
- By default,
:b
doesn’t render leading zeros, which is fine for numbers but not great for bytes. - Use
:08b
- where0
specifies to have leading zeros, and8
specifies how many digits.
Fixed
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
x = 00101010, y = 01100111
- With the benefit of some bits, we can explore operators.
Bitwise Operators
Operator Reference
- Rust doesn’t clearly delineate bitwise from other operators.
- This is pretty reasonable: “bitwise” is something of a social convention.
- The overall operator reference I follow is Rust Book Appendix B
- The next slide lifts a subset of a .md table on that page.
- I have included both named bitwise operators and also “shifts”, which are a bit (heh) different.
Bitwise
Operator | Example | Explanation |
---|---|---|
! |
!expr |
Bitwise or logical complement |
& |
expr & expr |
Bitwise AND |
<< |
expr << expr |
Left-shift |
>> |
expr >> expr |
Right-shift |
^ |
expr ^ expr |
Bitwise exclusive OR |
| |
expr | expr |
Bitwise OR |
Conceptual Model
- One way to model how the bitwise operators is to:
- Imagine a byte as an ordered collection of bits
- Like a list, array, or vector
- Imagine a bit as a boolean.
- Zero/one is equally meaningful to False/True
- Loop a logical operator over the booleans.
- Imagine a byte as an ordered collection of bits
- We will demonstrate first in Python then in Rust.
Python Demo - Bytes
- We create two bytes using np.uint8.
- Binary representation using
bin
or:b
- You’ll see
x = 00001111, y = 00110011
Python Bit XOR
- We can use Pythonic “Bitwise exclusive OR”
- What do we see?
- Why?
Boolean arrays
- Convert the bytes to arrays of booleans.
- Via string representation:
- Via numeric representation, which requires a reversal:
Logical XOR
- Python lacks logical XOR as it is equivalent to
!=
# Print a truth table in markdown
print("| x | y | x!=y| x^y |")
print("|-----|-----|-----|-----|")
for left in (True, False):
for rite in (True, False):
print('|{:<5}|{:<5}|{:<5}|{:<5}|'.format(*[str(b) for b in [left, rite, left != rite, left ^ rite]]))
- Horizontally aligned:
| x | y | x!=y| x^y |
|-----|-----|-----|-----|
|True |True |False|False|
|True |False|True |True |
|False|True |True |True | |False|False|False|False|
Map XOR
- Apply XOR to each element of the boolean arrays.
- We would get the same numerical presentation.
- Both will yield the same boolean array.
All operators
- We can loop over all operators to inspect them visually.
from operator import and_, or_ , xor
for op in [and_, or_ , xor]:
print('f({:08b}, {:08b}) = {:08b} for f ='.format(x, y, op(x,y)), op)
not
is similar but unary.
- So:
Rust operators
- More common convention to use
!
for not (vs.~
) - I’m not great with Rust format codes yet, so a bit silly.
src/main.rs
fn main() {
let mut devrnd = std::fs::File::open("/dev/urandom").unwrap();
let mut buffer = [0u8; 2];
std::io::Read::read_exact(&mut devrnd, &mut buffer).unwrap();
let x = buffer[0];
let y = buffer[1];
let mut z = x | y;
println!("x = {x:08b}, y = {y:08b}, x | y = {z:08b}");
z = x ^ y;
println!("x = {x:08b}, y = {y:08b}, x ^ y = {z:08b}");
z = x & y;
println!("x = {x:08b}, y = {y:08b}, x & y = {z:08b}");
let y = !buffer[0];
println!("x = {x:08b},!x = {y:08b}");
}
Run it
- See how different random bits are combined!
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
x = 01110000, y = 10001110, x | y = 11111110
x = 01110000, y = 10001110, x ^ y = 11111110
x = 01110000, y = 10001110, x & y = 00000000
x = 01110000,!x = 10001111
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
x = 10111111, y = 00110110, x | y = 10111111
x = 10111111, y = 00110110, x ^ y = 10001001
x = 10111111, y = 00110110, x & y = 00110110
x = 10111111,!x = 01000000
Endianness
Why?
- Imagine thinking in digits.
- To store a ~3 digit number, need \(3 \times 10 = 30\) “things” that can hold a number.
Binary
- We are familiar with binary encoding.
- We note: \(\log_2(999) \lt 10\)
- Decimal encoding squanders \(\dfrac{2}{3}\) of it’s storage space.
- So we store in binary.
Binary
- The benefit here is that is sufficient to note the presence or absence of a digit (
1
) - Versus the specific digit \(\in [1,9]\) and presence of absense (
0
)
Wait a minute.
- Why is the ones digit (\(n^0\)) leftmost ?
- The “least significant bit” or LSB
- In e.g. English place the LSB last.
- CS 271 ⇒
Languages
- In e.g. spoken word, makes sense to lead with the biggest value.
- 200 of something is closer to 299 of something than 0 of something is to 9 of something.
- In e.g. programming, we often lead with lowest numerical index.
- We look at the
arr[0]
of something before thearr[1]
of something.
- We look at the
In Practice
- We end up with
- Confusing!
- Recall:
Annoyance
- This gets very annoying when trying to move numbers around that don’t quite fit in some number of bits.
- Say I have
0x1234
followers on Instagram📷 - Boycott Meta etc etc.
- And/or follow me @calvinallegedly
- Say I have
- Also imagine it is 1969 and you only have 8 bit integers.
- Only 2 digits in hexadecimal representation.
Stdin
- We can use
stdin
andread_line
to read in some values. - Recall:
Test
- Test it
src/main.rs
cargo run
-> type some nonsense -> press ENTER
Hex reader
- To read in hexadecimal values we must:
- Read in the string.
- Break into chunks.
- Convert 2 character chunks to numerical values
To characters
- We’ll just iterate over the characters and print them in pairs.
Test it
- Works sometimes better than others
$ cargo run
Compiling scratch v0.1.0 (/home/user/tmp/scratch)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/scratch`
0x1234
0x
12
34
$ cargo run
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
0x123
0x
12
3
Missing pieces
- We need to throw out the prefix
- We need to make sure we begin, rather than end, on a possible singleton.
- Here’s the plan:
- Skip the first two
- Traverse to the end of string.
- Build doubles back up.
Skip
- We can apply
next()
to an iterator to read characters. - It’s basically a pop operation.
Build up
- From there we just need to get pairs somehow.
- It’s not too bad to get the last character…
Uh oh!
- When looping over an iterator, we get the type itself.
- The loop just terminates when it has no more elements.
- When calling
.next()
, we get an option.- There is not always a
.next()
element.
- There is not always a
Recurse
fn last(mut cs:std::str::Chars) -> Option<char> {
return match cs.next() {
Some(n) => match last(cs) { // if there's digits here, look for more
Some(m) => Some(m), // if the last is found, return it
None => Some(n), // no more letters - we are last, return n
},
None => None, // required by `rustc` in case we don't input anything
}
}
Try it out!
$ cargo run
Compiling scratch v0.1.0 (/home/user/tmp/scratch)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.22s
Running `target/debug/scratch`
0x1234
Some('\n')
- Whoops! Forgot the newline… match
Some('\n')
toNone
Pair it up
- Let’s find the last two values.
- We already found the last one, just now if we’ve found it, pair it up and print it.
fn last(mut cs:std::str::Chars) -> Option<char> {
return match cs.next() {
Some(n) => match last(cs) { // if there's digits here, look for more
Some(m) => {
println!("{n}{m}"); // If last is found, print this then last
Some(m), // if the last is found, return it
None => Some(n), // no more letters - we are last, return n
},
None => None, // required by `rustc` in case we don't input anything
}
}
Whoops!
- We print every digit paired with the last one!
$ cargo run
Compiling scratch v0.1.0 (/home/user/tmp/scratch)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/scratch`
0x1234
34
24
14
Some('4')
- We can:
- Remove the print from
main
- Return
None
after printing a pair.
- Remove the print from
Altogether
- Rename
last
topairs
- Print single digits in
main
- Use
if let
- kind of a single pattern match.
- Use
src/main.rs
fn pairs(mut cs:std::str::Chars) -> Option<char> {
return match cs.next() {
Some('\n') => None,
Some(n) => match pairs(cs) {
Some(m) => {
println!("{n}{m}");
None
}
None => Some(n),
},
None => None,
}
}
fn main() {
let mut guess = String::new();
std::io::stdin().read_line(&mut guess).unwrap();
let mut cs = guess.chars();
cs.next();
cs.next();
if let Some(n) = pairs(cs) {
println!("{n}");
}
}
Just parse {n}{m}
- The built-in is
from_str_radix
for e.g.u8
.- Turn characters into singleton strings..
- (We could pattern match
'1' => 1
instead)
Bundle into a vector
- We introduce vectors.
- So close to a Pythonic
<class 'list'>
I can’t tell the difference.
- So close to a Pythonic
- Vs. printing, toss numeric values into a vectors.
- This is slightly complicated by the fact that, uh…
- There’s no vector there!
- Can’t just return it (because we’re using recursion #based)
Create vectors
- True to form in Rust, vectors require specific initialization as they are data structure.
- In this way they are like big S
String
s withString::new()
- In this way they are like big S
- We can use
Vec::new()
or the macrovec!
. Read more
Sketch
- Let’s convert
pairs
to a helper function. - Let’s move the odd case handlying from main to a wrapper function.
- And also popping the first two vals.
- Let’s initialize a mutable vector in the wrapper.
- We’ll maintain the return type of
pairs
but add an argument. - We’ll return the vector from the wrapper.
Wrapper
fn chars_to_vec(mut cs:std::str::Chars) -> Vec<u8> {
let mut vals = Vec::new();
// from old main
cs.next();
cs.next();
if let Some(n) = pairs(cs, &mut vals) {
vals.push(two_hex('0', n));
}
// "Prints and returns the value of a given
// expression for quick and dirty debugging."
dbg!(&vals);
return vals;
}
- Create a vector.
- Let pairs “borrow” the vector mutably - so it can add stuff.
- Let
dbg!
“borrow” the vector to print it. - Return the vector when you’re done!
Vector handling
fn pairs(mut cs:std::str::Chars, vals:&mut Vec<u8>) -> Option<char> {
return match cs.next() {
Some('\n') => None,
Some(n) => match pairs(cs, vals) {
Some(m) => {
vals.push(two_hex(n,m)); // only change
None
},
None => Some(n),
},
None => None,
}
}
- Add a “borrowed mutable vector argument”
- Push some
u8
s into it.
How do we feel about this?
Quoth Wikipedia
- We can regard our vector as a 32 bit word of 4 bytes.
- We can regard the indices (
0..4
) as the addresses in computer memory.
Two endians
A big-endian system stores the most significant byte (MSB) of a word at the smallest memory address and the least significant byte (LSB) at the largest.
A little-endian system, in contrast, stores the least-significant byte (LSB) at the smallest address.
- In
0x01020304
, the MSB is1
, and LSB is4
[4, 3, 2, 1]
, the smallest memory address contains a4
.
Arrays to Numbers
- We can reverse the operation by:
- Looping over the array.
- Multiplying by the numerical base, some power of two
- Adding up the different values.
Try it out
- Remember to update
main
$ cargo r
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/scratch`
0x01020304
[src/main.rs:10:5] &vals = [
4,
3,
2,
1,
]
[src/main.rs:38:9] (i, vals[i]) = (
0,
4,
)
[src/main.rs:38:9] (i, vals[i]) = (
1,
3,
)
[src/main.rs:38:9] (i, vals[i]) = (
2,
2,
)
[src/main.rs:38:9] (i, vals[i]) = (
3,
1,
)
1020304
Wait a minute
- Isn’t this terrible?
- There must be a better way.
Shifts
- The remaining bitwise operator is the shift
- Bitwise because it works on bits, but
- Takes two arguments:
- The bits to “shift”
- A numerical value encoding how much “shift”
Script it
- Python shift is
<<
(well, left shift)
- I got:
0 001 00000001
1 002 00000010
2 004 00000100
3 008 00001000
4 016 00010000
5 032 00100000
6 064 01000000 7 128 10000000
Shift it
fn custom_u8s_to_u32(vals : Vec<u8>) -> u32 {
// We'll just take the first four for now
let mut ret : u32 = 0;
for i in 0..4 {
dbg!((i, vals[i]));
ret += (vals[i] as u32) << (8 * i);
}
return ret;
}
- Much better!
- Saves an
as u32
!
Today
- Encoding
- Bytes
- Bitwise operators
- Endianness
Bonus
- Courtesy Indi of CS Student Assoc.
Bonus
- Courtesy Indi of CS Student Assoc.