Theses
June 10, 2021
Although I graduated two years ago, I’d previously never made my bachelor’s or master’s theses public, although the code has always been available. I’ve not touched either project since, but the theses both provide missing context.
MicroJIT: A JIT compiler for the BBC micro:bit (2018; thesis; code). The micro:bit features a low-powered Cortex M0 processor. At the time the primary programming model for the micro:bit was to transfer a binary over USB and reboot the device. My approach instead allowed new programs, encoded in a stack machine-based bytecode format, to be transferred over serial and JIT-compiled to ARM assembly on device.
Stannel and Statick: A parallel embedded processor and a concurrent concatenative programming language (2019; thesis; code). Stannel was a low-powered processor that I design to run on Lattice Semiconductor’s FPGAs, modelled loosely on the Transputers of the 1980s. It was complemented by a custom assembly language and assembler.
Statick, a concatenative language with dependent, affine types, was arguably the more innovative part of the project. It’s concurrency primitives were based on CSP, but used the type system to restrict and prove operations on them. For instance, a channel could be restricted at compile time to “must use
N
times,” and the compiler would then statically prove it’s used received and sent onN
times before the program terminates (the compiler itself doesn’t prove the program won’t deadlock, but eliminates the need to manage the memory associated with the channel).The thesis won the “Microsoft Prize for best Computer Science Project” the year I graduated.
Thanks to some tight editing to get them under the 10,000 word limit, both theses came to exactly 9,989 words.
With two years since removed, I’m able to reflect on these projects a little more critically. MicroJIT wasn’t innovative as a JIT, and beyond function inlining, bounds check elimination, tail call optimisation, and register allocation it didn’t do much beyond basic assembly (although packing all that onto the micro:bit was a challenge). However, it built a solid grounding in ARM assembly and exposed me to a little more compiler implementation than we’d done as part of our courses.
I’m more proud of Statick (the programming language) than Stannel (the processor). Statick is a fun blend of ideas from a whole host of programming languages (Occam, Haskell, Rust, Forth, Cat, Kitten). Nobody should be writing large programs in it, but it serves as a pleasant proof that it’s possible to build a (quirky) programming language with these ideas assembled into it.
Stannel was my first foray into an intermediate-sized project in Verilog. There’s a lot that I’d do differently now, and would’ve done differently given a bit more time (make it register based, not stack based, fix the memory controller, deepen the pipeline). Regardless, working on Statick and Stannel was a pretty good set up for hacking on compilers for large, parallel architectures (which just so happens to be my day job now).
Most of my peers’ projects tended to be more theoretical, so I was glad I had the chance to marry a little bit of theory and practice (mostly thanks to Alex Rogers, my supervisor).
I’d argue combinatorial projects (like my master’s) are a good approach for most CS students. There’s a tendency at many universities for bachelor’s theses to take the form of “I implemented algorithm X
in programming language Y
and found Z
”, but often X
is just from one of their supervisor’s papers, Y
is whatever the learned the previous year, and Z
isn’t too surprising.1 Turning X
into a set forcibly exposes students to a wider set of ideas, and may ensure they’re building something genuinely new. By going off the beaten track they’ll learn a little more too. It’d reduce the duplication amongst submitted theses and better mirror the reality of modern research.
Sadly,
Y
is usually Java. My own bachelor’s thesis fits this box too.↩︎