
On learning Rust

After two previous attempts, I finally managed to cram the Rust language into my head. It took several full days and a read through The Rust Programming Language to grok what was going on, but I’m confident I’ll get that time investment back in the coming months.

Let’s talk (memory) safety

If you’re new to the subject, you’re probably wondering what all the fuss about Rust is. Just another language for code hipsters? Not quite. But explaining the appeal of Rust means first explaining the problems with many of the languages in use today. And for that, I’ll start with a topic you weren’t expecting: transportation safety.

The track record of automotive safety is abysmal: more than a million people die every year in road crashes worldwide (and it used to be way worse before seatbelt laws came into effect!). That many deaths a year would set off alarms, right? That would be treated as a major public health crisis, right? Well, it’s not. Not in North America, anyway. But some folks in Sweden asked themselves whether it was possible to design a road system where no one died, and came up with Vision Zero. It takes effort to implement, but in places that have adopted it the changes are staggering. In the city of Oslo, Norway, only one person died on the roads in 2019. So reducing traffic deaths isn’t impossible, but it takes willpower and thoughtful design (sometimes more money, but often not). It starts with being dissatisfied with the status quo, and not saying “eh, looks good enough” to a design without thinking through the consequences.

So now let’s talk about memory safety. A lot of commercial code is written in memory-unsafe languages like C and C++. The reasons for this are largely historical, and largely pragmatic. If you’re doing things at the system level on Unix-likes and Windows, you’re generally doing it in a C-like language. “Hold on there - I use Node.js, not C!” Well, the Node runtime is written in C/C++. Same with Python, same with Ruby. Oh sure, there are alternate runtimes that aren’t written in C, but if you’re starting out with a language, you’re usually using the reference C implementation.

Part of why C will be around forever is the same reason why PHP will be around forever, and that is the ability to get something working (for a certain definition of working) fairly quickly when you’re first starting with the language. With a dash of Stack Overflow and some quick googling, you can find out the right incantations to make your thing compile, and do something useful. Often this handles some unsexy-yet-somehow-critical part of a small-to-medium business. Larger businesses can afford to hire experts, and they release slightly less vulnerable code to the public.

The C compiler is forgiving, and often does what you expect. And if you spend a few years learning about undefined behaviour, compilers, and memory leaks, and reading books on secure C programming, you’re not going to write any serious bugs or security vulnerabilities. Well, maybe - sometimes bugs get produced because the developer didn’t use braces, or because folks have something else on their mind.

C will allow you to compile and run this:

#include <stdio.h>

int main() {
  int c[3] = {0, 1, 2};
  printf("%d\n", c[1]);
  printf("%d\n", c[9]);
}

If you’ve got a decent compiler, it might even warn you that something is amiss.

arr.c:6:18: warning: array index 9 is past the end of the array (which contains 3 elements)
      [-Warray-bounds]
  printf("%d\n", c[9]);
                 ^ ~
arr.c:4:3: note: array 'c' declared here
  int c[3] = {0, 1, 2};
  ^
1 warning generated.

But hey, it compiled, and what’s a warning anyhow, right? Then you run it:

1
1782834133

That second value is garbage. This may seem like a toy example, but this kind of thing plays out a lot more often than you think. How do you make that warning go away?

#include <stdio.h>

int main() {
  int j = 9;
  int c[3] = {0, 1, 2};
  printf("%d\n", c[1]);
  printf("%d\n", c[j]);
}

Oof.

New kids in town

In the past decade several new languages have come out that aim to deliver the performance of C-like languages, but without the “explodes-in-your-face-on-overflow” surprises that happen with C. In this pack, the front runners are Go from Google, Swift from Apple, and Rust from Mozilla. All three have their own merits and drawbacks, but instead of getting into a bike-shed argument, let’s just say all three are good enough that we should be looking at using them to replace C codebases in the future. That I happened to choose Rust is personal preference, and beside the memory safety point I’m making.

Back to that second C program: the simple bit of misdirection gives the same garbage output, but now there’s no warning to tell you what’s up. Let’s try it in Rust, which can be tricked in a similar way:

fn main() {
  let j = 9;
  let s = [0,1,2];
  println!("{}",s[1]);
  println!("{}",s[j]);
}

But will the program run?

1
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 9', arr.rs:5:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

A crash can be bad, sure, but silent data corruption can be much worse. Just imagine a utilities bill that says you owe a pointer’s amount of dollars. How long do you think you’d be on the phone trying to sort that one out?
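If crashing isn’t what you want either, you can ask Rust for the bounds check explicitly. Here’s a minimal sketch (the `lookup` helper is a hypothetical name of my own): the standard slice method `get` returns an `Option` instead of panicking, so an out-of-range index becomes a case you have to handle rather than a garbage value or a crash.

```rust
// Hypothetical helper: returns the element at `index`, or `None` when
// the index is out of bounds. `slice::get` does the bounds check
// without panicking, and `copied` turns `Option<&i32>` into `Option<i32>`.
fn lookup(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let s = [0, 1, 2];
    let j = 9;

    // The compiler makes us deal with the "no such element" case.
    match lookup(&s, j) {
        Some(value) => println!("{}", value),
        None => println!("index {} is out of bounds (len {})", j, s.len()),
    }
}
```

Running it prints the fallback message instead of a pointer’s worth of dollars.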

Rust kind of looks like C/C++, but it’s designed to rule out memory safety issues at the compiler level. It gives you access to the low-level goodies you need to drive a USB widget or make a fast web server, but won’t have you manually allocating and deallocating memory to do so. Given that memory safety bugs make up the bulk of security vulnerabilities (Microsoft and the Chromium team have each reported that roughly 70% of their serious security bugs are memory safety issues), this is huge. This is the equivalent of Vision Zero for memory safety in systems programming.

Understanding and thinking

Civilization advances by extending the number of important operations which we can perform without thinking about them. –Alfred North Whitehead

So this all seems gravy, everyone move to Rust, right? Not quite.

In order to remove the worries of hand-managed memory, languages typically resort to a garbage collector. Garbage collectors are getting pretty good these days, but the knock against them is that the program periodically has to stop (or share time) to run the collector. For extremely low-level and/or realtime stuff, that would be unacceptable. What makes Rust different is that instead of using a garbage collector, it works out at compile time, through its ownership rules, when each piece of memory can be freed. This sounds perfectly fine, until you try it.
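To make the compile-time part a bit more concrete, here’s a small sketch. The `Buffer` type and `run` function are made up for illustration, and the `RefCell` log is only there so we can watch what happens. The compiler knows where each owner goes out of scope, and runs the cleanup (the `Drop` implementation) right there, in a predictable order, with no collector pass.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a heap resource; it records when it's cleaned up.
struct Buffer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Buffer<'_> {
    // Called automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

fn run(log: &RefCell<Vec<String>>) {
    let _outer = Buffer { name: "outer", log };
    {
        let _inner = Buffer { name: "inner", log };
        log.borrow_mut().push("inner scope ends".to_string());
    } // `_inner` is dropped here, at a point fixed at compile time
    log.borrow_mut().push("outer scope ends".to_string());
} // `_outer` is dropped here

fn main() {
    let log = RefCell::new(Vec::new());
    run(&log);
    for line in log.borrow().iter() {
        println!("{}", line);
    }
}
```

When run, `freed inner` shows up right after the inner scope ends, before anything else happens in `run`. The same mechanism is what frees a `String` or `Vec` when its owner goes away.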

The problem (if you could call it that) is that Rust requires you to understand what the hell is going on. The compiler found a bug in your program, so you have to fix it. The compiler hints are great, but the code/compile feedback loop will suffer if you’re constantly pleading with the compiler to create an executable. To be effective at Rust and not throw your keyboard through the screen, you need to understand not just the syntactic details, but how Rust is working behind the scenes as well. You have to understand how the memory borrowing works. You have to understand how strings are constructed. You have to understand how generics and traits work. If you don’t, you’ll be convinced the compiler hates you on a deep and personal level.
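Here’s a rough taste of those three topics in one place, as a sketch rather than anything definitive; `shout` and `largest` are hypothetical helpers (the latter is similar in spirit to an example in The Rust Programming Language). `shout` borrows a `&str` instead of taking ownership of a `String`, and `largest` is generic over any type that implements the `PartialOrd` and `Copy` traits.

```rust
// Borrowing: `&str` lets the function read the text without taking ownership.
fn shout(text: &str) -> String {
    // `to_uppercase` allocates a brand new, owned `String`.
    text.to_uppercase() + "!"
}

// Generics + traits: works for any type that can be compared and copied.
// Returns `None` for an empty slice instead of panicking.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let mut best = iter.next()?;
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    let greeting = String::from("hello");
    let loud = shout(&greeting); // lend `greeting`; we still own it afterwards
    println!("{} -> {}", greeting, loud);

    println!("{:?}", largest(&[3, 7, 2]));
    println!("{:?}", largest(&[1.5, 0.5]));
    println!("{:?}", largest::<i32>(&[]));
}
```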

So why take up a language that requires a big upfront investment in brain cycles, when there are other languages that are far more forgiving for newcomers? Because of the aforementioned garbage-collection issue, and because of long-term dividends. A program that doesn’t need frequent fixing for preventable bugs means more time to work on new programs. Languages that enforce good practice by default scale well with larger teams. And a language that can interoperate with C, allowing gradual migration away from problematic codebases, means we can start using these tools now to replace big, nasty legacy codebases.

I think Rust is simultaneously the worst and best way to introduce someone to programming. It’s the worst because if you don’t understand what it’s trying to do, you’ll be wondering why you’re spending time learning how the compiler handles memory while you watch your friends get kudos and praise for whipping up a fart app in 20 minutes. It’s also probably the best, because you’ll learn a ton of transferable stuff about memory, strings, how compilers work, and more, and once you’re able to make fast and robust applications, you’ll wonder why people are still freeing memory by hand. Learning leads to understanding, and that’s a better use of your thinking power than “have I made sure that this string is freed in this function?”

My experience so far

At this point I’ve made a few toy applications: a Gopher server, a Gopher terminal client (turns out Lynx can’t handle Gopher as well as I’d previously thought), and some blinkenlights on a little LED message board I dug out of the closet. I’m even rewriting my CoffeeOutside Twitter bot in Rust now, and I’ve been happy with the progress up to this point. Once I’m not terribly embarrassed by the code I’ll open-source it (but I’m not expecting anyone to find it useful ;-).

One minor gripe that’s not the fault of the Rust team is that the package registry, crates.io, has quite a few underbaked or outdated packages squatting on good package names. This is a common problem in the languages I’ve used, though, and Cargo allows alternate registries and git repositories as dependency sources, so there’s no point whining about what are fundamentally human problems like “I had time to fix bugs when I registered the package, I don’t anymore.”

One minor gripe that could be addressed by the team is the anemic standard library. Some languages proclaim this a virtue, but the more languages I learn, the more I appreciate not having to dig through 30 janky third-party repos to build a simple Twitter app. For example, Rust doesn’t maintain its own random number library, generally leaving that to the rand crate. Okay, that seems intuitive enough. Now where do I go for a JSON parsing library? The answer is serde_json, built on the unintuitively named serde crate (short for serialization/deserialization). While there’s an effort to build a Rust Cookbook, and an effort to clean up common crates, I think the happier middle ground is something like the “x” packages that Go provides: not part of the standard library, but polished and stable enough for normal use. Another approach is how Ruby decoupled the gems from the standard library, while the gem source code is still handled by the Ruby team.

So what’s the level of quality and soundness across all these third-party crates I’m installing? No idea without auditing each of them, and there’s no way to enforce at the Cargo level things like “no unsafe code blocks in my dependencies, please.” Rust allows memory-unsafe operations inside an unsafe block - this is fine for very low-level (i.e. hardware) and FFI stuff, but for internet-facing traffic? No thanks, Tom Hanks. Further, there’s nothing in the registry that points out “this code has unsafe code blocks.” I’m not saying unsafe blocks are evil or something, but not every programmer is going to understand the full repercussions of their code. The whole point of Rust is writing memory-safe code, and while the language does a great job of that, as long as projects depend heavily on third-party code, the ecosystem really does need to come up with something here. Fortunately there does appear to be a (not-well-documented) lint that you can throw in your codebase that at least denies unsafe blocks in the crate you’re working on.

// This fails to compile: the crate-level lint turns any `unsafe` block into an error
#![deny(unsafe_code)]

fn main() {
  unsafe {};
}

That can be handy for a crate being worked on by multiple people, but I worry that this will eventually lead to something like Perl, where you have to specify a whole bunch of stuff to get something resembling safety. If safety is the point, then something should be done that makes unsafety as visible as possible.
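A slightly stronger option is `forbid`, which works like `deny` but can’t be switched back off further down with `#[allow(unsafe_code)]` (trying to is itself a compile error). Like `deny`, it only covers the crate you’re compiling, not your dependencies. A minimal sketch, with `checked_sum` as a made-up example of ordinary safe code that compiles fine under the lint:

```rust
// `forbid` works like `deny`, except that an `#[allow(unsafe_code)]`
// further down is rejected instead of silently switching the lint off.
#![forbid(unsafe_code)]

// Hypothetical helper: ordinary safe code compiles as usual.
// `checked_add` returns `None` on overflow instead of wrapping.
fn checked_sum(values: &[u32]) -> Option<u32> {
    values.iter().try_fold(0u32, |acc, &v| acc.checked_add(v))
}

fn main() {
    println!("{:?}", checked_sum(&[1, 2, 3])); // Some(6)
    println!("{:?}", checked_sum(&[u32::MAX, 1])); // None
}
```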

At the risk of sounding cliché, the future is now

Rust is already being used in large codebases, doing jobs that C and C++ would normally handle; Firefox, for instance, ships a CSS engine written in Rust.

There are still arguments for using C, such as embedded hardware that requires compiler support that’s not handled by the new languages (yet). But C’s days of large-scale relevance are numbered, and it will end up like COBOL: large, financially important codebases will be around, sure, but hiding in the closets of governments and corporations, left untouched for fear of breaking the beast. Once the new languages start replacing C/C++ in university courses, things are going to heat up quickly.

This overview has been very high-level, I know, but if you’re about to start work on a brand new project where before you would have reached for C, maybe spend a couple of weeks learning Rust or Go or Swift first. They’ll probably fit the bill just fine, and if they don’t, it’s likely only a matter of time before they do.