Over the past couple of days, I have actually started playing around with programming in Rust.
If you want to play along at home, here is how you can get set up.
The language is updated regularly, and several features currently live only on the nightly channel. The tool offered to manage this is rustup, which is your “toolchain manager”. This means it’s responsible for pulling down different versions of your core dev tools – the compiler, the package manager (Cargo), and the rest.
Since I work in a .NET shop, I’m using the new Visual Studio Code, which has a Rust extension. It provides syntax highlighting and some basic code completion, but it falls far, far short of what I get with Visual Studio and C#. Tooling is one of the areas that’s expected to improve this year, and there’s plenty of room for it.
Doing the work
Since this is just playing around, I decided to redo a small utility that I’ve already written in C#. It scans a directory for all executables, then outputs each one’s path (relative to the starting directory) and MD5 checksum either to the console or to a CSV file, depending on command line arguments. The C# application is quick-and-dirty, but I did try a few optimization tricks – arrays of structs, and splitting the work across multiple workers to compute hashes in parallel.
For my Rust version, I decided to start simple and do a single-threaded application. Rust is not “batteries included”, so I had to pull in almost everything I’d use a standard library for elsewhere. The MD5 hasher, the glob library to search directories recursively, a library for timing how long the application runs, and even command line argument parsing all come from “crates”, which are third-party (or in some cases first-party-but-still-in-trial-phase) libraries that are pulled in via the package manager.
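To give a feel for the scanning half of that job, here is a rough sketch that deliberately sticks to std so it runs without any crates. It is not the code from this project: it swaps the glob crate for a hand-rolled recursive walk with std::fs::read_dir, it leaves the MD5 step out entirely (there is no MD5 in std), and the names scan and is_executable are made up for illustration. It also assumes “executable” means “has an .exe extension”, which may not match what the real utility checks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Assumption for this sketch: an "executable" is any file whose extension
// is "exe", compared case-insensitively.
fn is_executable(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("exe"))
}

// Walk `dir` recursively and collect matching files as paths relative to `root`.
fn scan(root: &Path, dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // file_type() does not follow symlinks, so linked directories are skipped.
        if entry.file_type()?.is_dir() {
            scan(root, &path, found)?;
        } else if is_executable(&path) {
            // Every path produced by read_dir starts with `root`, so this
            // strip normally succeeds; anything else is silently ignored.
            if let Ok(relative) = path.strip_prefix(root) {
                found.push(relative.to_path_buf());
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = Path::new(".");
    let mut found = Vec::new();
    scan(root, root, &mut found)?;
    for path in &found {
        println!("{}", path.display());
    }
    Ok(())
}
```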
The pain points
Contrary to popular wisdom, I have not had to struggle much with the borrow checker – the set of rules the compiler applies to guarantee that a value is never mutated through one reference while other references to it are still in use, so one part of the code can’t start tripping up another. For a project as linear as this one is right now, that doesn’t surprise me. Adding parallelism later on will no doubt make it more interesting.
My problem has been finding my way around the types. First off, it’s good that Rust’s documentation is as good as it is (and that the crates are all open source), because this has made me keenly aware of how much I rely on my accumulated knowledge of the .NET framework as well as quick and accurate code completion. Most of my time has been spent groping around in the dark trying to figure out what type or function I even need.
The other hurdle I hit is, literally, types. For example: I wish to abstract over “write to console” and “write to file”. In C#, this is easy because I can just get a Stream from either the console or a file, and then write to it. Base classes and interfaces form a hierarchy that lets me abstract over the details. Rust doesn’t let me get away with that, and here’s why.
First, Rust doesn’t have object inheritance, so there’s no type hierarchy in the sense that I am used to. Second, instead of interfaces, it has a similar concept called traits (probably closer to C++ concepts, but I’m saying this as someone who has barely done C++). The catch is that while there is a useful trait to abstract the behavior (in this case std::io::Write), I can’t just declare a variable of type std::io::Write. A variable has to hold some concrete type that implements Write, and the console (std::io::Stdout) and a file (std::fs::File) are two different concrete types, possibly of different sizes. I can’t have some space on the stack (or can I? More in a later paragraph!) and have it be one of the two, don’t care which, just pretend that it Writes.
The quick way to fix this is to turn them into a trait object – stick the real thing on the heap (Box in Rust parlance, new or any kind of smart pointer in C++, reference types in .NET), and keep the pointer typed as the trait. Once I have a Box<dyn std::io::Write> (spelled Box<std::io::Write> before the dyn keyword was introduced), I can put anything that implements the trait behind it, and through magic auto-dereferencing, pretend I have the original object at hand.
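A minimal sketch of the trait-object approach, using only std, might look like the following. The helpers open_output and write_row are hypothetical names invented for this example, and the CSV row layout is an assumption; only std::io::Write, std::io::Stdout, and std::fs::File come from the text above.

```rust
use std::fs::File;
use std::io::{self, Write};

// Hypothetical helper: choose the destination at runtime and hide the
// concrete type (Stdout or File) behind a boxed trait object.
fn open_output(csv_path: Option<&str>) -> io::Result<Box<dyn Write>> {
    match csv_path {
        Some(path) => Ok(Box::new(File::create(path)?)),
        None => Ok(Box::new(io::stdout())),
    }
}

// Hypothetical helper: accepts anything that implements Write, so it never
// needs to know where the bytes end up.
fn write_row(out: &mut dyn Write, path: &str, checksum: &str) -> io::Result<()> {
    writeln!(out, "{},{}", path, checksum)
}

fn main() -> io::Result<()> {
    // None means "no CSV file requested", so this writes to the console.
    let mut out = open_output(None)?;
    // `&mut *out` reborrows the boxed value as a plain `&mut dyn Write`.
    write_row(&mut *out, "bin/tool.exe", "<md5 goes here>")?;
    Ok(())
}
```

The cost is one heap allocation plus a dynamic dispatch on each write call, which hardly matters for a tool that spends its time reading files off disk.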
The other trick, which dummy me only thought of while driving home, is to use Rust’s enum. Rust enums are much smarter than the enums you’re used to, which are numbers given fancy names. Rust enums are also called “algebraic data types” or “sum types”, and the reason is that each variant of an enum is not just a named alternative: it can carry its own data. For example, safe Rust has no null references – the way you signal that you may or may not have some object of type T is to use the enum Option<T>, where you either have Some(T) or None. There are no thrown exceptions either, so operations that can fail are modeled as the enum Result<T, E>, and the action was either Ok(T) or Err(E), where both the data type T and the error type E are chosen by the operation itself.
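As a small illustration of both enums, here is a sketch built on std calls that return them. The function names extension_of and file_size are invented for this example; Path::extension and std::fs::metadata are real std APIs.

```rust
use std::io;
use std::path::Path;

// Option<T>: a path may or may not have an extension, and the return type
// says so instead of handing back a null.
fn extension_of(path: &Path) -> Option<&str> {
    path.extension().and_then(|ext| ext.to_str())
}

// Result<T, E>: reading metadata can fail, so the error is part of the
// return type rather than a thrown exception.
fn file_size(path: &Path) -> Result<u64, io::Error> {
    // `?` returns early with the Err variant if metadata() fails.
    Ok(std::fs::metadata(path)?.len())
}

fn main() {
    match extension_of(Path::new("tools/app.exe")) {
        Some(ext) => println!("extension: {}", ext),
        None => println!("no extension"),
    }
    match file_size(Path::new("tools/app.exe")) {
        Ok(len) => println!("{} bytes", len),
        Err(err) => println!("could not read file: {}", err),
    }
}
```

The compiler insists that both arms of each match are handled, which is what replaces the null check or try/catch you would write in C#.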
What I could have done was define my own Output enum, and given it variants like Console(std::io::Stdout) and CsvFile(std::fs::File). For added trickiness, I could then implement the Write trait on my new type, and just forward everything along to the thing I actually have. Bam, one more pointer indirection (and a heap allocation) done away with.
As of this afternoon I have the basic program behavior done. The next thing will be trying to parallelize hashing files, and then seeing how I do performance-wise against my .NET version.
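A minimal sketch of that enum approach, assuming the variant names mentioned above and nothing beyond std, could look like this. It is one way to write it, not necessarily how the finished program will end up.

```rust
use std::fs::File;
use std::io::{self, Stdout, Write};

// One type that is either the console or a CSV file. It lives wherever the
// variable lives (for a local, the stack); no Box is needed.
enum Output {
    Console(Stdout),
    CsvFile(File),
}

// Forward the two required Write methods to whichever variant is held.
impl Write for Output {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            Output::Console(out) => out.write(buf),
            Output::CsvFile(file) => file.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            Output::Console(out) => out.flush(),
            Output::CsvFile(file) => file.flush(),
        }
    }
}

fn main() -> io::Result<()> {
    let mut out = Output::Console(io::stdout());
    let (path, checksum) = ("bin/tool.exe", "<md5 goes here>");
    // writeln! works because Output now implements Write.
    writeln!(out, "{},{}", path, checksum)?;
    out.flush()
}
```

The trade-off is that the set of destinations is closed: adding a third kind of output means adding a variant and two more match arms, whereas the boxed trait object accepts anything that implements Write.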