Why scientists are turning to Rust
Despite having a steep learning curve, the programming language offers speed and safety.
Published in Nature, December 2020
In 2015, bioinformatician Johannes Köster was what he called “kind of a full-time Python guy”. He had already written one popular tool — the workflow manager Snakemake — in the programming language. Now he was contemplating a project that required a level of computational performance that Python simply couldn’t deliver. So he began casting about for something new.
Köster, now at the University of Duisburg-Essen in Germany, was looking for a language that offered the “expressiveness” of Python but the speed of languages such as C and C++. In other words, “a high-performance language that is still, let’s say, ergonomic to use”, he explains. What he found was Rust.
Rust was first created in 2006 by Graydon Hoare, as a side project while he was working at browser developer Mozilla, headquartered in Mountain View, California. It blends the performance of languages such as C++ with friendlier syntax, a focus on code safety and a well-engineered set of tools that simplify development. Portions of Mozilla’s Firefox browser are written in Rust, and developers at Microsoft are reportedly using it to recode parts of the Windows operating system. The annual Stack Overflow Developer Survey, which this year polled nearly 65,000 programmers, has ranked Rust as the “most loved” programming language for five years running. The code-sharing site GitHub says Rust was the second-fastest-growing language on the platform in 2019, up 235% from the previous year.
Scientists, too, are turning to Rust. Köster, for instance, used it to create an application, called Varlociraptor, that compares millions of sequence reads against billions of genetic bases to identify genomic variants. “This is huge data,” he says. “So that needs to be as fast as possible.” But that power comes at a cost: the Rust learning curve is steep.
“It does take some up-front time,” says Carol Nichols, a member of the Rust core team and founder of the consultancy firm Integer 32 in Pittsburgh, Pennsylvania. “But it has enabled me to do things that I wouldn’t otherwise be able to do. I see that time as well spent.”
Caution: no guide rails
Workflows for analysing scientific data tend to use languages such as Python, R and Matlab. These languages interpret and execute code line by line, a style of programming that is good for exploring data, but not for speed.
C and C++ are fast, but they have “no guide rails”, says Ashley Hauck, a Rust programmer (or ‘Rustacean’, as community members are known) in Stockholm. For instance, nothing stops a C or C++ programmer from inappropriately accessing memory that has already been freed, or from freeing the same piece twice. In the best case, such errors cause the program to crash. But they can also return meaningless data or expose security vulnerabilities. According to researchers at Microsoft, 70% of the security bugs that the company fixes each year relate to memory safety.
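The following minimal Rust sketch shows the contrast with C. It is not taken from any project in this article; the `Buffer` type and `process` function are hypothetical. The value is released exactly once. Passing `buf` to `drop` moves ownership into that call, so the two commented-out lines are rejected by the Rust compiler: one is a second release and the other is a read after release. Their C equivalents would compile. Rust normally releases a value automatically when its owner goes out of scope. Here `drop` is called explicitly only to make the release point visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times any `Buffer` has been released.
static RELEASES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a block of heap memory (e.g. sequence reads).
struct Buffer {
    data: Vec<u8>,
}

impl Drop for Buffer {
    // Runs when the single owner gives the value up.
    fn drop(&mut self) {
        RELEASES.fetch_add(1, Ordering::SeqCst);
    }
}

fn process() -> usize {
    let buf = Buffer { data: vec![1, 2, 3] };
    let len = buf.data.len();
    drop(buf); // ownership moves into `drop`; the memory is released here
    // drop(buf);                 // rejected at compile time: double release
    // println!("{}", buf.data.len()); // rejected at compile time: use after release
    len
}

fn main() {
    let n = process();
    println!("read {} bytes, released {} time(s)", n, RELEASES.load(Ordering::SeqCst));
}
```

In both rejected cases the compiler reports that `buf` has already been moved. That is why such bugs never reach a running program.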
Memory rules
Rust’s model uses rules to assign each piece of memory to a single owner and enforce who can access it. Code that violates those rules never gets the chance to crash: it won’t compile. “They have a memory-management system that is based on this concept of lifetimes that lets the compiler track at compile-time when memory is allocated, when it’s freed, who owns it, who can access it,” explains Rob Patro, a computational biologist at the University of Maryland, College Park. “There’s an entire large class of correctness errors that go away simply by virtue of the way the language is designed.”
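The borrowing side of these rules can be sketched as follows. This is a minimal illustration with hypothetical function names (`longest`, `add_read`), not code from Patro’s or Köster’s tools. The lifetime `'a` ties the returned reference to the vector it points into. The compiler allows either any number of shared (read-only) borrows or a single mutable borrow at a time. The commented-out call breaks that rule and is rejected.

```rust
// Hypothetical helper: returns the longest read. The lifetime `'a` says the
// returned `&str` borrows from `reads`, so it cannot outlive that data.
fn longest<'a>(reads: &'a [String]) -> Option<&'a str> {
    reads.iter().map(|r| r.as_str()).max_by_key(|r| r.len())
}

// Hypothetical helper: appending needs exclusive (mutable) access to `reads`.
fn add_read(reads: &mut Vec<String>, read: &str) {
    reads.push(read.to_string());
}

fn main() {
    let mut reads = vec!["ACGT".to_string(), "GATTACA".to_string()];
    add_read(&mut reads, "TTAGGC"); // the mutable borrow ends when the call returns

    let best = longest(&reads); // a shared borrow of `reads` starts here
    // add_read(&mut reads, "AC"); // rejected: `reads` cannot be borrowed mutably
    //                             // while `best` still borrows it
    println!("longest read: {:?}", best); // last use of `best`; the shared borrow ends
}
```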
The same guarantees help to ensure that parallelized code (software written to run on multiple processors) can run safely. For instance, they rule out ‘data races’, in which one computational thread modifies a piece of data while another is reading or writing it at the same time.
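Here is a minimal sketch of how those guarantees look in practice. It assumes a toy task, counting G and C bases in a DNA sequence, and the `gc_count_parallel` function is hypothetical. It uses the standard library’s scoped threads (`std::thread::scope`, available since Rust 1.63). Each thread receives a read-only slice of the sequence and returns its own count, so no thread ever writes to data that another thread can see.

```rust
use std::thread;

// Hypothetical example: count G and C bases by splitting `seq` across
// up to `n_threads` scoped threads.
fn gc_count_parallel(seq: &[u8], n_threads: usize) -> usize {
    assert!(n_threads > 0, "need at least one thread");
    // Ceiling division, so that at most `n_threads` chunks are produced.
    let chunk_size = ((seq.len() + n_threads - 1) / n_threads).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = seq
            .chunks(chunk_size)
            .map(|chunk| {
                // Each thread gets a shared, read-only view of its own chunk
                // and returns a private count.
                s.spawn(move || chunk.iter().filter(|&&b| b == b'G' || b == b'C').count())
            })
            .collect();
        // Wait for every thread and add up the per-thread counts.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

// By contrast, a version in which every thread runs `total += chunk.len()` on
// one shared `let mut total = 0;` is rejected at compile time: it would need
// several simultaneous mutable borrows of `total`, which is a data race.

fn main() {
    let seq = b"GATTACAGGCC";
    println!("G+C bases: {}", gc_count_parallel(seq, 3));
}
```

Returning results from each thread is only one design choice. Shared counters would instead need a synchronized type such as an atomic or a `Mutex`, and the compiler enforces that requirement.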
The result is a language that is easier to maintain and debug, but harder to learn. “No other mainstream languages really have these concepts, and they’re really core to understanding a lot of how you have to write code in Rust,” Nichols says. Stephan Hügel, who studies the visualization of geographical data at Trinity College Dublin, estimates that he spent two or three months porting to Rust a Python algorithm that converts geospatial coordinates from one reference system to another. The Rust version runs four times faster. Richard Apodaca, founder of the cheminformatics-software company Metamolecular in La Jolla, California, says it took him about six months to become proficient in the language.
Focus on usability
To compensate, Rust’s developers have optimized the user experience, says Manish Goregaokar, who leads the Rust developer-tooling team and is based in Berkeley, California. For instance, the compiler produces particularly informative error messages, even highlighting offending code and suggesting how to fix it. “If your language is going to introduce a novel concept, it had better be pleasant to work with,” Goregaokar explains.
The Rust community also provides extensive documentation and online help, including a popular online reference called the Book and a ‘Cookbook’ of recipes for solving common problems. Users praise the Rust toolchain, the software that programmers use to turn source code into working programs (see ‘Let’s get oxidizing’). “The tooling and infrastructure around Rust is really phenomenal,” Patro says. C programmers typically rely on an assortment of compilers and ancillary utilities to build their code. Rustaceans, by contrast, can use a single tool, called Cargo, to compile Rust code, run tests, auto-generate documentation, upload a package to a repository and more. Cargo also downloads and installs third-party packages automatically. A Cargo plug-in called Clippy flags common errors and ‘non-idiomatic’ Rust code, a feature that Patro calls “absolutely phenomenal”.
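Here is a minimal sketch of the kind of code Cargo handles, built around a hypothetical `gc_fraction` function that assumes an ASCII DNA string. `cargo doc` turns the `///` doc comment into HTML documentation. `cargo test` compiles and runs the functions marked `#[test]`. `cargo clippy` runs Clippy’s lints over the same code. The iterator style shown is the idiomatic alternative to an index-based loop, which Clippy’s `needless_range_loop` lint flags.

```rust
/// Returns the fraction of bases in `seq` that are G or C.
///
/// Assumes an ASCII sequence; returns 0.0 for an empty one. `cargo doc`
/// renders comments like this one as HTML documentation.
pub fn gc_fraction(seq: &str) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    // Iterator style rather than `for i in 0..bytes.len() { ... bytes[i] ... }`,
    // the index-based pattern that Clippy's `needless_range_loop` lint flags.
    let gc = seq.bytes().filter(|&b| b == b'G' || b == b'C').count();
    gc as f64 / seq.len() as f64
}

fn main() {
    println!("GC fraction: {}", gc_fraction("GATTACA"));
}

// Compiled and run only under `cargo test` (or `rustc --test`).
#[cfg(test)]
mod tests {
    use super::gc_fraction;

    #[test]
    fn counts_g_and_c() {
        assert_eq!(gc_fraction("GATC"), 0.5);
        assert_eq!(gc_fraction("GGCC"), 1.0);
    }

    #[test]
    fn empty_sequence_is_zero() {
        assert_eq!(gc_fraction(""), 0.0);
    }
}
```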