18 Aug 2025

In-memory Filesystems in Rust

I’ve been working on a CLI tool recently, and one of the things it does is manage files on disk. I have written a lot of file management tests for Bundler, and the two biggest reasons that the Bundler test suite is slow are exec and fstat. Knowing that, I thought I would try to get out ahead of the slow file stat problem by using an in-memory filesystem for testing.

A collaborator mentioned being happy with the Go package named Afero for this purpose, and so I set off to look for a Rust equivalent to Afero. Conceptually, I was hoping to be able to replace std::fs:: with mycli::fs:: and swap out the backend in tests for something that’s completely in-memory, so I wouldn’t have to spend time waiting on syscalls.
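
What I was picturing was a thin facade module, something like this (purely a sketch; the mycli::fs name and the exact function set are hypothetical):

```rust
// A hypothetical facade: the rest of the crate calls crate::fs::* instead of
// std::fs::*, so tests could swap in an in-memory backend behind the same
// function names.
pub mod fs {
    use std::io;
    use std::path::Path;

    pub fn read_to_string<P: AsRef<Path>>(path: P) -> io::Result<String> {
        std::fs::read_to_string(path)
    }

    pub fn write<P: AsRef<Path>, C: AsRef<[u8]>>(path: P, contents: C) -> io::Result<()> {
        std::fs::write(path, contents)
    }

    // ...and so on for the handful of std::fs calls the CLI actually makes.
}
```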

Unfortunately, based on my searching, not only is there nothing like Afero, but simply asking about it gets you a lecture about how such things aren’t necessary in Rust. Somewhat frustrated, I continued searching and eventually found a few options to try.

First, I discovered the vfs crate, whose documentation seems pretty promising. It’s possible to swap out vfs backends to get the real filesystem, an in-memory filesystem, a filesystem scoped to a single directory, or files embedded in the executable. It’s actively maintained and seems to have a decent number of current users. Unfortunately, as I got further along, it became apparent that the vfs crate isn’t actually a viable alternative to interacting directly with the filesystem.
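
For a sense of the API, here’s a minimal sketch of the in-memory backend in use, adapted from the crate’s documented examples (exact signatures may vary between vfs versions):

```rust
use std::io::Write;

use vfs::{MemoryFS, VfsPath};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In-memory backend; swapping in vfs::PhysicalFS here would point the
    // same code at the real disk instead.
    let root: VfsPath = MemoryFS::new().into();

    let file = root.join("config.toml")?;
    file.create_file()?.write_all(b"name = \"mycli\"\n")?;
    assert_eq!(file.read_to_string()?, "name = \"mycli\"\n");
    Ok(())
}
```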

The vfs crate doesn’t have any support for symlinks, so resolving symlinks means going back to std::fs after all, and writing special-cased symlink resolution code that only runs when the backend isn’t vfs. The real killer, though, was that vfs has no concept of file permissions. Because of that, there’s no way to mark a file as executable, and writing out executables is core functionality for my tool.
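
For concreteness, this is the kind of thing my tool has to be able to do, shown here with plain std::fs on Unix:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

// Write a script to disk, then flip on the executable bits.
fn write_executable(path: &Path, contents: &str) -> io::Result<()> {
    fs::write(path, contents)?;
    let mut perms = fs::metadata(path)?.permissions();
    perms.set_mode(0o755); // rwxr-xr-x
    fs::set_permissions(path, perms)
}
```

There’s simply no vfs equivalent for the permission-setting half of that function.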

It turns out the intended primary use case of the crate is to store files inside Rust binaries but still have an API sort of like the filesystem API to interact with them. Unfortunately, that information is hidden away in a comment on a random GitHub issue, rather than included in the project readme. At that point, I realized I was probably not going to be able to use vfs and also build the tool I wanted to build.

Next, I looked at rsfs, which is a little bit older and seems sort of unmaintained, but explicitly says that it aims to reproduce the functionality of std::fs while adding the ability to run the filesystem in memory if desired. Unfortunately, the design of the rsfs crate means that every function that talks to the filesystem now has to be generic over the rsfs::GenFS trait, and that makes the type signatures of every function suddenly much worse.
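
To make that concrete, here’s roughly the shape every filesystem-touching function ends up with. I’m going from the rsfs docs from memory here, so treat the trait and method names as approximate:

```rust
use std::io;
use std::path::Path;

use rsfs::GenFS;

// The generic parameter has to be threaded through this function and through
// every caller above it, instead of just calling std::fs directly.
fn prepare_install_dir<FS: GenFS, P: AsRef<Path>>(fs: &FS, root: P) -> io::Result<()> {
    let root = root.as_ref();
    fs.create_dir_all(root.join("bin"))?;
    fs.create_dir_all(root.join("cache"))?;
    Ok(())
}
```

The real binary would pass in the disk-backed implementation and the tests would pass in the in-memory one, but every function in between has to carry the extra type parameter.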

After doing an experimental port of my initial codebase to rsfs, the gnarly type situation drove me to actually test to see what the advantages were of using rsfs to run tests fully in-memory. If the advantages were big enough, I would suck up the types and deal, but I didn’t want to make all my types worse if there wasn’t actually a payoff.

That’s when things got weird.

The first weirdness came from an initial benchmark comparing the vfs implementation to the std::fs implementation. My naive “just run cargo test to benchmark” strategy seemingly paid off, with vfs taking around 850ms and std::fs taking around 1200ms. That seems pretty significant, right?

Giving away the conclusion in advance, I am sadly forced to admit that it was not actually significant. While that whole-suite speedup from using vfs never went away, I was eventually forced to conclude it came down to linker caching, or an artifact of cargo test running many separate executables, or something else entirely.

Trying to hunt down exactly what was faster about using vfs, I started to pick apart the test executables generated by cargo, and I found something even more confusing. Using hyperfine, I was able to benchmark a single cargo test executable that ran through most of the filesystem calls in my code.

With vfs providing an in-memory filesystem, the tests benchmarked as taking about 45ms. With rsfs providing an in-memory filesystem, the tests benchmarked as taking… about 45ms. With rsfs providing the regular filesystem, the tests benchmarked as taking… also about 45ms.

Starting to feel even more confused, I set up additional tests using std::fs running against a ramdisk, and got a high-accuracy benchmark result of… 45ms. At that point, I figured I needed to include completely regular std::fs pointed directly at my regular SSD. That also took 45ms.

At this point, I can only assume that modern SSDs (and the macOS filesystem cache) work so well together that there is effectively zero performance to be gained by making the file-related syscalls virtual? That doesn’t really mesh with my understanding of how expensive syscalls are compared to function calls into a fake in-memory filesystem, but all my benchmarks seem to disagree.

If you have examples of performance differences from using an in-memory filesystem in Rust, please let me know the details!

In the meantime, it seems like modern SSDs (and modern OS filesystem caches) are so fast that it doesn’t even matter. Eat trash, be free, test directly against the filesystem. Why not.