Jun 9 '24

Rethinking writing files with memory mapping and C++

This post introduces the motivation behind the decodless C++ open source libraries. I hope someone finds them useful.

I’ll begin with an example. I do 3D graphics and triangle meshes are canonical in the field. There are so many file formats but a few just stick around because they’re so amazingly simple and easy to use. I’ll call out *.obj as a very well known one. It’s text based, which is great to directly play with and edit, but it’s relatively slow to parse — lots of atof() calls to read floating point numbers from text. Lets make our own binry format for a simple triangle mesh.

Example file format

As with most file formats, it’ll start with a header. For simplicity, this is just going to be a plain old data struct and we’ll assume a similar architecture machine both reads and writes the file. E.g. it must be little endian, but that’s very common.

struct Header {
    uint64_t triangle_indices_offset;  // bytes from start of file
    uint64_t triangle_indices_count;  // elements of uint32_t
    uint64_t vertex_positions_offset;  // bytes from start of file
    uint64_t vertex_positions_count;  // elements of float[3]
};

Nice and simple. Not even a magic file identifier at the top. Actually, I can see a nice way to factor the above a bit…

struct Span {
    uint64_t byte_offset;
    uint64_t element_count;
};

struct Header {
    Span triangle_indices;
    Span vertex_positions;
};

The existing way

Hmm, lets quickly search to see how we can write a file in a cross platform way:

  1. C fopen()/fwrite() from stdio.h
  2. C++ ofstream from fstream
  3. That’s it??

These are basically the same - writing a stream of data. We have to allocate some memory first, fill that memory, then copy that memory to disk using fwrite() or ofstream::write(). Lets write some code to write our file.

void writeHeader(std::ofstream& file, Header header);
Span writeIndices(std::ofstream& file, uint32_t* indices, uint64_t count);
Span writePositions(std::ofstream& file, float* positions, uint64_t count);

void test()
{
    uint32_t indices[] = { ... };
    float    positions[] = { ... };

    std::ofstream file("cube.mesh");

    Header header; 
    writeHeader(file, header);
    header.triangle_indices = writeIndices(file, indices, std::size(indices));
    header.vertex_positions = writeIndices(file, positions, std::size(positions) / 3);

    // crap. we wrote header before initializing the members
    //writeHeader(file, header); // no, the header should probably be first

    // Go back and re-write the header
    // TODO: clean up this mess!
    // maybe change the first writeHeader() to reserveHeader()?
    file.seekp(0);
    writeHeader(file, header);
}

Oh no. How can we fix the above? Yeah, yeah, we’ve all hit this problem before. You just™…

  1. Figure out how much data you’re going to use first
  2. Write the header at the end of the file
  3. Leave a placeholders and come back and fill them in afterwards

Lets take a moment to stop and think about more complex examples with these approaches. Currently there is just one mesh in the file. What if we want multiple meshes. That means that option (1.) would involve recursively walking the data structure you’re going to write twice. That’s pretty complicated to code and easy to go badly wrong if the predicted size is incorrect. (2.) is appealing because it’s simple but it’s also unexpected to have headers last. It also can’t solve the case where references to data form a graph and not a tree. (3.) is nicer for the code reading the file, although there’s some housekeeping to first reserve blocks of the file and return to write them later.

That’s it then. Problem solved. Unless… one more thing to investigate…

Writing files with memory mapping

This is pretty out there. Specialized applications have been doing this for a long time now. Maybe we can make it easier?

Memory mapping is a comlicated topic especially for someone that just wants to write a basic file. If there wasn’t a library (spoilers!) it’s probably not worth learning. But since you’re here already, what are the benefits?

You still have to write data to RAM and it still gets copied to disk, but…

  1. There is no std::ofstream overhead
  2. Nicer programming interface - write to a pointer instead of a stream

These both allow you to “map” the bytes of a file on disk to a virtual address range in your program’s memory. But what does this actually give you?

How does it work? RAM is given to processes in “pages” of bytes. Pages are mapped into processes’ virtual address space - this is what you get when you call malloc() or new. File memory mapping allows you to map these pages to a file on disk. The OS and CPU automatically handle keeping the data in sync. To the programmer it looks like they’re writing to regular memory but at the end, the OS copies modified pages to disk. Pages may be “paged out” automatically so you don’t need to have enough RAM for the whole file. Moreover, you can read from the mapped memory too!

So what’s the catch? Well, a few and I have solutions to all.

  1. Memory mapping is complicated to understand and use. As of writing there is no standard library for memory mapping, but thre are equivalent interfaces. Linux has mmap() and windows has CreateFileMappingA(). A cross platform library with a simple API could hide the comlexity.
  2. You typically need to create the file with a fixed size first, then memory map it. This means you’d need to know exactly how many bytes you want to write up-front, which IMO is a deal breaker. There are ways to grow memory mappings but that’s even more complicated. Again, just something that can be handled by a library.
  3. The stream APIs automatically increment a write pointer. When using memory mapping we’d need to replace this feature. Actually, linear memory allocators do exactly this, and there’s an amazing side effect that mapped memory is then natively readable. Lets have a library for that too.

Growable memory mapping

The first issue when writing files using memory mapping is figuring out how big to make the file to memory map. Do you do the maths, figure out the exact total file size? What about over-allocate the file and truncate afterwards? But how much more do you allocate? This needs real disk space. Maybe you could keep rewriting the same file and double the file size every time you ran out of space. All pretty bad options just because you have to pick a file size up front. What if you could bypass all that and grow the memory mapped file?

Growing is actually pretty simple. You just unmap the file, increase the file size, then remap it. The problem is that the re-mapped file may be at a different address, meaning header.triangle_indices = writeIndices(...) would be writing to a dangling pointer if the file size were increased. At this point things are not much sipler than the initial streams API, where you would file.seekp() to a file-relative address. What if you could guarantee you would get the same address when re-mapping a file?

The virtual address space in a process is fairly cheap. Who has 256TB of RAM anyway. What we want is a way to reserve a big chunk of address space — something so big we would really never hit the end. Then we could remap the file into that same address space after growing it - or just append pages if the OS supported it. It turns out this is entirely possible to do. How? The way just described.

With mmap() on linux, you can create an anonymous private mapping with PROT_NONE. That reserves the range. Then you can remap that range with MAP_FIXED.

Windows is much more frustrating. You can easily reserve a virtual address range but there is no way to remap a reserved memory range… unless you use an undocumented driver API calls from the WDK, NtExtendSection(). Thanks, RbMm from stackoverflow!

Now to making this easy to use. Enter decodeless::mappedfile for easy and growable memory mapping. Lets take a look at what the example code can look like now:

// These use and increment writePtr and may call file.resize() if needed
Header* writeHeader(decodeless::resizable_file& file, void*& writePtr);
Span writeIndices(decodeless::resizable_file& file, void*& writePtr, uint32_t* indices, uint64_t count);
Span writePositions(decodeless::resizable_file& file, void*& writePtr, float* positions, uint64_t count);

void test()
{
    uint32_t indices[] = { ... };
    float    positions[] = { ... };

    decodeless::resizable_file file("cube.mesh", 9999 /* max. size */);
    void* writePtr = file.data();

    Header* header writeHeader(file, writePtr);
    header->triangle_indices = writeIndices(file, writePtr, indices, std::size(indices));
    header->vertex_positions = writeIndices(file, writePtr, positions, std::size(positions) / 3);
}

Allocating data in a mapped file

In the above examples, writePtr is passed around to mark the next bit of unused file to write to. If we were to memory map and read the file directly, we need to consider alignment. I’ve assumed the write*() functions align writePtr before writing to it and returning addresses. This is just what an allocator like malloc() or new does. The only difference is we don’t care about an equivalent free()/delete because the data is going to a file for good. What we want is a linear allocator that gets its backing memory from a growable memory mapped file.

Enter decodeless::allocator, a basic linear allocator library that supports growing a backing decodeless::resizable_file. These classes are combined into decodeless::writer for convenience:

// With the pmr allocator, these functions don't need to know
// anything about decodeless or where they are writing their data.
Header* writeHeader(std::pmr::polymorphic_allocator alloc);
Span writeIndices(std::pmr::polymorphic_allocator alloc, uint32_t* indices, uint64_t count);
Span writePositions(std::pmr::polymorphic_allocator alloc, float* positions, uint64_t count);

void test()
{
    uint32_t indices[] = { ... };
    float    positions[] = { ... };

    decodeless::pmr_writer file("cube.mesh", 9999 /* max. size */);

    Header* header writeHeader(file.allocator());
    header->triangle_indices = writeIndices(file.allocator(), indices, std::size(indices));
    header->vertex_positions = writeIndices(file.allocator(), positions, std::size(positions) / 3);
}

Convenience utilities

There’s one final library to show off and that’s a relative pointer decodeless::offset_ptr. This can replace Span above. The only difference is that byte_offset is now an offset from the relative pointer’s own address in the file (or memory) and saves having to pass around a pointer to the start of the file. There are also some utility allocation methods that can replace our write*() functions for an even shorter demo.

struct vec3f { float x, y, z; }

Header {
    decodeless::offset_ptr<uint32_t> triangle_indices;
    decodeless::offset_ptr<vec3f> vertex_positions;
};

void test()
{
    uint32_t indices[] = { ... };
    vec3f    positions[] = { ... };

    decodeless::file_writer file("cube.mesh", 9999 /* max. size */);

    Header* header = file.create<Header>();
    header->triangle_indices = file.createArray(indices);
    header->vertex_positions = file.createArray(positions);
}

For the pièce de résistance, see how convenient reading the above file can be!

void test_read()
{
    decodeless::file file("cube.mesh");
    const Header* header reinterpret_cast<const Header*>(file.data());
    uint32_t firstIndex = header->triangle_indices[0];
    vec3f    firstIndexPosition = header->vertex_positions[firstIndex];
}

decodeless

TLDR: check out the decodeless library collection: github, twitter

There are no comments yet.