Les Imbroglios d'Alexis Breust

Bug-free improv and let-it-go code.

Aft design pattern: pimpl with no p

I was sure that the pimpl design pattern was an interesting thing. It was made so that one can hide compile-time dependencies from the end-user, allowing faster compile times.

However, as you know, pimpl starts with a p. And that's the big issue: pointers. So having a nice API implies memory fragmentation and cache misses? Surely, no!

Let me present the Aft pattern, a pimpl with no p!

Important notice

As the reddit user simonask_ pointed out, please do not use this pattern as I presented it. Consider this article as an example of an experimental design, of what not to do! ;)

Game Engine: performance results

Working on lava, I had implemented bucket allocators for the meshes and the materials, so that whenever the user wants a new mesh, it is constructed in preallocated memory, keeping the update loops of the meshes cache-friendly.

However, doing so did not lead to the big improvement I expected on big scenes. Mesh had a pointer to a Mesh::Impl. Allocating all the impls with the custom allocators was not enough, because some parts of the higher-level code were iterating over all the Mesh objects, each access causing an avoidable dereference.

Abstraction has a performance cost and I wanted to shrink that to the minimum. Hence, I came up with the Aft pattern, which is basically a pointer to the implementation (pimpl) with no pointer.

Before    After
5.2 ms    4.9 ms

Time spent updating all states and recording all command buffers,
before and after switching to the Aft pattern for
Mesh, Material, Texture, Light, and Scene.

As you can see, these changes won back some CPU time: about 0.3 ms, roughly 6%! Let's see how it works…

From pimpl to aft

The word aft refers to the rear of a boat, the opposite being fore. I just used this term to have a short variable name with the same meaning as backend or impl.

Here's how traditional pimpl goes:

Structure of pimpl: the implementation details are hidden behind a pointer,
meaning a lot of dereferencing.

Thanks to the pointer, MeshImpl can be forward declared. But this implies a pointer dereference each time we try to call anything backend-dependent from Mesh.
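For readers who prefer code to diagrams, here is a minimal sketch of that pimpl layout. The class names follow the text; the use of std::unique_ptr and the x() accessor are my assumptions for illustration, and everything is squeezed into one listing where real code would be split between a header and a source file.

```cpp
#include <memory>

// --- What the end-user sees (header) ---
class MeshImpl;  // Forward declaration is enough: we only hold a pointer.

class Mesh {
public:
    Mesh();
    ~Mesh();  // Defined where MeshImpl is complete (unique_ptr needs that).

    void position(float x, float y, float z);
    float x() const;  // Hypothetical accessor, only there for the example.

private:
    std::unique_ptr<MeshImpl> m_impl;  // The "p" of pimpl.
};

// --- What stays hidden (source file) ---
class MeshImpl {
public:
    float m_position[3] = {0.f, 0.f, 0.f};
};

Mesh::Mesh() : m_impl(std::make_unique<MeshImpl>()) {}
Mesh::~Mesh() = default;

void Mesh::position(float x, float y, float z) {
    // Each backend access goes through m_impl: one extra dereference.
    m_impl->m_position[0] = x;
    m_impl->m_position[1] = y;
    m_impl->m_position[2] = z;
}

float Mesh::x() const { return m_impl->m_position[0]; }
```

The MeshImpl object lives wherever the allocator put it, which is not necessarily next to the Mesh that owns it.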

The pointer being the annoying part, one can simply try:

Structure of composition: as the compiler needs to know the size of MeshDetails,
the class has to be visible to the end-user.

The good part with the composition method is that details are encapsulated behind an object that can be private. The data is also fully located within the Mesh, ensuring good cache locality between the implementation and its public API.

The big problem, however, is that MeshDetails needs to be known whenever the end-user wants to use a Mesh. Which means more compilation time due to all the headers necessary to know MeshDetails and, with that, dependencies on the backend development files. (Forcing the user to have Vulkan's headers or so just to rotate a mesh is a bit sad.)
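To make the trade-off concrete, the composition variant could be sketched like this. MeshDetails is the name used in the caption above; its content and the x() accessor are placeholders I made up.

```cpp
// --- mesh-details.hpp: has to be shipped to the end-user ---
// In a real engine this header would pull in the backend's own headers,
// which is exactly the dependency we wanted to hide.
class MeshDetails {
public:
    void position(float x, float y, float z) {
        m_position[0] = x;
        m_position[1] = y;
        m_position[2] = z;
    }
    float x() const { return m_position[0]; }  // Hypothetical accessor.

private:
    float m_position[3] = {0.f, 0.f, 0.f};
};

// --- mesh.hpp ---
class Mesh {
public:
    void position(float x, float y, float z) { m_details.position(x, y, z); }
    float x() const { return m_details.x(); }

private:
    // Stored by value: the compiler needs the full definition of
    // MeshDetails here to know the size of Mesh.
    MeshDetails m_details;
};

// The details live inside the Mesh itself, no pointer involved.
static_assert(sizeof(Mesh) == sizeof(MeshDetails), "no extra indirection");
```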

The aft pattern attempts to fix that problem, though:

Structure of aft: the implementation details are hidden behind an internal array of bytes,
so the backend can access all the info it needs by a simple cast.

By saying that we want the data within Mesh but not the complexity of what's exactly inside, we come up with this pattern. One just needs the size of MeshAft to be pre-computed. Afterwards, to use the data as a MeshAft, a simple reinterpret_cast does the job.

If your public API allows for multiple different implementations at runtime, one can just take the maximum of the sizes of all the different backends and use that as the size of the array of bytes.
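That multi-backend computation could be sketched as follows. VulkanMeshAft and OpenGlMeshAft are invented stand-ins, and the sketch uses sizeof directly for readability; in the real public header the result would have to be a pre-computed constant, since the backend classes are not visible there. I also take the maximum of the alignments, which the byte array needs just as much as the size.

```cpp
#include <algorithm>
#include <cstddef>

// Two invented backends with different storage needs.
struct VulkanMeshAft {
    float position[3];
    void* bufferHandle;
};

struct OpenGlMeshAft {
    float position[3];
    unsigned int vbo;
};

// The byte array must be able to hold the biggest backend...
constexpr std::size_t SIZE_OF_MESH_AFT =
    std::max(sizeof(VulkanMeshAft), sizeof(OpenGlMeshAft));

// ...and be aligned for the most demanding one.
constexpr std::size_t ALIGN_OF_MESH_AFT =
    std::max(alignof(VulkanMeshAft), alignof(OpenGlMeshAft));

class Mesh {
private:
    alignas(ALIGN_OF_MESH_AFT) unsigned char m_aft[SIZE_OF_MESH_AFT];
};

// Whatever backend is picked at runtime, it fits inside a Mesh.
static_assert(sizeof(Mesh) >= sizeof(VulkanMeshAft), "Vulkan aft must fit");
static_assert(sizeof(Mesh) >= sizeof(OpenGlMeshAft), "OpenGL aft must fit");
```

Every Mesh then pays for the largest backend, even when the smaller one is in use.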

Sharp eyes might notice that sizeof(MeshAft) needs MeshAft to be known at compile time, which fixes nothing. And that's right: you'll need to pre-compute that value. Either use a script that writes it to a generated file defining SIZE_OF_MESH_AFT whenever needed, or use sizeof(MeshAft) while developing and hard-code the value on release.
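Whichever way the value is produced, a cheap safety net is to let the compiler check it in the one file that sees the real class. The sketch below is a suggestion of mine rather than something taken from lava: ALIGN_OF_MESH_AFT is an extra hand-maintained constant I assume here, the values assume 4-byte floats, and header and source are merged into one listing.

```cpp
#include <cstddef>

// --- Public header: only hand-maintained constants, no MeshAft in sight ---
#define SIZE_OF_MESH_AFT 12
#define ALIGN_OF_MESH_AFT 4  // Assumed companion constant for the alignment.

// --- Source file: MeshAft is fully known here ---
class MeshAft {
public:
    float m_position[3];
};

// If someone adds a member to MeshAft and forgets to update the constants,
// the build stops here instead of corrupting memory at runtime.
static_assert(sizeof(MeshAft) == SIZE_OF_MESH_AFT,
              "SIZE_OF_MESH_AFT is out of date");
static_assert(alignof(MeshAft) == ALIGN_OF_MESH_AFT,
              "ALIGN_OF_MESH_AFT is out of date");
```

With several backends sharing one maximum size, the checks would use <= instead of ==.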

Let's have a look at some C++ code before checking whether the compiler handles it well.

Mock-up code

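#include <cstdint>

// Pre-computed sizeof(MeshAft) and alignof(MeshAft)
#define SIZE_OF_MESH_AFT 12
#define ALIGN_OF_MESH_AFT 4

// Forward declaration.
class MeshAft;

// The Fore of the pattern: the user API.
class Mesh {
public:
    Mesh();
    ~Mesh();

    void position(float x, float y, float z);

private:
    MeshAft& aft() { return reinterpret_cast<MeshAft&>(m_aft); }

    // Must be aligned like MeshAft, otherwise using the cast above
    // is undefined behavior.
    alignas(ALIGN_OF_MESH_AFT) std::uint8_t m_aft[SIZE_OF_MESH_AFT];
};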

include/mesh.hpp

#include "./mesh.hpp"

// Placement new is declared in this header
#include <new>

// Could be #ifdef with different technologies.
#include "./mesh-aft.hpp"

Mesh::Mesh() {
    // Construct the Aft in-place within us.
    new (m_aft) MeshAft();
}

Mesh::~Mesh() {
    // Call the Aft destructor because it won't be called
    // by the compiler as m_aft is just bytes.
    aft().~MeshAft();
}

void Mesh::position(float x, float y, float z) {
    aft().position(x, y, z);
}

source/mesh.cpp

// All our implementation details.
class MeshAft {
public:
    void position(float x, float y, float z) {
        m_position[0] = x;
        m_position[1] = y;
        m_position[2] = z;
    }

private:
    /* Whatever the end-user does not need to know
       (our storage structures and such). */
    float m_position[3];
};

source/mesh-aft.hpp

Examining generated assembly

Compiling mesh.cpp with g++ -O3 -S -c mesh.cpp -o mesh.asm, and extracting the position method code, we see:

_ZN4Mesh8positionEfff:
.LFB79:
    .cfi_startproc
    movss   %xmm0, (%rdi)
    movss   %xmm1, 4(%rdi)
    movss   %xmm2, 8(%rdi)
    ret
    .cfi_endproc

mesh.asm extract of (aft) Mesh::position

This code simply says: "Hey, take these arguments and put them at offsets 0, 4 and 8 of myself, thanks!" That is because the aft basically is Mesh itself. Note that there is no function call at all, because the code of MeshAft::position has been inlined here.

Doing the exact same test with our class storing a pimpl instead, we get:

_ZN4Mesh8positionEfff:
.LFB78:
    .cfi_startproc
    movq    (%rdi), %rax
    movss   %xmm0, (%rax)
    movss   %xmm1, 4(%rax)
    movss   %xmm2, 8(%rax)
    ret
    .cfi_endproc

mesh.asm extract of (pimpl) Mesh::position

Which is essentially the same, but with an extra dereference: the movq first loads the impl pointer stored in Mesh.

One memory load fewer on every call: this is definitely a performance improvement!

So, should you use it?

Basically, if your code is not performance-critical, don't even consider the pattern. Keep it pimpl if you want to hide some complexity from the user, or just put everything in your class if you don't care.

Pros:

  • Same advantages as pimpl: hidden complexity, faster build times.
  • Memory-friendly!
  • Better compiler optimisations due to no dereferencing.
  • Pretty easily hacked into an existing pimpled class.

Drawbacks:

  • You will need a way to automatically compute the afts' sizes, which can add complexity to your build system.
  • When using multiple backends with runtime switching, aft is less interesting than pimpl, as you could use way more memory than needed by pre-allocating too much just in case. However, if you only have compile-time switching, that's not an issue: just use the size of the aft used in the pre-compiled library binary.
  • Hard to maintain and pretty risky memory-wise.
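
To illustrate that last drawback: as soon as the aft owns a resource, the copy operations that the compiler generates for Mesh just duplicate the raw bytes, which ends in a double free. Here is a sketch of what has to be written by hand, with a made-up MeshAft owning a std::vector, and the size taken from sizeof for brevity instead of a pre-computed constant.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Made-up aft that owns heap memory through a std::vector.
class MeshAft {
public:
    std::vector<float> m_vertices;
};

class Mesh {
public:
    Mesh() { new (m_aft) MeshAft(); }
    ~Mesh() { aft().~MeshAft(); }

    // The implicit copy would copy m_aft byte by byte, leaving two
    // vectors pointing to the same buffer. Forward to MeshAft instead.
    Mesh(const Mesh& other) { new (m_aft) MeshAft(other.aft()); }
    Mesh& operator=(const Mesh& other) {
        aft() = other.aft();
        return *this;
    }

    void addVertex(float v) { aft().m_vertices.push_back(v); }
    std::size_t vertexCount() const { return aft().m_vertices.size(); }

private:
    MeshAft& aft() { return *reinterpret_cast<MeshAft*>(m_aft); }
    const MeshAft& aft() const {
        return *reinterpret_cast<const MeshAft*>(m_aft);
    }

    alignas(MeshAft) unsigned char m_aft[sizeof(MeshAft)];
};
```

Move operations would need the same treatment, and forgetting any of them compiles without a single warning.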

Surely, then, it depends.