I was sure that the pimpl design pattern was an interesting thing. It was made so that one can hide compile-time dependencies from the end-user, allowing faster compile times.
However, as you know, pimpl starts with a p. And that's the big issue: pointers. So having a nice API implies memory fragmentation and cache misses? Surely, no!
Let me present the Aft pattern, a pimpl with no p!
Important notice
Game Engine: performance results
Working on lava, I had implemented bucket allocators for the meshes and the materials, so that whenever the user asks for a new mesh, it is constructed in preallocated memory, keeping the update loops over the meshes cache-coherent.
However, doing so did not lead to the big improvement I expected on big scenes. Mesh had a pointer to a Mesh::Impl. Allocating all the impls on the custom allocators was not enough, because some parts of the higher-level code were iterating over all the Mesh objects, causing an avoidable dereference each time.
Abstraction has a performance cost, and I wanted to shrink it to the minimum. Hence, I came up with the Aft pattern, which is basically a pointer to implementation (pimpl) with no pointer.
Before | After |
---|---|
5.2ms | 4.9ms |
Time spent updating all states and recording all command buffers, before and after switching to the Aft pattern for Mesh, Material, Texture, Light, and Scene.
As you can see, I won back some CPU time with these changes! Let's see how it works…
From pimpl to aft
The word aft refers to the rear of a boat, the opposite being fore. I just used this term to have a short variable name with the same meaning as backend or impl.
Here's how traditional pimpl goes:
Structure of pimpl: the implementation details are hidden behind a pointer, meaning a lot of dereferencing.
Thanks to the pointer, MeshImpl can be forward declared. But this implies a pointer dereference each time we call anything backend-dependent from Mesh.
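A minimal single-file sketch of that structure could look like the following. The member name `m_impl`, the layout of `MeshImpl`, and the `x()` accessor are assumptions made for illustration; in a real project, everything below the marker comment would sit in the .cpp file.

```cpp
#include <memory>

// Forward declaration: enough for the public header, because Mesh only
// stores a pointer to the implementation.
class MeshImpl;

class Mesh {
public:
    Mesh();
    ~Mesh();  // defined below, once MeshImpl is a complete type
    void position(float x, float y, float z);
    float x() const;  // hypothetical accessor, only here to observe the result

private:
    std::unique_ptr<MeshImpl> m_impl;  // heap-allocated, lives elsewhere in memory
};

// --- what would normally sit in the .cpp file ---
class MeshImpl {
public:
    float m_position[3] = {0.f, 0.f, 0.f};
};

Mesh::Mesh() : m_impl(std::make_unique<MeshImpl>()) {}
Mesh::~Mesh() = default;

void Mesh::position(float x, float y, float z) {
    // Every access goes through m_impl: one extra load before touching the data.
    m_impl->m_position[0] = x;
    m_impl->m_position[1] = y;
    m_impl->m_position[2] = z;
}

float Mesh::x() const { return m_impl->m_position[0]; }
```

Even if the `MeshImpl` objects are packed together by a custom allocator, a loop over an array of `Mesh` still has to follow `m_impl` for each element.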
The pointer being the annoying part, one can simply try:
Structure of composition: as the compiler needs to know the size of MeshDetails, the class has to be visible to the end-user.
The good part with the composition method is that the details are encapsulated behind an object that can be private. The data is also fully located within the Mesh, ensuring good cache coherency between the implementation and its public API.
The big problem, however, is that MeshDetails needs to be known whenever the end-user wants to use a Mesh. That means more compilation time, due to all the headers needed to define MeshDetails, and, with that, dependencies on the backend's development files. (Forcing the user to have Vulkan's headers or so just to rotate a mesh is a bit sad.)
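As a rough sketch of the composition approach, assuming a `MeshDetails` that only stores a position (the member names and the `x()` accessor are invented for illustration):

```cpp
// In a real project this class would sit in its own header, which the
// public mesh header is then forced to include, backend headers and all.
class MeshDetails {
public:
    void position(float x, float y, float z) {
        m_position[0] = x;
        m_position[1] = y;
        m_position[2] = z;
    }
    float x() const { return m_position[0]; }

private:
    float m_position[3] = {0.f, 0.f, 0.f};
};

class Mesh {
public:
    void position(float x, float y, float z) { m_details.position(x, y, z); }
    float x() const { return m_details.x(); }  // hypothetical accessor

private:
    MeshDetails m_details;  // stored inline: no pointer, no separate allocation
};

// The details are the whole Mesh, so the data stays contiguous in an array of Mesh.
static_assert(sizeof(Mesh) == sizeof(MeshDetails), "no hidden indirection");
```

The price is visible on the first lines: anything `MeshDetails` needs must be visible to whoever includes the definition of `Mesh`.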
The aft pattern attempts to fix that problem, though:
Structure of aft: the implementation details are hidden behind an internal array of bytes, so the backend can access all the info it needs by a simple cast.
By saying that we want the data within Mesh, but not the complexity of what exactly is inside, we come up with this pattern. One just needs the size of MeshAft to be pre-computed. Afterwards, to use the data as a MeshAft, a simple reinterpret_cast does the job.
Sharp eyes might notice that sizeof(MeshAft) needs MeshAft to be known at compile time, which fixes nothing. And that's right: you'll need to pre-compute that value. Either use a script that writes the value, whenever needed, to a generated file defining SIZE_OF_MESH_AFT, or use sizeof(MeshAft) while developing and hard-code the value on release.
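One possible way to do the pre-computation, sketched under assumptions: a tiny generator, built as a step of the library's own build, that sees the full MeshAft and produces the text of the generated header. The function name `generateAftSizeHeader`, the extra `ALIGN_OF_MESH_AFT` macro, and the stand-in `MeshAft` below are all hypothetical.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the real implementation class; an actual generator would
// include the backend's own header instead of redefining it.
class MeshAft {
public:
    float m_position[3];
};

// Builds the text of a hypothetical generated header. A build step would
// write this string to a file that the public mesh header then includes.
std::string generateAftSizeHeader() {
    std::string out = "// Generated file, do not edit.\n";
    out += "#define SIZE_OF_MESH_AFT " + std::to_string(sizeof(MeshAft)) + "\n";
    out += "#define ALIGN_OF_MESH_AFT " + std::to_string(alignof(MeshAft)) + "\n";
    return out;
}
```

Emitting the alignment along with the size is a choice of this sketch: a plain byte array has no alignment requirement of its own, so the public class has no other way to learn what the hidden type needs.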
Let's have a look at some C++ code before checking whether the compiler handles it well.
Mock-up code
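    // Needed for uint8_t.
    #include <cstdint>

    // Pre-computed sizeof(MeshAft) and alignof(MeshAft)
    #define SIZE_OF_MESH_AFT 12
    #define ALIGN_OF_MESH_AFT 4

    // Forward declaration.
    class MeshAft;

    // The Fore of the pattern: the user API.
    class Mesh {
    public:
        Mesh();
        ~Mesh();
        void position(float x, float y, float z);

    private:
        MeshAft& aft() { return reinterpret_cast<MeshAft&>(m_aft); }

        // Raw bytes, aligned so that a MeshAft can be constructed in place.
        alignas(ALIGN_OF_MESH_AFT) uint8_t m_aft[SIZE_OF_MESH_AFT];
    };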
include/mesh.hpp
    #include "./mesh.hpp"

    // Placement new is declared in this header
    #include <new>

    // Could be #ifdef with different technologies.
    #include "./mesh-aft.hpp"

    // Fails to compile if the pre-computed size is stale.
    static_assert(sizeof(MeshAft) == SIZE_OF_MESH_AFT, "update SIZE_OF_MESH_AFT");

    Mesh::Mesh() {
        // Construct the Aft in-place within us.
        new (&aft()) MeshAft();
    }

    Mesh::~Mesh() {
        // Call the Aft destructor because it won't be called
        // by the compiler as m_aft is just bytes.
        aft().~MeshAft();
    }

    void Mesh::position(float x, float y, float z) {
        aft().position(x, y, z);
    }
source/mesh.cpp
// All our implementation details.
class MeshAft {
public:
void position(float x, float y, float z) {
m_position[0] = x;
m_position[1] = y;
m_position[2] = z;
}
private:
/* Whatever the end-user does not need to know
(our storage structures and such). */
float m_position[3];
};
source/mesh-aft.hpp
Examining generated assembly
Compiling mesh.cpp with g++ -O3 -S -c mesh.cpp -o mesh.asm, and extracting the code of the position method, we see:
    _ZN4Mesh8positionEfff:
    .LFB79:
        .cfi_startproc
        movss %xmm0, (%rdi)
        movss %xmm1, 4(%rdi)
        movss %xmm2, 8(%rdi)
        ret
        .cfi_endproc
mesh.asm extract of (aft) Mesh::position
This code simply says: "Hey, take these arguments and put them at offsets 0, 4 and 8 of myself, thanks!", because the aft is basically Mesh itself. Note that there is no function call, because the code of MeshAft::position has been inlined here.
Doing the exact same test with our class storing a pimpl instead, we get:
    _ZN4Mesh8positionEfff:
    .LFB78:
        .cfi_startproc
        movq (%rdi), %rax
        movss %xmm0, (%rax)
        movss %xmm1, 4(%rax)
        movss %xmm2, 8(%rax)
        ret
        .cfi_endproc
mesh.asm extract of (pimpl) Mesh::position
Which is essentially the same, but with one extra dereference (the movq that loads the impl pointer).
Dropping it is definitely a performance improvement!
So, should you use it?
Basically, if your code is not performance-critical, don't even consider the pattern. Keep it pimpl if you want to hide some complexity from the user, or just put everything in your class if you don't care.
Pros:
- Same advantages as pimpl: hidden complexity, faster build times.
- Memory-friendly!
- Better compiler optimisations due to no dereferencing.
- Pretty easily hacked into an existing pimpled class.
Drawbacks:
- You will need a way to automatically compute the afts' sizes, which can add complexity to your build system.
- When using multiple backends with runtime switching, aft is less interesting than pimpl, as you could use far more memory than needed by pre-allocating too much just in case. However, if you only have compile-time switching, that's not an issue: just use the size of the aft used in the pre-compiled library binary.
- Hard to maintain and pretty risky memory-wise.
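To make the runtime-switching drawback concrete, here is a small sketch with two hypothetical backends. `VulkanMeshAft`, `OpenGlMeshAft`, their invented layouts, and the `wastedBytes` helper are assumptions for illustration only, not part of lava.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical backends with invented layouts.
struct VulkanMeshAft {
    float position[3];
    void* handles[4];
};
struct OpenGlMeshAft {
    float position[3];
    unsigned int vertexArray;
};

// With runtime switching, the buffer must fit whichever backend gets picked.
constexpr std::size_t kAftSize = std::max(sizeof(VulkanMeshAft), sizeof(OpenGlMeshAft));
constexpr std::size_t kAftAlign = std::max(alignof(VulkanMeshAft), alignof(OpenGlMeshAft));

struct Mesh {
    alignas(kAftAlign) unsigned char aft[kAftSize];
};

// Bytes reserved in every Mesh but never used when running on the given backend.
template <class Aft>
constexpr std::size_t wastedBytes() {
    return sizeof(Mesh) - sizeof(Aft);
}
```

Every `Mesh` pays for the largest backend, so `wastedBytes<OpenGlMeshAft>()` is non-zero here, while a pimpl would only allocate what the chosen backend needs.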
Surely, then, it depends.