Friday, 13 October 2017

Memory alignment in C++ and C# and probably in every other language that can integrate with C++

I've learned something new today. It all starts with an innocuous question: Given the following struct, tell me what is its size:
    public struct MyStruct
{
public int i1;
public char c1;
public long l1;
public char c2;
public short s1;
public char c3;
}
Let's assume that this is in 32bit C++ or C#.

The first answer is 4+1+8+1+2+1 = 17. Nope! It's 24.

Well, it is called memory alignment and it has to do with the way CPUs work. They have memory registers of fixed size, various caches with different sizes and speeds, etc. Basically, when you ask for a 4 byte int, it needs to be "aligned" so that you get 4 bytes from the correct position into a single register. Otherwise the CPU needs to take two registers (let's say 1 byte in one and 3 bytes in another) then mask and shift both and add them into another register. That is unbelievably expensive at that level.

So, why 24? i1 is an int, it needs to be aligned on positions that are multiple of 4 bytes. 0 qualifies, so it takes 4 bytes. Then there is a char. Chars are one byte, can be put anywhere, so the size becomes 5 bytes. However, a long is 8 bytes, so it needs to be on a position that is a multiple of 8. That is why we add 3 bytes as padding, then we add the long in. Now the size is 16. One more char → 17. Shorts are 2 bytes, so we add one more padding byte to get to 18, then the short is added. The size is 20. And in the end you get the last char in, getting to 21. But now, the struct needs to be aligned with itself, meaning with the largest primitive used inside it, in our case the long with 8 bytes. That is why we add 3 more bytes so that the struct has a size that is a multiple of 8.

Note that a struct containing a struct will align it to its largest primitive element, not the actual size of the child struct. It's a recursive process.

Can we do something about it? What if I want to spend speed on memory or disk space? We can use directives such as StructLayout. It receives a LayoutKind - which defaults to Sequential, but can also be Auto or Explicit - and a numeric Pack parameter. Auto rearranges the order of the members of the class, so it takes the least amount of space. However, this has some side effects, like getting errors when you want to use Marshal.SizeOf. With Explicit, each field needs to be adorned with a FieldOffset attribute to determine the exact position in memory; that also means you can use several fields on the same position, like in:
    [StructLayout(LayoutKind.Explicit)]
public struct MyStruct
{
[FieldOffset(0)]
public int i1;
[FieldOffset(4)]
public int i2;
[FieldOffset(0)]
public long l1;
}
The Pack parameter tells the system on how to align the fields. 0 is the default, but 1 will make the size of the first struct above to actually be 17.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MyStruct
{
public int i1;
public char c1;
public long l1;
public char c2;
public short s1;
public char c3;
}
Other values can be 2,4,8,16,32,64 or 128. You can test on how the performance is affected by this, as an exercise.

More information here: Advanced c# programming 6: Everything about memory allocation in .NET

Update: I've created a piece of code to actually test for this:
unsafe static void Main(string[] args)
{
var st = new MyStruct();
Console.WriteLine($"sizeof:{sizeof(MyStruct)} Marshal.sizeof:{Marshal.SizeOf(st)} custom sizeof:{MySizeof(st)}");
Console.ReadKey();
}
 
private static long MySizeof(MyStruct st)
{
long before = GC.GetTotalMemory(true);
MyStruct[] array = new MyStruct[100000];
long after = GC.GetTotalMemory(true);
var size = (after - before) / array.Length;
return size;
}

Considering the original MyStruct, the size reported by all three ways of computing size is 24. I had to test the idea that the maximum byte padding is 4, so I used this structure:
public struct MyStruct
{
public long l;
public byte b;
}
Since long is 8 bytes and byte is 1, I expected the size to be 16 and it was, not 12. However, I decided to also try with a decimal instead of the long. Decimal values have 16 bytes, so if my interpretation was correct, 17 bytes should be aligned with the size of the biggest struct primitive field: a multiple of 16, so 32. The result was weirdly inconsistent: sizeof:20 Marshal.sizeof:24 custom sizeof:20, which suggests an alignment to 4 or 8 bytes, not 16. So I started playing with the StructLayoutAttribute:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MyStruct
{
public decimal d;
public byte b;
}

For Pack = 1, I got the consistent 17 bytes. For Pack=4, I got consistent values of 20. For Pack=8 or higher, I got the weird 20-24-20 result, which suggests packing works differently for decimals than for other values. I've replaced the decimal with a struct containing two long values and the consistent result was back to 24, but then again, that's expected. Funny thing is that Guid is also a 16 byte value, although it is itself a struct, and the resulting size was 20. Guid is not a value type, though.

The only conclusion I can draw is that what I wrote in this post is true. Also, StructLayout Pack does not work as I had expected, instead it provides a minimum packing size, not a maximum one. If the biggest element in the struct is 8 bytes, then the minimum between the Pack value and 8 will be used for alignment. The alignment of the type is the size of its largest element (1, 2, 4, 8, etc., bytes) or the specified packing size, whichever is smaller.

All this if you are not using decimals... then all bets are off! From my discussions with Filip B. Vondrášek in the comments of this post, I've reached the conclusion that decimals are internally structs that are aligned to their largest element, an int, so to 4 bytes. However, it seems Marshal.sizeof misreports the size of structs containing decimals, for some reason.

In fact, all "simple" types are structs internally, as described by the C# language specification, but the Decimal struct also implements IDeserializationEventListener, but I don't see how this would influence things. Certainly the compilers have optimizations for working with primitive types. This is as deep as I want to go with this, anyway.

0 comments:

Post a Comment