This question is not about the definition of unaligned data accesses, but about why memcpy silences the UBSan findings whereas type casting does not, despite both generating the same assembly code.
I have some example code to parse a protocol that sends a byte array segmented into groups of six bytes.
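#include <stdint.h>
#include <stdio.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

void f(u8 *ba) {
    // I know this array's length is a multiple of 6
    u8 *p = ba;
    u32 a = *(u32 *)p;
    printf("a = %u\n", a);
    p += 4;
    u16 b = *(u16 *)p;
    printf("b = %d\n", b);
    p += 2;
    a = *(u32 *)p;  // p == ba + 6: UBSan reports the misaligned load here
    printf("a = %u\n", a);
    p += 4;
    b = *(u16 *)p;
    printf("b = %d\n", b);
}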
After incrementing my pointer by 6 and doing another 32-bit read, UBSan reports an error about a misaligned load. I can suppress this error by using memcpy instead of type-punning, but I don't have a good understanding of why. To be clear, here is the same routine without UBSan errors:
#include <string.h>

void f(u8 *ba) {
    // I know this array's length is a multiple of 6
    u8 *p = ba;
    u32 a;
    u16 b;
    memcpy(&a, p, 4);
    printf("a = %u\n", a);
    p += 4;
    memcpy(&b, p, 2);
    printf("b = %d\n", b);
    p += 2;
    memcpy(&a, p, 4);
    printf("a = %u\n", a);
    p += 4;
    memcpy(&b, p, 2);
    printf("b = %d\n", b);
}
Both routines compile to identical assembly code (using movl for the 32-bit read and movzwl for the 16-bit read), so why is one undefined behaviour when the other is not? Does memcpy have some special property that makes the unaligned read well-defined?
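For reference, the memcpy-based reads can be factored into small helpers so that each call site stays as short as the cast version. This is only a sketch: the names load_u32 and load_u16 are hypothetical (they are not from any library), and the values are read in host byte order, exactly as in the routines above.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical helper: copy 4 bytes from a possibly unaligned
   address into a properly aligned local and return it. */
static inline u32 load_u32(const u8 *p) {
    u32 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: same idea for a 16-bit value. */
static inline u16 load_u16(const u8 *p) {
    u16 v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With these, the second 32-bit read becomes load_u32(ba + 6), and no u32 pointer to a misaligned address is ever dereferenced.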
I don't want to use memcpy here because I can't rely on compilers doing a good enough job optimising it.
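For comparison, a read that uses neither memcpy nor a pointer cast can be sketched by assembling the value from individual bytes. This assumes the protocol sends its fields in little-endian order, which is not stated above, and the names load_le32 and load_le16 are hypothetical. Whether a given compiler folds this into a single load is not something this sketch guarantees.

```c
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical helper: build a 32-bit value from 4 bytes,
   ASSUMING little-endian wire order (least significant byte first). */
static inline u32 load_le32(const u8 *p) {
    return (u32)p[0]
         | ((u32)p[1] << 8)
         | ((u32)p[2] << 16)
         | ((u32)p[3] << 24);
}

/* Hypothetical helper: same idea for a 16-bit value. */
static inline u16 load_le16(const u8 *p) {
    return (u16)((u16)p[0] | ((u16)p[1] << 8));
}
```

Only single bytes are ever loaded here, so alignment does not matter, and the result is the same on hosts of either endianness.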