2

Ok so let's start with the most obvious solution:

memcpy(Ptr, (const char[]){'a', 'b'}, 2);

Calling a library function carries quite an overhead. Compilers sometimes don't optimize the call away, and I wouldn't rely on compiler optimizations anyway: GCC is smart, but if I'm porting a program to more exotic platforms with trashy compilers, I don't want to depend on it.

So now there's a more direct approach:

Ptr[0] = 'a';
Ptr[1] = 'b';

It doesn't involve any library-function overhead, but it makes two separate assignments. Third, we have a type pun:

*(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};

Which one should I use in a bottleneck? What's the fastest way to copy only two bytes in C?

Regards,
Hank Sauri

  • `*(uint16_t*)Ptr`? No, that is endianness dependent and depends on `Ptr` being aligned to `uint16_t`. – KamilCuk Jul 30 '20 at 08:25
  • @KamilCuk But assume there are two bytes allocated already. Why would dereferencing just the first two bytes cause a segfault if Ptr has 3 bytes allocated? –  Jul 30 '20 at 08:26
  • Measure the speed. Does it actually make any difference? The third choice has zero chance of passing a code review. I wouldn't even _try_ to understand it. Waste of my time. – gnasher729 Jul 30 '20 at 08:27
  • Two adjacent bytes to be copied to two other adjacent locations? – Carlos Jul 30 '20 at 08:28
  • You're spending too much time optimizing code for bad compilers. – Antti Haapala Jul 30 '20 at 08:29
  • `Why would dereferencing just the first two bytes cause a segfault if Ptr has 3 bytes allocated?` Alignment is not about the count of bytes allocated. – KamilCuk Jul 30 '20 at 08:29
  • _"There's quite an overhead of calling a library function"_ ... is there? Does it really call a library function? Measure it with your actual compiler's release build. – Useless Jul 30 '20 at 08:29
  • https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment – Antti Haapala Jul 30 '20 at 08:29
  • This just smells premature optimization. You don't even have a model of "fastest" and seem even to be assuming (1) that it matters in any current implementation (2) that there could be optimal code regardless of the platform, alignment, whatever. – Jens Gustedt Jul 30 '20 at 08:32
  • @JensGustedt I do premature optimizations. Seriously it's been bugging me for a while what's the best way to copy two adjacent bytes in C? –  Jul 30 '20 at 08:34
  • Surprisingly, gcc doesn't want to optimize `Ptr[0] = 'a'; Ptr[1] = 'b'` to a single assembly instruction. [godbolt](https://godbolt.org/z/qP9vqa0) – KamilCuk Jul 30 '20 at 08:35
  • There isn't one. The best way to copy two adjacent bytes in C will depend on the compiler, the platform, the cache structure, and what the rest of your program and system are doing. Further, it probably doesn't matter. – Useless Jul 30 '20 at 08:37
  • In the rare situation where it does matter, you already know your platform & compiler, have investigated the target instruction set architecture and generated assembly code, and you're trying to coax the compiler into generating the output you want. – Useless Jul 30 '20 at 08:38
  • @Useless Ok so which one should I settle with in general? –  Jul 30 '20 at 08:41
  • Whichever is most readable and least effort to write. Don't worry about it until after some whole-program profiling proves something is an actual, real problem. Otherwise you're just inventing things to worry about that will probably never matter. – Useless Jul 30 '20 at 08:42
  • On optimizing compilers such as gcc or clang, `memcpy` shouldn't be thought of as primarily a library function. It should be thought of as a (builtin) operator (which only calls a library function named `memcpy` if there's no better way). `memcpy` needs to be compiler-recognized at least everywhere where strict-aliasing-based optimizations are implemented (and they definitely are on gcc and clang). – PSkocik Jul 30 '20 at 08:45
  • @Useless: UB should be avoided at all times, especially UB involving strict-aliasing with gcc. It might work today, and then you'll add one line of code somewhere and gcc will decide to reorder writes to the same address because one of the pointers involved is aliasing another one. I've seen this reordering happen even across a function call (a function marked as `inline`, but nevertheless) with aliased pointers. Just spare yourself the trouble; use `memcpy`, if needed check the compiler output to make sure it's not utterly ridiculous, but that's about it. – Groo Jul 30 '20 at 08:58
  • @Groo - that was an argument in favour of just using memcpy and not over-thinking it. You're right, of course, but I certainly wasn't advising UB in the first or any instance. – Useless Aug 03 '20 at 17:11
  • There's no "fastest" way to do it. You must measure it. If you don't measure it, then you *demonstrably* don't care about performance. I'm these days of the mind that *any* micro-optimization performance question on SO that's not accompanied by measurements is a waste of asker's time, and thus the answerer's time, and should be closed sooner rather than later to save the asker some trouble until they realize that un-measured performance is just a dream they had at night. It's a figment of imagination, and discussing it here makes as much sense as talking about fairies. – Kuba hasn't forgotten Monica Aug 18 '20 at 16:07
  • **If you don't care enough to measure it, you don't care, period.** – Kuba hasn't forgotten Monica Aug 18 '20 at 16:08
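
The advice in the comments to measure can be followed with a few lines of C. This is only a rough sketch, not a rigorous benchmark: the names `copy_memcpy`, `copy_assign` and `time_copies` are made up for illustration, and it assumes a C99 compiler and that `clock()` has enough resolution for the chosen iteration count.

```c
#include <string.h>
#include <time.h>

/* The two well-defined candidates from the question. */
void copy_memcpy(char *p) { memcpy(p, (const char[]){'a', 'b'}, 2); }
void copy_assign(char *p) { p[0] = 'a'; p[1] = 'b'; }

/* Hypothetical harness: CPU seconds spent calling `copy` `iterations` times. */
double time_copies(void (*copy)(char *), long iterations)
{
    char buf[2] = {0, 0};
    volatile char sink = 0;  /* volatile read-back so the stores are not discarded */
    clock_t start = clock();

    for (long i = 0; i < iterations; i++) {
        copy(buf);
        sink = buf[i & 1];
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing `time_copies(copy_memcpy, 100000000L)` with `time_copies(copy_assign, 100000000L)` in a release build gives numbers to argue about. The function-pointer call and the read-back add overhead of their own, so the generated assembly is still worth checking alongside the timings.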

3 Answers

5

Only two of the approaches you suggested are correct:

memcpy(Ptr, (const char[]){'a', 'b'}, 2);

and

Ptr[0] = 'a';
Ptr[1] = 'b';

On X86 GCC 10.2, both compile to identical code:

mov     eax, 25185
mov     WORD PTR [something], ax

This is possible because of the as-if rule.
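
The constant 25185 in the listing is simply the two characters packed into one 16-bit value, low byte first, which is the order a little-endian CPU such as x86 stores them in. A minimal sketch of that arithmetic, assuming an ASCII execution character set (the helper name `pack_le16` is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: pack two bytes into the 16-bit value that a
 * little-endian CPU writes to memory with `low` at the lower address. */
uint16_t pack_le16(unsigned char low, unsigned char high)
{
    return (uint16_t)(low | (high << 8));
}
```

With ASCII, `pack_le16('a', 'b')` is 0x61 | (0x62 << 8) = 0x6261 = 25185, the immediate that GCC loads into `eax` before the single 16-bit store.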

Since a good compiler can figure out that these are identical, use whichever is easier to write in your case. If you're setting one or two bytes, use the latter; if you're setting several, use the former, or use a string literal instead of a compound literal array.


The third one you suggested

*(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};

also compiles to the same code when using x86-64 GCC 10.2, i.e. it would behave identically in this case.

But in addition it has two to four instances of undefined behaviour: it violates strict aliasing twice, and it may also perform an unaligned memory access at both the source and the destination. Undefined behaviour does not mean that it cannot work as you intended, but neither does it mean that it has to. The behaviour is undefined, and it can fail to work on any processor, including x86. Why would you care about the performance on a bad compiler so much that you would write code that would fail to work on a good compiler?!
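
If a single 16-bit load and store is really what is wanted, it can be written without the pointer casts. The sketch below goes through a `uint16_t` temporary with `memcpy`, which sidesteps both the aliasing and the alignment problems. The name `copy2` is hypothetical and not from the question, and whether the result becomes a single `mov` still depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly two bytes from src to dst.
 * memcpy into and out of a uint16_t object is well defined for any
 * alignment, and the byte order is preserved because the value is
 * never interpreted as a number. Assumes 8-bit bytes, so that
 * sizeof(uint16_t) == 2. */
void copy2(void *dst, const void *src)
{
    uint16_t tmp;

    assert(dst != NULL && src != NULL);
    memcpy(&tmp, src, sizeof tmp);  /* read two bytes, any alignment */
    memcpy(dst, &tmp, sizeof tmp);  /* write them back in the same order */
}
```

For example, `copy2(Ptr, "ab")` writes `'a'` to `Ptr[0]` and `'b'` to `Ptr[1]` regardless of the host's endianness.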

Antti Haapala
  • Ok that rules out 3, but of the two remaining options... ? –  Jul 30 '20 at 08:36
  • Well, `memcpy(Ptr, "ab", 2)` works too if you don't mind the possibility of one extra null byte in your binary's static section. (According to the Compiler Explorer, at least GCC optimizes this memcpy to a mov anyway.) – AKX Jul 30 '20 at 08:36
  • If one can ensure that the pointers will be aligned and is targeting an embedded platform like the popular ARM Cortex-M0, the "bad bad" code will be much more efficient than the others, while gcc 9.2.1 will process the `memcpy` by using an actual call to `memcpy`. – supercat Aug 18 '20 at 15:53
4

When in doubt, use the Compiler Explorer.

#include <string.h>
#include <stdint.h>

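/* Declared void: nothing is returned, so an int return type would be misleading. */
void c1(char *Ptr) {
    memcpy(Ptr, (const char[]){'a', 'b'}, 2);
}

void c2(char *Ptr) {
    Ptr[0] = 'a';
    Ptr[1] = 'b';
}

void c3(char *Ptr) {
    // Bad bad not good: undefined behaviour (strict aliasing, alignment).
    *(uint16_t*)Ptr = *(uint16_t*)(unsigned char[]){'a', 'b'};
}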

compiles down to (GCC)

c1:
        mov     eax, 25185
        mov     WORD PTR [rdi], ax
        ret
c2:
        mov     eax, 25185
        mov     WORD PTR [rdi], ax
        ret
c3:
        mov     eax, 25185
        mov     WORD PTR [rdi], ax
        ret

or (Clang)

c1:                                     # @c1
        mov     word ptr [rdi], 25185
        ret
c2:                                     # @c2
        mov     word ptr [rdi], 25185
        ret
c3:                                     # @c3
        mov     word ptr [rdi], 25185
        ret
AKX
  • Thanks for your time and detailed answer –  Jul 30 '20 at 08:40
  • Howdy :P seems that I have a competitor in C now :D – Antti Haapala Jul 30 '20 at 08:40
  • If one knows the bytes will be aligned, and one is targeting an embedded target like an ARM Cortex-M0, the "bad bad" code will yield better results than the other forms, since the other forms will force the compiler to generate code that can accommodate arbitrary alignment. – supercat Aug 18 '20 at 15:48
-1

In C, this approach is, no doubt, the fastest:

Ptr[0] = 'a'; Ptr[1] = 'b';

This is why:

All Intel and ARM CPUs are able to embed some constant data (also called immediate data) within selected assembly instructions. These include the memory-to-CPU and CPU-to-memory data transfer instructions, such as MOV.

That means that when those instructions are fetched from the PROGRAM memory to the CPU, the immediate data arrives at the CPU along with the instruction.

'a' and 'b' are constant and therefore might enter the CPU along with the MOV instruction.

Once the immediate data is in the CPU, the CPU itself has only to make one memory access to the DATA memory for writing 'a' to Ptr[0].

Ciao, Enrico Migliore

Enrico Migliore
  • *All Intel and ARM CPU,,,* The world isn't limited to Intel and ARM CPUs. – Andrew Henle Jul 30 '20 at 09:34
  • "No doubt" – well, all three approaches compile down to the exact same instructions as evident from the other answers. – AKX Jul 30 '20 at 09:55
  • All modern CPUs have the ability to put immediate data in transfer assembly instructions. – Enrico Migliore Jul 30 '20 at 14:09
  • @EnricoMigliore: On many ARM families, immediate operands have a very limited range of values; given `extern x; x=1234567;`, a compiler would need to use a PC-relative load for both the constant 1234567 and another for the address of x, before it could store the value to x. – supercat Aug 25 '20 at 06:37
  • In C, you often find the following statements: if (a == 0) if (a == 1) if (a == 10) – Enrico Migliore Sep 10 '20 at 10:18