74

I am curious why the following piece of code:

#include <string>
int main()
{
    std::string a = "ABCDEFGHIJKLMNO";
}

when compiled with -O3 yields the following code:

main:                                   # @main
    xor     eax, eax
    ret

(I perfectly understand that there is no need for the unused a so the compiler can entirely omit it from the generated code)

However the following program:

#include <string>
int main()
{
    std::string a = "ABCDEFGHIJKLMNOP"; // <-- !!! One Extra P 
}

yields:

main:                                   # @main
        push    rbx
        sub     rsp, 48
        lea     rbx, [rsp + 32]
        mov     qword ptr [rsp + 16], rbx
        mov     qword ptr [rsp + 8], 16
        lea     rdi, [rsp + 16]
        lea     rsi, [rsp + 8]
        xor     edx, edx
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
        mov     qword ptr [rsp + 16], rax
        mov     rcx, qword ptr [rsp + 8]
        mov     qword ptr [rsp + 32], rcx
        movups  xmm0, xmmword ptr [rip + .L.str]
        movups  xmmword ptr [rax], xmm0
        mov     qword ptr [rsp + 24], rcx
        mov     rax, qword ptr [rsp + 16]
        mov     byte ptr [rax + rcx], 0
        mov     rdi, qword ptr [rsp + 16]
        cmp     rdi, rbx
        je      .LBB0_3
        call    operator delete(void*)
.LBB0_3:
        xor     eax, eax
        add     rsp, 48
        pop     rbx
        ret
        mov     rdi, rax
        call    _Unwind_Resume
.L.str:
        .asciz  "ABCDEFGHIJKLMNOP"

when compiled with the same -O3. I don't understand why the compiler does not recognize that a is still unused, even though the string is only one byte longer.

This question concerns gcc 9.1 and clang 8.0 (online: https://gcc.godbolt.org/z/p1Z8Ns), because the other compilers I looked at either drop the unused variable entirely (ellcc) or generate code for it regardless of the length of the string.

einpoklum
Ferenc Deak
  • 17
    may be linked to some [short string optimization practices](https://stackoverflow.com/questions/10315041/meaning-of-acronym-sso-in-the-context-of-stdstring/10319672#10319672) ? – UmNyobe Jun 03 '19 at 10:24
  • 4
    Could it be because of the small string optimization? Try declaring `a` as volatile and you'll see that the two strings are treated differently. The longer one seems to be allocated on the heap. https://gcc.godbolt.org/z/WUuJIB – Davide Spataro Jun 03 '19 at 10:24
  • 6
    See [this thread](https://stackoverflow.com/questions/31873616/is-the-compiler-allowed-to-optimize-out-heap-memory-allocations) for discussion of whether the compiler is allowed to optimize out dynamic allocations – M.M Jun 03 '19 at 10:31
  • 1
    If you use `string_view` instead, it'll still optimize a longer string away: https://godbolt.org/z/AAViry – Ted Lyngmo Jun 03 '19 at 10:49
  • 1
    Try to append `-stdlib=libc++` for compilation with Clang ;-) – Daniel Langr Jun 03 '19 at 10:56
  • @DanielLangr: See my answer for an explanation. – einpoklum Mar 23 '20 at 23:07

3 Answers

68

This is due to the small string optimization. When the string data is less than or equal to 16 characters, including the null terminator, it is stored in a buffer local to the std::string object itself. Otherwise, the string allocates memory on the heap and stores the data there.

The first string "ABCDEFGHIJKLMNO" plus the null terminator is exactly 16 bytes long. Adding "P" makes it exceed the buffer, so operator new is called internally, which may ultimately lead to a system call. The compiler can optimize something away only if it can ensure that there are no side effects. An opaque allocation call probably makes it impossible to do this here; by contrast, changing a buffer local to the object under construction allows for such a side effect analysis.
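One way to observe the threshold at run time is to check whether the character data lives inside the std::string object itself. The following is a minimal sketch; the helper name stored_in_object is made up for this illustration and is not part of any library, and the exact threshold it reveals depends on the standard library in use.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (an assumption for this sketch, not a library API):
// reports whether the character data of `s` lies within the bytes of the
// std::string object itself, i.e. whether the local buffer is in use.
bool stored_in_object(const std::string& s) {
    const char* first = reinterpret_cast<const char*>(&s);
    const char* last  = first + sizeof(std::string);
    const char* data  = s.data();
    // std::less gives a total order even for pointers into unrelated objects.
    std::less<const char*> before;
    return !before(data, first) && before(data, last);
}
```

With libstdc++ on x86-64 one would expect stored_in_object(std::string("ABCDEFGHIJKLMNO")) to be true and the same call with the 16-character literal to be false; other implementations may switch over at a different length.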

Tracing the local buffer in libstdc++, version 9.1, reveals these parts of bits/basic_string.h:

template<typename _CharT, typename _Traits, typename _Alloc>
class basic_string
{
   // ...

  enum { _S_local_capacity = 15 / sizeof(_CharT) };

  union
    {
      _CharT           _M_local_buf[_S_local_capacity + 1];
      size_type        _M_allocated_capacity;
    };
   // ...
 };

which lets you spot the local buffer capacity _S_local_capacity and the local buffer itself (_M_local_buf). When the constructor calls basic_string::_M_construct, you have in bits/basic_string.tcc:

void _M_construct(_InIterator __beg, _InIterator __end, ...)
{
  size_type __len = 0;
  size_type __capacity = size_type(_S_local_capacity);

  while (__beg != __end && __len < __capacity)
  {
    _M_data()[__len++] = *__beg;
    ++__beg;
  }

where the local buffer is filled with its content. Right after this part comes the branch taken when the local capacity is exhausted: new storage is allocated (through the allocation in _M_create), the local buffer is copied into the new storage, and the rest of the initializing argument is appended:

  while (__beg != __end)
  {
    if (__len == __capacity)
      {
        // Allocate more space.
        __capacity = __len + 1;
        pointer __another = _M_create(__capacity, __len);
        this->_S_copy(__another, _M_data(), __len);
        _M_dispose();
        _M_data(__another);
        _M_capacity(__capacity);
      }
    _M_data()[__len++] = *__beg;
    ++__beg;
  }
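To make the two paths easier to follow, here is a heavily simplified sketch of the same idea. The class tiny_string and all of its member names are invented for this illustration; it only mimics the layout quoted above and is not how libstdc++ actually implements construction (no allocator, no growth, no copying).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified mimic of the layout shown above.
class tiny_string {
    static constexpr std::size_t local_capacity = 15;

    char*       data_;
    std::size_t size_;
    union {
        char        local_buf_[local_capacity + 1];  // small strings live here
        std::size_t allocated_capacity_;             // used only in heap mode
    };

public:
    explicit tiny_string(const char* s) : size_(0) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        if (size_ <= local_capacity) {
            data_ = local_buf_;             // fits: no allocation at all
        } else {
            data_ = new char[size_ + 1];    // too long: go to the heap
            allocated_capacity_ = size_;
        }
        std::memcpy(data_, s, size_ + 1);   // copy including the terminator
    }

    ~tiny_string() {
        if (!is_local()) delete[] data_;
    }

    tiny_string(const tiny_string&) = delete;
    tiny_string& operator=(const tiny_string&) = delete;

    bool        is_local() const { return data_ == local_buf_; }
    std::size_t size()     const { return size_; }
    const char* c_str()    const { return data_; }
};
```

Constructing tiny_string from "ABCDEFGHIJKLMNO" stays on the local path, while adding the extra "P" takes the allocating branch, which is the part the optimizer has a much harder time reasoning about.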

As a side note, small string optimization is quite a topic of its own. To get a feeling for how tweaking individual bits can make a difference at large scale, I'd recommend this talk. It also mentions how the std::string implementation that ships with gcc (libstdc++) works and how it has changed over time to match newer versions of the standard.

Toby Speight
lubgr
  • 4
    There are no syscalls in the assembly output. – Maxim Egorushkin Jun 03 '19 at 10:49
  • 8
    Note that the limit of 16 characters is implementation-defined. It holds for GCC/libstdc++ and MSVC on the x86_64 architecture. Libc++ (typically used with Clang) employs another approach, and the limit there is higher (23 chars). (Godbolt's Clang seemingly uses libstdc++ according to the generated assembly.) – Daniel Langr Jun 03 '19 at 10:49
  • 1
    I think he means this: "call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)" it's not inline, it's not a copy constructor, and so it can't be optimised away as it might have observable side effects – Tom Tanner Jun 03 '19 at 11:00
  • 11
    Actually, Clang can optimize away `new` without worrying about the underlying implementation. It is explicitly allowed in C++14: see the [Allocation section](https://en.cppreference.com/w/cpp/language/new) "`delete[] new int[10];` can be optimized out". – Matthieu M. Jun 03 '19 at 18:33
  • 6
    ...and my respect towards people who write compilers increases even more. – kedarps Jun 03 '19 at 18:40
  • 4
    @DanielLangr: Godbolt has libc++ installed. To have clang use it, use `-stdlib=libc++` . And yes, this does allow clang8.0 to optimize away the longer string: https://gcc.godbolt.org/z/gVm_6R. Godbolt's clang install is like a normal GNU/Linux install where it uses libstdc++ by default. – Peter Cordes Jun 04 '19 at 04:29
  • 1
    @MatthieuM. even without that explicit statement, I’d be surprised if the specification said anywhere that `new` has to lead to a potentially side effect bearing system call. – Holger Jun 04 '19 at 17:16
  • @Holger That's... intuition. Departing from such is the abnormal rather than the norm. – Passer By Jun 05 '19 at 11:14
19

I was surprised the compiler saw through a std::string constructor/destructor pair until I saw your second example. It didn't. What you're seeing here is small string optimization and corresponding optimizations from the compiler around that.

Small string optimization applies when the std::string object itself is big enough to hold the contents of the string, its size, and possibly a discriminating bit that indicates whether the string is operating in small or big string mode. In such a case, no dynamic allocations occur and the string is stored in the std::string object itself.

Compilers are really bad at eliding unneeded allocations and deallocations; these are treated almost as if they had side effects and are thus rarely elided. When you go over the small string optimization threshold, dynamic allocations occur and the result is what you see.

As an example

void foo() {
    delete new int;
}

is the simplest, dumbest allocation/deallocation pair possible, yet gcc emits this assembly even under -O3:

sub     rsp, 8
mov     edi, 4
call    operator new(unsigned long)
mov     esi, 4
add     rsp, 8
mov     rdi, rax
jmp     operator delete(void*, unsigned long)
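One reason such calls look like side effects is that the global allocation functions are replaceable by the program. The following sketch assumes a user-provided replacement that counts calls; the counter g_new_calls and the two count_* functions are hypothetical names for this illustration only.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, for illustration only.
static std::size_t g_new_calls = 0;

// User-provided replacement of the global allocation function.
void* operator new(std::size_t size) {
    ++g_new_calls;  // an observable side effect of every real call
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Direct calls to the allocation function; these are not new-expressions.
std::size_t count_direct_call() {
    const std::size_t before = g_new_calls;
    void* p = ::operator new(sizeof(int));
    ::operator delete(p);
    return g_new_calls - before;
}

// A new-expression; since C++14 the compiler may skip the call entirely.
std::size_t count_new_expression() {
    const std::size_t before = g_new_calls;
    delete new int;
    return g_new_calls - before;
}
```

Under this replacement the direct call has to reach the counter, whereas for the new-expression the result depends on whether the compiler chooses to elide the allocation, so no particular count should be relied on there.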
Passer By
  • 3
    What compiler version was used? According to this: https://en.cppreference.com/w/cpp/language/new#Allocation , since C++14 it's allowed to optimize out such allocations. – Balázs Kovacsics Jun 03 '19 at 10:35
  • @BalázsKovacsics gcc 9.1, added link to godbolt. – Passer By Jun 03 '19 at 10:38
  • 5
    Clang 3.8 correctly optimizes it out for me (unless it's invoked with the operator new() function call), seems like it's a gcc issue. – Balázs Kovacsics Jun 03 '19 at 10:46
  • 7
    Relevant discussion: [Is the compiler allowed to optimize out heap memory allocations?](https://stackoverflow.com/q/31873616/580083). – Daniel Langr Jun 03 '19 at 10:59
  • 3
    *treated almost as if having side effects* Part of this problem might be that C++'s `new` is "replaceable" by the user. So it really *might* have side effects, like logging allocations. This also makes it impossible to optimize `std::vector` resize into `realloc` instead of new/copy/delete unless the compiler has link-time knowledge that `new` hasn't been replaced, which is really really dumb. The C++14 guarantee from the standard that `delete new ...` can be optimized out is helpful, but not all compilers look for it yet. – Peter Cordes Jun 04 '19 at 04:33
0

While the accepted answer is valid, since C++14 it's actually the case that new and delete calls can be optimized away. See this arcane wording on cppreference:

New-expressions are allowed to elide ... allocations made through replaceable allocation functions. In case of elision, the storage may be provided by the compiler without making the call to an allocation function (this also permits optimizing out unused new-expression).

...

Note that this optimization is only permitted when new-expressions are used, not any other methods to call a replaceable allocation function: delete[] new int[10]; can be optimized out, but operator delete(operator new(10)); cannot.

This actually allows compilers to completely drop your local std::string even if it's very long. In fact, clang++ with libc++ already does this (GodBolt), since libc++ uses the built-ins __builtin_operator_new and __builtin_operator_delete in its implementation of std::string; that's "storage provided by the compiler". Thus, we get:

main():
        xor eax, eax
        ret

with basically any-length unused string.

GCC doesn't do this yet, but I've recently opened bug reports about it; see this SO answer for links.

einpoklum