
In C, there is a nice construct to create a c-string with more allocated space:

char str[6] = "Hi";  // [H,i,0,0,0,0]

I thought I could do the same using version (4) of the string constructor, `basic_string(const CharT* s, size_type count)`, but the reference says

The behavior is undefined if s does not point at an array of at least count elements of CharT.

So it is not safe to use

std::string("Hi", 6);

Is there any way to create such std::string without extra copies and reallocations?

Jan Turoň
  • Just doing `std::string s = "Hi";` will almost certainly allocate room for more than 2 characters, so you do not pay a reallocation penalty if you only increase the string's size by a few characters (say `6`). – Galik Oct 22 '17 at 11:36
  • @Galik `std::string s = "Hi"; scanf("%4s", &s[2]);` - can you be 100% sure that this won't cause segmentation fault? – Jan Turoň Oct 22 '17 at 17:25
  • Well you were asking about reallocations. Whatever you do you are never going to be able to write beyond the `size()` of the string without invoking undefined behavior. So even if you could do what you want to do you still can't `scanf("%4s", &s[2]);`. I was just pointing out that as far as reallocations go, the string probably already makes fairly sensible decisions. – Galik Oct 22 '17 at 17:35
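Following Galik's point, the safe way to let `scanf` write into a `std::string` is to grow it first, so every byte written (including scanf's terminating `'\0'`) lies inside `size()`, and trim afterwards. A minimal sketch, assuming that is what the question is after; `append_word` is a hypothetical helper, and `sscanf` stands in for `scanf` so it runs without stdin:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: appends up to 4 non-whitespace characters read from
// `input` to "Hi", writing only inside [0, size()) of the string.
std::string append_word(const char* input) {
    std::string s = "Hi";
    // 2 existing chars + up to 4 read chars + 1 for the '\0' scanf writes.
    // The new elements are value-initialized to '\0'.
    s.resize(2 + 4 + 1);
    // &s[2] has s.size() - 2 == 5 writable chars, enough for "%4s".
    std::sscanf(input, "%4s", &s[2]);
    // Drop the unused tail: keep everything before the first '\0'.
    s.resize(std::strlen(s.c_str()));
    return s;
}
```

For example, `append_word("abcdef")` yields `"Hiabcd"` and `append_word("xy")` yields `"Hixy"`. Note this still costs a possible allocation in `resize`; it only removes the undefined behavior.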

2 Answers


Theory:

Legacy c-strings

Consider the following snippet:

int x[10];

void method() {
     int y[10];
}

The first declaration, int x[10], uses static storage duration, defined by cppreference as: "The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists. All objects declared at namespace scope (including global namespace) have this storage duration, plus those declared with static or extern."

So for x, the storage is allocated when the program begins and deallocated when it ends.

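Where exactly that storage lives is up to the implementation. In practice, zero-initialized objects like x go into the .bss segment, while objects with a non-zero initializer (such as a namespace-scope char str[6] = "Hi") go into the writable .data segment, with their initial bytes stored in the executable. Either way the storage exists as soon as the program is loaded, so no allocation or copy happens at run time.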

The second one, int y[10], uses automatic storage duration, defined by cppreference as: "The object is allocated at the beginning of the enclosing code block and deallocated at the end. All local objects have this storage duration, except those declared static, extern or thread_local."

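In this case, the allocation is very cheap: in most cases it is as simple as moving the stack pointer. The initializer, however, is applied every time the block is entered, so for char str[6] = "Hi" the compiler emits code that copies "Hi" and zero-fills the rest.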
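To make this concrete: with either storage duration, `char str[6] = "Hi"` gives {H,i,0,0,0,0}, because elements without an explicit initializer are zero-initialized; only the moment it happens differs. A small sketch (the names `g_str`, `is_hi_padded` and `check_both` are made up for illustration):

```cpp
#include <cassert>
#include <cstring>

// Static storage duration: initialized before main() runs
// (typically placed in .data because the initializer is non-zero).
char g_str[6] = "Hi";

// Returns true if the first 6 bytes of `s` are exactly {'H', 'i', 0, 0, 0, 0}.
bool is_hi_padded(const char* s) {
    const char expected[6] = {'H', 'i', '\0', '\0', '\0', '\0'};
    return std::memcmp(s, expected, sizeof expected) == 0;
}

bool check_both() {
    // Automatic storage duration: the bytes live in this stack frame and
    // are (re)initialized each time the function is called.
    char local_str[6] = "Hi";
    return is_hi_padded(g_str) && is_hi_padded(local_str);
}
```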

std::string

A std::string, on the other hand, is a run-time creature that has to find storage for its characters at run time:

  • For small strings, most implementations keep the characters in an internal fixed-size buffer (the Small String Optimization; think of it as a char buffer[N] member), so no heap allocation is needed.
  • For larger strings, it performs dynamic allocations.
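You can observe which case you are in with a rough heuristic: check whether `data()` points inside the `std::string` object itself. This is a non-portable sketch of my own (`stores_inline` is a hypothetical name), not a standard facility:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Heuristic, not portable: true if the characters live inside the
// std::string object (the small-string buffer) instead of on the heap.
bool stores_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(s);
}
```

On common libstdc++, libc++ and MSVC builds, `stores_inline(std::string("Hi"))` is expected to be true, while a 1000-character string always ends up on the heap. The exact threshold is an implementation detail.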

Practice

You could use reserve(). It makes sure that the underlying buffer can hold at least N characters (afterwards, capacity() >= N).

Option 1: First reserve, then append

std::string str;
str.reserve(6);
str.append("Hi");

Option 2: First construct, then reserve

std::string str("Hi");
str.reserve(6);
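Both options as compilable functions (the function names are mine), with the state each one is expected to leave behind:

```cpp
#include <cassert>
#include <string>

// Option 1: empty string, reserve, then append.
std::string reserve_then_append() {
    std::string str;
    str.reserve(6);    // postcondition: capacity() >= 6
    str.append("Hi");  // size() == 2, fits in the reserved buffer
    return str;
}

// Option 2: construct from the literal, then reserve.
std::string construct_then_reserve() {
    std::string str("Hi");
    str.reserve(6);    // may reallocate if capacity() was < 6 (unlikely with SSO)
    return str;
}
```

The standard only promises `capacity() >= 6` after `reserve(6)`; in practice, later appends that keep `size()` at or below that capacity do not reallocate.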
Daniel Trugman
  • I know, but then 1) str is created, 2) str is reallocated to grow to size 6, 3) "Hi" is copied byte by byte to str. I'd like to avoid these steps and just create the [H,i,0,0,0,0] string. In the second version, reallocation still occurs. – Jan Turoň Oct 22 '17 at 10:27
  • @JanTuroň - You seem to have a fundamental misunderstanding of how `std::string` works. There's no way to avoid all of these steps. Even if there were a c'tor that could do it "in one step", it would have to allocate a buffer and copy the literal you give it. – StoryTeller - Unslander Monica Oct 22 '17 at 10:28
  • @StoryTeller - So the static allocation in the C version does these steps, too? – Jan Turoň Oct 22 '17 at 10:32
  • @JanTuroň - A block scope? Yes. Although the allocation performed is much simpler. – StoryTeller - Unslander Monica Oct 22 '17 at 10:34
  • @JanTuroň, it depends where you declare the `char str[6]`. It might do a simple `stack` allocation (for a local) or host it in the .BSS/.DATA segments (for a static/extern). But if you want to use a `std::string` there will be some kind of allocation (although strings have a default minimal block inside them that might be used to spare allocations for short strings). – Daniel Trugman Oct 22 '17 at 10:38
  • @StoryTeller - I mean [Array initialization](http://en.cppreference.com/w/c/language/array_initialization) from string literal. I would also accept an answer explaining (and referencing) the steps of creating such strings compared to the C version. – Jan Turoň Oct 22 '17 at 10:40
  • I believe there is no need to stuff the `\0\0\0` into the string initializer, as mentioned [here](http://en.cppreference.com/w/c/language/array_initialization): `All array elements that are not initialized explicitly are initialized implicitly the same way as objects that have static storage duration.` – Jan Turoň Oct 22 '17 at 10:49
  • @JanTuroň, you are right, I rolled back my answer for that exact reason. For a second there I thought I understood your request, but I really don't. Could you explain: _"I would also accept an answer explaining (and referencing) the steps of creating such strings compared to the C version"_. Do you mean you want an explanation of what happens for `str = "Hi"` when str is a `std::string`? – Daniel Trugman Oct 22 '17 at 10:51
  • As StoryTeller claimed in the comment above, the C array initialization must do the copy, too. I almost believe it, but I'd be happy to support the claim with some authoritative reference. So yes, I am not sure what exactly `char str[6] = "Hi"` does and whether the steps behind it differ from `str = "Hi"`. – Jan Turoň Oct 22 '17 at 10:54
  • So in the block scope (the y) the copy by value of "Hi" occurs, while in the static storage duration case there are no operations at run time, is that right? If so, please just add the exact link to the en.cppreference and I will accept the answer. – Jan Turoň Oct 22 '17 at 11:10
  • @JanTuroň, I updated my answer to be more accurate, and included an additional explanation and link to static storage definition. – Daniel Trugman Oct 22 '17 at 12:08

To ensure at most one runtime allocation, you could write:

std::string str("Hi\0\0\0", 6);
str.resize(2);
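The same two lines wrapped in a function, with the resulting state spelled out (`hi_with_room_for_six` is just an illustrative name):

```cpp
#include <cassert>
#include <string>

std::string hi_with_room_for_six() {
    // "Hi\0\0\0" is a const char[6] (5 written chars + the implicit '\0'),
    // so reading 6 chars from it stays in bounds: no undefined behavior.
    std::string str("Hi\0\0\0", 6);  // size() == 6, at most one allocation
    str.resize(2);                   // size() == 2; in practice the buffer is kept
    return str;
}
```

Shrinking with `resize` does not release memory in common implementations, so `capacity()` stays at least 6 and the string compares equal to `"Hi"`.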

However, in practice many string implementations use the Small String Optimization, which makes no allocations if the string is "short" (that thread suggests up to about 16 characters). So you would not actually suffer a reallocation by starting the string at size 2 and later growing it to 6.
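If you want to check this on your own standard library rather than take it on faith, one legal technique is to replace the global `operator new` and count calls. A sketch under that assumption (it only counts allocations that go through `::operator new(std::size_t)`; the counter and function names are mine):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Incremented by the replaced global operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t n) {
    ++g_allocations;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Start at size 2, then grow to 6; returns the number of allocations.
std::size_t allocations_when_growing() {
    const std::size_t before = g_allocations;
    {
        std::string s = "Hi";
        s.append("abcd");
        assert(s == "Hiabcd");
    }
    return g_allocations - before;
}

// The resize trick from this answer, followed by the same growth.
std::size_t allocations_with_resize_trick() {
    const std::size_t before = g_allocations;
    {
        std::string s("Hi\0\0\0", 6);
        s.resize(2);
        s.append("abcd");
        assert(s == "Hiabcd");
    }
    return g_allocations - before;
}
```

With a Small String Optimization implementation, both counts are expected to be 0 for strings this short; the resize trick only guarantees an upper bound of one allocation.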

M.M