94

If I want to construct a std::string with a line like:

std::string my_string("a\0b");

where I want three characters in the resulting string (a, null, b), I get only one. What is the proper syntax?

0x499602D2
Bill
  • 4
    You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string. See: http://stackoverflow.com/questions/10220401/c-string-literals-escape-character – David Stone Oct 14 '12 at 16:27

11 Answers

132

Since C++14

we have been able to create std::string literals:

#include <iostream>
#include <string>

int main()
{
    using namespace std::string_literals;

    std::string s = "pl-\0-op"s;    // <- Notice the "s" at the end
                                    // This is a std::string literal not
                                    // a C-String literal.
    std::cout << s << "\n";
}
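To see the difference the suffix makes, here is a minimal sketch (the function name compare_literals is only illustrative, and C++14 or later is assumed) that builds the same literal with and without the trailing s and checks the resulting sizes:

```cpp
#include <cassert>
#include <string>

// Illustrative helper: same literal, with and without the "s" suffix.
void compare_literals()
{
    using namespace std::string_literals;

    std::string with_suffix = "pl-\0-op"s;    // operator""s receives the full length
    std::string without_suffix = "pl-\0-op";  // const char* constructor stops at \0

    assert(with_suffix.size() == 7);     // p l - \0 - o p
    assert(without_suffix.size() == 3);  // p l -
    assert(with_suffix[3] == '\0');
}
```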

Before C++14

The problem is the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated and thus parsing stops when it reaches the \0 character.

To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:

std::string   x("pq\0rs");    // Two characters because input assumed to be C-String
std::string   y("pq\0rs", 5); // 5 characters as the input is now a char array with 5 characters.

Note: C++ std::string does NOT use \0 as a terminator (as suggested in other posts). It tracks its length explicitly, so an embedded \0 is just another character. However, you can extract a pointer to an internal buffer that contains a C-String with the method c_str().
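The two constructors and the c_str() behaviour can be checked with a short sketch like the one below. The function name compare_constructors is made up for illustration; the asserts state what each line is expected to produce.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Illustrative helper: C-string constructor versus pointer-plus-length constructor.
void compare_constructors()
{
    std::string from_c_string("pq\0rs");  // reads up to the first \0
    std::string from_array("pq\0rs", 5);  // copies exactly 5 characters

    assert(from_c_string.size() == 2);
    assert(from_array.size() == 5);
    assert(from_array[2] == '\0');

    // size() counts the embedded \0, but C functions reading c_str() stop at it.
    assert(std::strlen(from_array.c_str()) == 2);
}
```

Any C API that receives c_str() will therefore see only the part before the first \0, even though the std::string itself holds all five characters.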

Also check out Doug T's answer below about using a vector<char>.

Also check out RiaD for a C++14 solution.

Martin York
22

If you are doing manipulation like you would with a c-style string (array of chars) consider using

std::vector<char>

You have more freedom to treat it like an array, in the same manner you would treat a c-string. You can copy its contents into a string with the iterator-range constructor:

std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr(vec.begin(), vec.end()); // all 100 chars, padding \0s included

and you can use it in many of the same places you can use c-strings

printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';

Naturally, however, you suffer from the same problems as c-strings. You may forget your null terminator or write past the allocated space.
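As a rough sketch of the round trip, assuming the bytes already live in a std::vector<char> (the helper name to_string_keeping_nulls is hypothetical), the iterator-range constructor copies every element, embedded \0 included:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copy every byte of the vector, including any \0.
std::string to_string_keeping_nulls(const std::vector<char>& bytes)
{
    return std::string(bytes.begin(), bytes.end());
}

// Illustrative check of the helper above.
void vector_round_trip()
{
    std::vector<char> bytes;
    bytes.push_back('a');
    bytes.push_back('\0');
    bytes.push_back('b');

    std::string s = to_string_keeping_nulls(bytes);
    assert(s.size() == 3);
    assert(s[1] == '\0');
}
```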

Doug T.
  • If you are, say, trying to encode bytes into a string (gRPC bytes are stored as a string), use the vector method specified in the answer, not the usual way (see below), which will NOT construct the entire string: ```byte *bytes = new byte[dataSize]; std::memcpy(bytes, image.data, dataSize * sizeof(byte)); std::string test(reinterpret_cast(bytes)); std::cout << "Encoded String length " << test.length() << std::endl;``` – Alex Punnen Jul 29 '18 at 14:07
13

I have no idea why you'd want to do such a thing, but try this:

std::string my_string("a\0b", 3);
17 of 26
  • 1
    What are your concerns for doing this? Are you questioning the need to store "a\0b" ever? or questioning the use of a std::string for such storage? If the latter, what do you suggest as an alternative? – Anthony Cramp Oct 03 '08 at 00:08
  • I'm questioning why you'd want a string with a null in the middle of it. – 17 of 26 Oct 03 '08 at 15:42
  • 3
    @Constantin then you're doing something wrong if you're storing binary data as a string. That's what `vector` or `unsigned char *` were invented for. – Mahmoud Al-Qudsi Jan 04 '12 at 23:25
  • @Mahmoud Al-Qudsi, I agree with you, std::vector should be used for binary. I would also add, use std::vector instead of std::(w)string, the latter doesn't understand text anyway. – Constantin Jan 29 '12 at 20:14
  • 2
    I came across this while trying to learn more about security of strings. I wanted to test my code to make sure that it still works even if it reads a null character in while reading from a file / network what it expects to be textual data. I use `std::string` to indicate that the data should be considered as plain-text, but I am doing some hashing work and I want to make sure everything still works with null characters involved. That seems like a valid use of a string literal with an embedded null character. – David Stone Apr 19 '12 at 00:41
  • @17of26 Also note that std::string is unaware of UTF-8 encoding, yet valid UTF-8 bytes may contain \0 before the end of the string. – DuckMaestro Jul 09 '13 at 23:46
  • 3
    @DuckMaestro No, that's not true. A `\0` byte in a UTF-8 string can only be NUL. A multi-byte encoded character will never contain `\0`--nor any other ASCII character for that matter. – John Kugelman Sep 26 '13 at 12:55
  • @JohnKugelman Thanks for the correction. It seems I was misunderstanding the point of so called "modified UTF8". – DuckMaestro Sep 27 '13 at 22:23
  • Wouldn't we lose std::string optimizations if we used std::vector instead, e.g. copy-on-write? – Ezra Nov 11 '13 at 09:05
  • 1
    I came across this when trying to provoke an algorithm in a test case. So there are valid reasons; albeit few. – namezero Nov 14 '14 at 11:16
  • 1
    @Ezra copy-on-write is not valid in C++11 anyway – graywolf Jul 24 '15 at 14:18
12

The question "What new capabilities do user-defined literals add to C++?" presents an elegant answer. Define:

std::string operator "" _s(const char* str, size_t n) 
{ 
    return std::string(str, n); 
}

then you can create your string this way:

std::string my_string("a\0b"_s);

or even so:

auto my_string = "a\0b"_s;

There's an "old style" way:

#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string

then you can define

std::string my_string(S("a\0b"));
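Put together, the user-defined literal approach might look like the sketch below. The namespace my_literals and the function use_literal are illustrative names, not part of the original answer; the operator is written without a space before _s, which is the form current compilers prefer. C++11 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace my_literals {  // hypothetical namespace for the suffix

// The compiler passes the literal's length, so embedded \0 survives.
inline std::string operator""_s(const char* str, std::size_t n)
{
    return std::string(str, n);
}

}  // namespace my_literals

// Illustrative use of the literal.
void use_literal()
{
    using namespace my_literals;

    auto my_string = "a\0b"_s;
    assert(my_string.size() == 3);
    assert(my_string[1] == '\0');
    assert(my_string[2] == 'b');
}
```

Keeping the operator in its own namespace means the suffix is only visible where it is explicitly pulled in with a using-directive.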
anonym
8

The following will work...

std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');
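The same idea works with operator+= and append, as long as the overload taking a char or an explicit count is used. A small sketch (append_variants is just an illustrative name) showing which overloads keep the \0 and which silently drop it:

```cpp
#include <cassert>
#include <string>

// Illustrative helper: appending a \0 through different overloads.
void append_variants()
{
    std::string s("a");
    s += '\0';  // char overload: appends the \0 itself
    s += 'b';
    assert(s.size() == 3);

    std::string t("a");
    t += "\0b";  // const char* overload: sees an empty C-string, appends nothing
    assert(t.size() == 1);

    t.append("\0b", 2);  // pointer plus count: appends both characters
    assert(t == s);
}
```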
Andrew Stein
5

You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string using most methods. See: Rules for C++ string literals escape character.

For example, I dropped this innocent looking snippet in the middle of a program

// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
    std::cerr << c;
    // 'Q' is way cooler than '\0' or '0'
    c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
    std::cerr << c;
}
std::cerr << "\n";

Here is what this program output for me:

Entering loop.
Entering loop.

vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ

That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, it's completely undetectable by modern tools.

You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
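The escape rule behind this pitfall is that an octal escape consumes up to three octal digits. A minimal sketch (octal_escape_pitfall is an illustrative name) of how a digit after \0 changes the literal, along with two ways to write the intended characters:

```cpp
#include <cassert>
#include <string>

// Illustrative helper: how octal escapes swallow a following digit.
void octal_escape_pitfall()
{
    // "\01" is ONE character (octal 1), not \0 followed by '1'.
    std::string wrong("a\01", sizeof("a\01") - 1);
    assert(wrong.size() == 2);

    // Splitting the literal ends the escape before the digit.
    std::string split("a\0" "1", sizeof("a\0" "1") - 1);
    assert(split.size() == 3);
    assert(split[1] == '\0' && split[2] == '1');

    // A full three-digit escape also cannot absorb the next digit.
    std::string three_digits("a\0001", sizeof("a\0001") - 1);
    assert(three_digits == split);
}
```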

Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
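A short sketch of the initializer-list form, using a digit after the null to show that no escape sequences get merged (the function name build_from_initializer_list is illustrative; C++11 or later is assumed):

```cpp
#include <cassert>
#include <string>

// Illustrative helper: each character is its own literal, so nothing merges.
void build_from_initializer_list()
{
    std::string str({'a', '\0', '1'});

    assert(str.size() == 3);  // the length comes from the list itself
    assert(str[1] == '\0');
    assert(str[2] == '1');
}
```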

David Stone
  • 2
    As part of my preparation for this post, I submitted a bug report to gcc in hopes that they will add a warning to make this a little safer: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54924 – David Stone Oct 14 '12 at 17:07
4

In C++14 you now may use literals

using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3
RiaD
1

Better to use std::vector<char> if this question isn't just for educational purposes.

Harold Ekstrom
1

anonym's answer is excellent, but there's a non-macro solution in C++98 as well:

template <size_t N>
std::string RawString(const char (&ch)[N])
{
  return std::string(ch, N-1);  // Again, exclude trailing `null`
}

With this function, RawString(/* literal */) will produce the same string as S(/* literal */):

std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;

Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:

std::string s = S("a\0b"); // ERROR!

...so it might be preferable to use:

#define S(s) std::string(s, sizeof s - 1)

Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.
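For reference, a self-contained sketch of the template version, including the copy-initialization that the comma-based macro cannot handle (use_raw_string is an illustrative name, and std::size_t is spelled out in place of size_t):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// N is deduced from the array type, so it includes the trailing \0.
template <std::size_t N>
std::string RawString(const char (&ch)[N])
{
    return std::string(ch, N - 1);  // exclude the trailing \0
}

// Illustrative use of the template.
void use_raw_string()
{
    std::string s = RawString("a\0b");  // plain copy-initialization works
    assert(s.size() == 3);
    assert(s[1] == '\0');

    // Passing a const char* instead of an array would not compile,
    // because the array length N could not be deduced.
}
```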

Kyle Strand
-5

I know it has been a long time since this question was asked, but anyone who is having a similar problem might be interested in the following code.

CComBSTR(20,"mystring1\0mystring2\0")
sth
Dil09
  • This answer is too specific to Microsoft platforms and doesn't address the original question (which asked about std::string). – June Rhodes Feb 08 '12 at 08:50
-8

Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:

std::string s("aab");
s.at(1) = '\0';

but if you do, all your friends will laugh at you, you will never find true happiness.

Martin York
Jurney
  • 1
    std::string is NOT required to be NULL terminated. – Martin York Oct 02 '08 at 19:52
  • 2
    It's not required to, but in almost all implementations, it is, probably because of the need for the c_str() accessor to provide you with the null terminated equivalent. – Jurney Oct 02 '08 at 20:10
  • 2
    For efficiency a null character _may_ be kept on the back of the data buffer. But none of the operations (i.e. methods) on a string use this knowledge or are affected by a string containing a NULL character. The NULL character will be manipulated in exactly the same way as any other character. – Martin York Oct 02 '08 at 20:50
  • This is why it's so funny that string is std:: - its behaviour is not defined on ANY platform. –  Nov 19 '12 at 02:10
  • 1
    I wish user595447 was still here so that I could ask them what on Earth they thought they were talking about. – underscore_d Jul 02 '16 at 16:46