
Raw string literals in C++11 are very nice, except that the obvious way to format them leads to a redundant newline (\n) as the first character.

Consider this example:

    some_code();
    std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();
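To make the problem concrete, here is a minimal C++11 sketch. The helper starts_with_newline is a made-up name for illustration. Everything between R"( and )" is taken verbatim, so the line break right after the opening parenthesis becomes the string's first character:

```cpp
#include <cassert>
#include <string>

// Everything after R"( is part of the literal, including the line break
// that immediately follows the opening parenthesis.
const std::string text = R"(
This is the first line.
This is the second line.
This is the third line.
)";

// Hypothetical helper: true when the string begins with that extra '\n'.
bool starts_with_newline(const std::string& s)
{
    return !s.empty() && s.front() == '\n';
}
```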

The obvious workaround seems so ugly:

    some_code();
    std::string text = R"(This is the first line.
This is the second line.
This is the third line.
)";
    more_code();

Has anyone found an elegant solution to this?

Hugues
  • I can't remember if `\\n` gets replaced with a space or just joins the lines without any spaces. – chris Jul 22 '14 at 05:16
  • All characters inside the raw string, including newlines and `\\` are interpreted literally. – Hugues Jul 22 '14 at 05:17
  • Put `R"(` on the next line. – Bryan Chen Jul 22 '14 at 05:18
  • @Hugues, Never mind. I read through the first phase too quickly. – chris Jul 22 '14 at 05:23
  • @chris In fairness, that note doesn't directly specify anything by itself. It's really in 2.5 [lex.pptoken]: "If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified." –  Jul 22 '14 at 05:32
  • There's nothing elegant about having a string literal not indented per the rest of your code, but if you want multiline raw string literals like this - and for whatever reason don't follow Bryan's sane advice - a less sane way to get what you want is `= 1 + R"(`.... – Tony Delroy Jul 22 '14 at 05:33
  • @hvd, Ah, good point, thanks. Shame it doesn't work, at least until you want a backslash in your string before the newline. – chris Jul 22 '14 at 05:35
  • I was thinking about using newline as the delimiter (`R"` / `(First` / `Second` / `Third)` / `"`), but that doesn't work either: newline is one of the few characters that cannot be used in the delimiter. –  Jul 22 '14 at 05:40
  • @BryanChen Thanks for suggesting `R"(` on the next line. It still doesn't align the first text line perfectly though. – Hugues Jul 22 '14 at 15:20
  • @TonyD I like your suggestion of `= 1 + R"(`; it's very clever. Could you please post this as an answer? – Hugues Jul 22 '14 at 15:21
  • @Hugues: done... cheers. – Tony Delroy Jul 23 '14 at 02:13

7 Answers


You can get a pointer to the 2nd character - skipping the leading newline - by adding 1 to the const char* to which the string literal is automatically converted:

    some_code();
    std::string text = 1 + R"(
This is the first line.
This is the second line.
This is the third line.
)";
    more_code();
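A self-contained version of the same trick, as a C++11 sketch (the names skipped and text are illustrative only). The literal is an array of const char with static storage duration, so the pointer produced by 1 + stays valid for the whole program:

```cpp
#include <cassert>
#include <string>

// R"(...)" has type const char[N]; "1 +" makes it decay to const char*
// and skips element 0, which is the leading '\n'.
const char* const skipped = 1 + R"(
first
second
)";

// The std::string copies the characters starting at the second one.
const std::string text = skipped;
```

Since std::string copies the characters, text no longer depends on the pointer once it has been constructed.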

IMHO, the above is flawed in breaking with the indentation of the surrounding code. Some languages provide a built-in or library function that does something like:

  • removes an empty leading line, and
  • looks at the indentation of the second line and removes the same amount of indentation from all further lines

That allows usage like:

some_code();
std::string text = unindent(R"(
    This is the first line.
    This is the second line.
    This is the third line.
    )");
more_code();

Writing such a function is relatively simple...

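#include <cctype>   // std::isspace
#include <cstddef>  // std::size_t
#include <string>

std::string unindent(const char* p)
{
    std::string result;
    if (*p == '\n') ++p;        // drop one leading newline
    const char* p_leading = p;  // start of the first text line's indentation
    // std::isspace needs a value representable as unsigned char
    while (std::isspace(static_cast<unsigned char>(*p)) && *p != '\n')
        ++p;
    std::size_t leading_len = static_cast<std::size_t>(p - p_leading);
    while (*p)
    {
        result += *p;
        if (*p++ == '\n')
        {
            // Skip the indentation only if this line starts with exactly
            // the same characters as the first text line's indentation.
            for (std::size_t i = 0; i < leading_len; ++i)
                if (p[i] != p_leading[i])
                    goto dont_skip_leading;
            p += leading_len;
        }
      dont_skip_leading: ;
    }
    return result;
}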

(The slightly weird p_leading[i] approach is intended to make life for people who use tabs and spaces no harder than they make it for themselves ;-P, as long as the lines start with the same sequence.)
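If the goto bothers you, the same rules can be sketched with C++17's std::string_view. This is a hypothetical variant (unindent_sv is not from the answer), and unlike the std::isspace loop above it treats only spaces and tabs as indentation:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical goto-free variant of unindent(), C++17.
// - drops one leading '\n';
// - takes the run of spaces/tabs at the start of the first text line as
//   the indentation;
// - removes that exact sequence from the start of every line beginning
//   with it, copying all other lines unchanged.
std::string unindent_sv(std::string_view s)
{
    if (!s.empty() && s.front() == '\n')
        s.remove_prefix(1);

    const std::string_view indent = s.substr(0, s.find_first_not_of(" \t"));

    std::string result;
    while (!s.empty()) {
        if (s.substr(0, indent.size()) == indent)
            s.remove_prefix(indent.size());
        // Copy the rest of this line, including its '\n' if there is one.
        const std::size_t eol = s.find('\n');
        const std::size_t len =
            (eol == std::string_view::npos) ? s.size() : eol + 1;
        result.append(s.data(), len);
        s.remove_prefix(len);
    }
    return result;
}
```

As in the original, a line whose start doesn't match the indentation sequence exactly is copied unchanged, so mixed tabs and spaces only work if every line uses the same sequence.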

Tony Delroy
  • Is it 1 character? Or is it 2? – Lightness Races in Orbit Jan 18 '19 at 15:32
  • @LightnessRacesinOrbit: in the context being discussed, one. The compiler may arrange for distinct carriage returns and line feeds to be generated for each `'\n'` during output, but the code and techniques presented above are - I believe - portable regardless of such later conversions. If you have reason to believe otherwise, please do share. – Tony Delroy Jan 22 '19 at 14:00
  • I'm just raising a talking point really. Do we know for sure that there is only a newline? Is that guaranteed? Or does it depend on the encoding of the source file? If the latter then your [first] solution is too optimistic. – Lightness Races in Orbit Jan 22 '19 at 15:04
  • The answer is [here](https://stackoverflow.com/a/39886017/11279879). It is 1 character, a single `'\n'`. – Dr. Gut Nov 21 '20 at 22:58

This is probably not what you want, but just in case, you should be aware of automatic string literal concatenation:

    std::string text =
"This is the first line.\n"
"This is the second line.\n"
"This is the third line.\n";
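The joining happens at compile time (translation phase 6), so the result is one ordinary literal. A minimal C++11 sketch confirming there is no leading newline, with the variable name text chosen for illustration:

```cpp
#include <cassert>
#include <string>

// Adjacent string literals are concatenated into a single literal by the
// compiler, so each piece can be indented with the surrounding code.
const std::string text =
    "This is the first line.\n"
    "This is the second line.\n"
    "This is the third line.\n";
```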

Brian Bi

I recommend @Brian's answer, especially if you only need a few lines of text, or text you can manage with your text-editor-fu. I have an alternative if that isn't the case.

    std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";


Raw string literals can still be concatenated with "normal" string literals, as shown in the code. The "\ at the start is a line splice: the backslash-newline pair is deleted before the literal is formed, so the opening " gets a line of its own without adding a newline to the string.
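A stand-alone C++11 version of the trick, so you can check what the splice does (the variable name text is illustrative). Keep the backslash as the very last character on its line:

```cpp
#include <cassert>
#include <string>

// Backslash-newline is a line splice (translation phase 2): it is removed
// before the literal is formed, so the ordinary literal starts with 'T'.
// The adjacent raw literal then supplies the remaining lines verbatim.
const std::string text =
"\
This is the first line." R"(
This is the second line.
This is the third line.)";
```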

Still, if I were to decide, I would put such lotsa-text into a separate file and load it at runtime. No pressure to you though :-).
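If you do go the separate-file route, loading the text can be as small as the following C++11 sketch. read_all and load_text are hypothetical helper names, and the error handling is deliberately minimal:

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read everything remaining in a stream into a std::string.
std::string read_all(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies the whole remaining stream contents
    return buffer.str();
}

// Hypothetical helper: load a text file at run time.
std::string load_text(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return read_all(in);
}
```

Opening in binary mode keeps the bytes exactly as they are on disk; drop std::ios::binary if you want the platform's newline translation.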

Also, that is one of the uglier pieces of code I've written lately.

Mark Garcia

The closest I can see is:

std::string text = ""
R"(This is the first line.
This is the second line.
This is the third line.
)";
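The same idea as a self-contained C++11 sketch (variable name illustrative). The empty "" only exists so that R"( can start on a fresh line; it contributes no characters:

```cpp
#include <cassert>
#include <string>

// "" is an empty ordinary literal; being adjacent to the raw literal, the
// two are concatenated, and the text starts right after R"( on its line.
const std::string text = ""
R"(This is the first line.
This is the second line.
)";
```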

It would be a bit nicer if whitespace were allowed in the delimiter sequence. Then, give or take the indentation, you could write:

std::string text = R"
    (This is the first line.
This is the second line.
This is the third line.
)
    ";

My preprocessor will let you off with a warning about this, but unfortunately it's a bit useless. Clang and GCC get thrown off completely.

Potatoswatter

The accepted answer produces the warning cppcoreguidelines-pro-bounds-constant-array-index from clang-tidy. See "Pro.bounds: Bounds safety profile" in the C++ Core Guidelines for details.

If you don't have std::span but are compiling with at least C++17, consider:

constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
This is the third line.
)").substr(1);

The main advantages are readability (IMHO) and that you can turn on that clang-tidy warning in the rest of your code.

With gcc, if someone inadvertently reduces the raw string to an empty string, this approach gives a compiler error (substr(1) on an empty string_view throws std::out_of_range, which is not allowed during constant evaluation), while the accepted approach either compiles without complaint or, depending on your compiler settings, produces an "outside bounds of constant string" warning.
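A quick compile-time check of this approach, assuming C++17 (the static_assert is added for illustration). Because the whole initializer is a constant expression, an empty raw string would make substr(1) throw during constant evaluation, which is what turns the mistake into a compile error:

```cpp
#include <cassert>
#include <string_view>

// C++17: constructing a std::string_view from a const char* and calling
// substr() are both constexpr, so the leading '\n' is dropped at compile time.
constexpr auto text = std::string_view(R"(
This is the first line.
This is the second line.
)").substr(1);

// This would fail to compile if the first character were still '\n'.
static_assert(!text.empty() && text.front() == 'T', "leading newline not removed");
```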

davidvandebunte
  • *"or depending on your compiler settings an "outside bounds of constant string" warning."* - what compiler setting would produce such a warning? Empty strings are still guaranteed (by the C++ Standard) to be null terminated.... – Tony Delroy Mar 15 '21 at 22:18

Yep, that is annoying. Perhaps there should be raw literals (R"PREFIX(") and multiline raw literals (M"PREFIX).

I came up with this alternative which almost describes itself:

#include <cassert>  // assert
#include <iterator> // std::next
#include <string>   // std::string
...
{
    ...
    ...
    std::string atoms_text = 
std::next/*_line*/(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");
    assert( atoms_text[0] != '\n' );
    ...
}
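A runnable reduction of the snippet above (C++11), keeping a single data line and dropping the elided parts; atoms_text is the answer's name, the rest is illustrative:

```cpp
#include <cassert>
#include <iterator>  // std::next
#include <string>

// std::next takes its iterator by value, so the raw literal decays to
// const char*; advancing it by one skips the leading '\n'.
const std::string atoms_text = std::next(R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
)XYZ");
```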

Limitations:

  1. If the raw literal is empty, std::next yields a pointer one past its terminating null character, so building a std::string from it is undefined behavior. But that should be obvious to spot.
  2. If the raw literal doesn't start with a newline, it will eat the first character instead.
  3. std::next is constexpr only since C++17; if you need a constant expression with an older standard, you can use 1+(char const*)R"XYZ(" instead, but it is not as clear and might produce a warning:

constexpr auto atom_text = 1 + (R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ");

Also, no warranties ;). For what it's worth, a string literal is an array of const char with static storage duration, and pointer arithmetic within an array (up to one past its end) is well defined.

Another advantage of the + 1 approach is that it can be put at the end:

constexpr auto atom_text = R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ" + 1;

Possibilities are endless:

constexpr auto atom_text = &R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"[1];

// or, equivalently (in a separate scope, since the name is reused):
constexpr auto atom_text = &1[R"XYZ(
  O123        12.4830720891       13.1055820441        9.5288258996
  O123        13.1055820441       13.1055820441        9.5288258996
)XYZ"];
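To convince yourself that these spellings are interchangeable, here is a C++17 sketch (the variable names are made up) that checks them at compile time. It compares the pointed-to text rather than the pointers, because identical string literals are not guaranteed to share storage:

```cpp
#include <cassert>
#include <string_view>

// Each spelling yields a pointer to the character right after the
// leading '\n' of its own string literal.
constexpr const char* from_plus_left = 1 + R"(
X)";
constexpr const char* from_plus_right = R"(
X)" + 1;
constexpr const char* from_subscript = &R"(
X)"[1];
constexpr const char* from_swapped_subscript = &1[R"(
X)"];

// Compare the pointed-to text at compile time; the constexpr
// std::string_view operations used here require C++17.
static_assert(std::string_view(from_plus_left) == "X", "");
static_assert(std::string_view(from_plus_right) == "X", "");
static_assert(std::string_view(from_subscript) == "X", "");
static_assert(std::string_view(from_swapped_subscript) == "X", "");
```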

alfC

I had the very same problem and I think the following solution is the best of all the above. I hope it'll be helpful for you, too (see example in the comment):

#include <cstddef>
#include <sstream>
#include <string>

/**
 * Strips a multi-line string's indentation prefix.
 *
 * Example:
 * \code
 *   string s = R"(|line one
 *                 |line two
 *                 |line three
 *                 |)"_multiline;
 *   std::cout << s;
 * \endcode
 *
 * This prints three lines: @c "line one\nline two\nline three\n"
 *
 * @author Christian Parpart <christian@parpart.family>
 */

inline std::string operator ""_multiline(const char* text, std::size_t /*size*/) {
  if (!*text)
    return {};

  enum class State {
    LineData,
    SkipUntilPrefix,
  };

  constexpr char LF = '\n';
  State state = State::LineData;
  std::stringstream sstr;
  char sep = *text++;

  while (*text) {
    switch (state) {
      case State::LineData: {
        if (*text == LF) {
          state = State::SkipUntilPrefix;
          sstr << *text++;
        } else {
          sstr << *text++;
        }
        break;
      }
      case State::SkipUntilPrefix: {
        if (*text == sep) {
          state = State::LineData;
          text++;
        } else {
          text++;
        }
        break;
      }
    }
  }

  return sstr.str();
}
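For comparison, a condensed sketch of the same margin-marker state machine (C++11). The operator name _margin is hypothetical, chosen so it doesn't clash with _multiline; it appends to a std::string instead of a std::stringstream and walks the size argument rather than relying on the terminating null:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical condensed variant of _multiline: the first character of the
// literal is the margin marker. After every '\n', characters are skipped up
// to and including the next marker; everything else is copied.
inline std::string operator""_margin(const char* text, std::size_t size)
{
    std::string out;
    if (size == 0)
        return out;
    const char marker = text[0];
    bool skipping = false;  // true between a '\n' and the next marker
    for (std::size_t i = 1; i < size; ++i) {
        const char ch = text[i];
        if (skipping) {
            if (ch == marker)
                skipping = false;  // marker found: resume copying after it
        } else {
            out += ch;
            if (ch == '\n')
                skipping = true;   // start eating the next line's indentation
        }
    }
    return out;
}
```

As with _multiline, every line after the first needs the marker: everything up to it is discarded, and if it's missing, text is discarded until the next marker appears.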

christianparpart
  • I don't see `_multiline` in the example, has the code drifted away from the comments? – Quentin Jun 22 '18 at 12:08
  • Nice code. I still like the current `1 + ` solution because it can also be used to assign to a `constexpr char*`. – Hugues Jun 23 '18 at 14:14
  • @Hugues, you may still use the constexpr_string class within that code (instead of std::stringstream), but I believe, this functionality should definitely be in the C++ standard library instead. – christianparpart Jun 23 '18 at 16:46