
string_view was a proposed feature within the C++ Library Fundamentals TS (N3921) and was later added to C++17.

As far as I understand, it is a type that represents some kind of string "concept": a view of any type of container that could store something viewable as a string.

  • Is this right?
  • Should the canonical const std::string& parameter type become string_view?
  • Is there another important point about string_view to take into consideration?
Drax
  • 4
    Finally, someone realizes that strings need a different semantics, although introducing string_view is only a small step. – John Z. Li Aug 31 '18 at 03:19

2 Answers


The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data which is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; there were earlier ones called string_ref and array_ref, too.

The idea is always to store a pair of pointer-to-first-element and size of some existing data array or string.

Such a view-handle class could be passed around cheaply by value and would offer cheap substringing operations (which can be implemented as simple pointer increments and size adjustments).
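To make the "pointer plus size" idea concrete, here is a minimal sketch using std::string_view as it was eventually standardized in C++17. The function name file_name is made up for this illustration. It trims a path down to its last component by adjusting the view only, so the result still points into the caller's buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns the part of `path` after the last '/'.
// The result is a view into the same characters as the argument;
// no characters are copied and nothing is allocated.
std::string_view file_name(std::string_view path) {
    const std::size_t slash = path.rfind('/');
    if (slash != std::string_view::npos) {
        // Advances the stored pointer and shrinks the stored size.
        path.remove_prefix(slash + 1);
    }
    return path;
}
```

Called with a std::string such as "/usr/lib/libfoo.so", the returned view compares equal to "libfoo.so" and its data() pointer lies inside the original string's storage, which is why passing and slicing such views is cheap.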

Many uses of strings don't require actual owning of the strings, and the string in question will often already be owned by someone else. So there is a genuine potential for increasing the efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).

The original C strings suffer from the problem that the null terminator is part of the string API, so you cannot easily create substrings without mutating the underlying string (a la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.
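As a rough sketch of the contrast with strtok, the following splits a string into fields without writing anything into it. The function name split and the choice to keep empty fields are assumptions made for this example, not part of any proposal.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `sep` and returns views into the
// original characters. Unlike strtok, the input is never modified, so this
// also works on string literals and const data. Empty fields are kept.
std::vector<std::string_view> split(std::string_view text, char sep) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(sep, start);
        if (end == std::string_view::npos) {
            fields.push_back(text.substr(start));  // last field, may be empty
            return fields;
        }
        fields.push_back(text.substr(start, end - start));
        start = end + 1;  // continue after the separator
    }
}
```

Each field is just a (pointer, size) pair, so no terminator has to be written between the fields. The vector itself still allocates; only the character data is left uncopied.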

The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from the rest of the standard library. Basically, everything else in the standard library is unconditionally safe and correct (if it compiles, it's correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes. So that's harder to check and to teach.
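A small sketch of what "depends on the ambient code" means in practice, assuming the C++17 std::string_view. The helper names make_greeting and first_word are invented for the illustration. The function that takes the view is fine on its own; whether a call is correct depends on how long the caller keeps the underlying string alive.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns an owning string by value.
std::string make_greeting(const std::string& name) {
    return "hello, " + name;
}

// Hypothetical helper: returns a view of everything before the first space.
// The result refers to the same characters as `text`, so it is only valid
// while whatever owns those characters is still alive.
std::string_view first_word(std::string_view text) {
    return text.substr(0, text.find(' '));
}

// Fine: `owned` outlives the view.
//   std::string owned = make_greeting("world");
//   std::string_view word = first_word(owned);
//
// Broken, and the compiler accepts it: the temporary std::string is destroyed
// at the end of the full expression, so `word` dangles immediately.
//   std::string_view word = first_word(make_greeting("world"));
```

Nothing inside first_word reveals which of the two calls is the broken one; that is only visible at the call site.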

Kerrek SB
  • 20
    The ship sailed on that philosophy with `reference_wrapper`, didn't it? – Steve Jessop Dec 27 '13 at 17:07
  • 6
    @KerrekSB I am afraid I don't follow. Could you expand on the *"such referential view classes have completely different ownership semantics from the rest of the standard library"* part, please? It's not clear to me: How is it different from dangling references / pointers? Or invalidated iterators due to insertion (e.g. std::vector)? We have these issues already, it is very natural to me that a non-owning view will have similar issues as non-owning pointers / references / iterators have. – Ali Dec 27 '13 at 17:26
  • @SteveJessop: Yes :-) I was almost going to include that. I guess the use of reference wrappers is fairly limited, but it's a valid point. – Kerrek SB Dec 27 '13 at 18:56
  • 5
    @Ali: When you're using any other standard library container, you can assert the correctness of the code just by looking at the code that uses the container. Not so for `string_view`. (I wasn't saying that you can never write broken code. Just that the brokenness is *local*.) – Kerrek SB Dec 27 '13 at 18:57
  • @KerrekSB OK, thanks for the clarification. Upvoted your answer! – Ali Dec 27 '13 at 19:10
  • @KerrekSB Okay, so basically it represents an object which is only able to do read-only operations on a string, at a cheaper performance cost. Somehow it looks like `std::weak_ptr`, but I guess that some kind of `lock` member function for "safe access" would defeat the point of the performance gain. Anyway, the validity of a `const std::string&` is already tied to the original string unless you copy it, so nothing is lost indeed. – Drax Dec 28 '13 at 11:23
  • 3
    @Drax: think also of situations where you pass in a string literal. If the function argument were a `const string &` you'd have to create a costly and exception-throwing temporary. With a string view, nothing needs to be copied or allocated. – Kerrek SB Dec 28 '13 at 11:25
  • @KerrekSB Indeed! Do you think there will be some kind of `constexpr` constructor for initialization from literals or from another `string_view`, so they could be used as default string constants without unpredictable static initialization order? – Drax Dec 28 '13 at 11:33
  • @Drax: I'm sure it's being worked on; I'm not sure if any of this will make it into the next standard. String literals are weird; you can't really tell them apart from arrays. – Kerrek SB Dec 28 '13 at 11:39
  • Iterators come to mind. (In fact some of the standardization work being done for `string_view` etc. involves making sure this will play nicely along ranges — keeping in mind a pair of iterators may very well be a model of a range, depending on details.) Not to mention `auto uhoh = [] { return { 1, 2, 3, 4, 5 }; }();`… – Luc Danton Dec 30 '13 at 09:49
  • @LucDanton: Brr. What's the type of `uhoh`?? – Kerrek SB Dec 30 '13 at 10:00
  • `std::initializer_list`. – Luc Danton Dec 30 '13 at 10:06
  • 6
    I'm surprised they didn't go with `std::range` from `boost::iterator_range` - IMO it's better than the string_view idea – Charles Salvia Jun 04 '15 at 19:16
  • 3
    I don't see the point of making a view non-mutating. Just make it mutable by default and add `const` iff desired like `string_span` in the [GSL](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-gsl). – nwp Dec 09 '15 at 14:27
  • 21
    @nwp: Many people and languages have come to lament C++'s awful defaults and think that "const" and "unshared" should be the default, with "mutable" and "shared" the explicit, rare exceptions. – Kerrek SB Dec 09 '15 at 14:33
  • 4
    @KerrekSB I agree with those people, but unless you rewrite C++ with `const` as default to be overwritten by `mutable` a view should not be default `const`. – nwp Dec 09 '15 at 14:41
  • 2
    You have to start somewhere? – ruipacheco Jan 25 '16 at 13:34
  • 2
    It's more to the string view than this. It can also handle compile-time strings including compile time hashing of compile-time strings. – Viktor Sehr May 07 '16 at 09:31
  • @ruipacheco Perhaps but consistency is king, particularly as C++ will _never_ be rewritten with `const` as default, so what you're starting will never be finished, and at the cost of an inconsistency that'll never be resolved. – Lightness Races in Orbit Oct 12 '18 at 11:19
  • 3
    @LightnessRacesinOrbit: "C++ will never be rewritten with `const` as default" - I thought that was called Rust? – Kevin Jun 04 '19 at 01:58
  • @Kevin Well exactly :D – Lightness Races in Orbit Jun 04 '19 at 10:19
  • 1
    But what's the advantage over `const std::string&` ? – peterflynn Apr 14 '20 at 19:59
  • 1
    @peterflynn: If your argument is, say, a string literal, then a `const std::string&` parameter would require dynamic allocation and copying, whereas a string view would not. – Kerrek SB Apr 16 '20 at 16:23

(Educating myself in 2021)

From Microsoft's <string_view>:

The string_view family of template specializations provides an efficient way to pass a read-only, exception-safe, non-owning handle to the character data of any string-like objects with the first element of the sequence at position zero. (...)

From Microsoft's C++ Team Blog std::string_view: The Duct Tape of String Types from August 21st, 2018 (retrieved 2021 Apr 01):

string_view solves the “every platform and library has its own string type” problem for parameters. It can bind to any sequence of characters, so you can just write your function as accepting a string view:

void f(wstring_view); // string_view that uses wchar_t's

and call it without caring what stringlike type the calling code is using (and for (char*, length) argument pairs just add {} around them) (...)
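To illustrate the quoted point, here is a minimal sketch using the narrow-character std::string_view. The function count_char is made up for this example. The same function accepts a string literal, a std::string, and a braced (pointer, length) pair.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function for illustration: counts occurrences of `c` in `text`.
// Taking std::string_view by value lets callers pass different string types
// without a std::string being constructed for the call.
std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (const char ch : text) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}

// Possible call sites:
//   count_char("banana", 'a');     // string literal, no allocation
//   count_char(some_string, 'n');  // std::string converts implicitly
//   count_char({ptr, len}, 'a');   // (char*, length) pair wrapped in {}
```

With a const std::string& parameter, the first and third calls would each have to build a temporary std::string first.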

(...)

Today, the most common “lowest common denominator” used to pass string data around is the null-terminated string (or as the standard calls it, the Null-Terminated Character Type Sequence). This has been with us since long before C++, and provides clean “flat C” interoperability. However, char* and its support library are associated with exploitable code, because length information is an in-band property of the data and susceptible to tampering. Moreover, the null used to delimit the length prohibits embedded nulls and causes one of the most common string operations, asking for the length, to be linear in the length of the string.
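A short sketch of the two limitations mentioned in the quote, comparing std::strlen with a std::string_view that carries its length out of band. The wrapper names c_length and view_length are only there for the illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Length as a C API sees it: scans for the first '\0', so the cost is
// linear in the length and an embedded null cuts the string short.
std::size_t c_length(const char* s) {
    return std::strlen(s);
}

// Length as a view sees it: the stored size is returned in constant time,
// and embedded nulls are ordinary characters.
std::size_t view_length(std::string_view s) {
    return s.size();
}
```

For a five-character buffer such as "ab\0cd", a view constructed with an explicit length of 5 reports all five characters, while the C-style scan stops after "ab".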

(...)

Each programming domain makes up their own new string type, lifetime semantics, and interface, but a lot of text processing code out there doesn’t care about that. Allocating entire copies of the data to process just to make differing string types happy is suboptimal for performance and reliability.

tymtam