4

I have an old project to maintain that passes const char * around. For some reason, I want to keep a lot of runtime-generated strings, so I created a global std::set to hold them. When a new string is generated, I add it to the set and also hand out newString.c_str(), which is then stored somewhere else. For example:

std::set<std::string> g_stringDB;
void ArchieveString( AStruct *container, const char *temporaryString )
{
    // emplace() returns std::pair<iterator, bool>; .first refers to the stored string
    auto result = g_stringDB.emplace( temporaryString );
    container->validString = result.first->c_str();
}

I am wondering whether the pointer validString is still safe when container is used outside this function (anywhere else in the program). Is it possible that the pointer already points to something else because of copies or constructions that happen inside the set? If it is not safe, what is an ideal way to implement this requirement?

Tromse
Tinggo
  • Possible duplicate of [Iterator invalidation rules](https://stackoverflow.com/questions/6438086/iterator-invalidation-rules) – mch Mar 06 '19 at 09:51
  • @mch The rules for iterator invalidation are different. – Konrad Rudolph Mar 06 '19 at 09:53
  • There are two considerations. Adding an element to a `std::set` does not invalidate any of its iterators, nor pointers to its elements. However, resizing a `std::string` potentially invalidates its iterators (including pointers to its characters). So, if you add a `std::string` to a set, it is safe to use a pointer returned by its `c_str()`, but ONLY if that string is never subsequently resized. Beyond your question, however, use of a static variable and a separate repository of pointers to data managed by that static is a fragile design in several ways. I'd consider rearchitecting. – Peter Mar 06 '19 at 10:03

2 Answers

4

The pointer returned by c_str() may be invalidated by:

Passing a non-const reference to the string to any standard library function, or

Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().

As for the set's elements: inserting into the set does not invalidate iterators or references to existing elements, so the string objects themselves are left untouched.

So if the strings are never modified, you are fine.
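
As a rough illustration of that reasoning, here is a minimal sketch of an insert-only pool. `StringPool`, `intern` and `pointerSurvivesInsertions` are hypothetical names invented for this example, not part of the question's code. The sketch assumes nothing ever erases from or modifies the stored strings, which is exactly the condition described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>

// Hypothetical insert-only pool; the name and interface are illustrative.
class StringPool {
public:
    // Stores a copy of `text` if it is not already present and returns a
    // pointer to the stored characters. Since the pool never erases or
    // modifies an element, the pointer stays usable while the pool lives.
    const char* intern(const char* text) {
        assert(text != nullptr);
        return strings_.emplace(text).first->c_str();
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::set<std::string> strings_;  // node-based: stored elements do not move
};

// Takes a pointer early, performs many more insertions, then checks that the
// early pointer still refers to the same stored characters.
bool pointerSurvivesInsertions() {
    StringPool pool;
    const char* first = pool.intern("short");
    for (int i = 0; i < 10000; ++i) {
        pool.intern(("generated-" + std::to_string(i)).c_str());
    }
    // Interning equal text again returns the already stored element.
    return first == pool.intern("short") && std::strcmp(first, "short") == 0;
}
```

Hiding the set behind a type that exposes no erase or mutation is one possible way to make the "strings are never modified" condition hard to break by accident; whether that fits the legacy code is a separate question.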

Matthieu Brucher
  • To add to @Peter's comment, note that the iterators of std::set are all const iterators. This means the string cannot be resized once it is in the set (see 23.2.4 - 6 - `For associative containers where the value type is the same as the key type, both iterator and const_iterator are constant iterators`). It can, however, be removed, which is a problem. The design remains fragile and not future proof... – fjardon Mar 06 '19 at 10:17
  • Agreed. As they may not have the green light to change the legacy code, it may be a first step towards pushing for fixing it. There are lots of unknowns as to how to improve the design. Perhaps in a subsequent question? – Matthieu Brucher Mar 06 '19 at 10:23
  • This is not about *iterator* invalidation, but pointer/reference invalidation or mutable access to objects as you mentioned (e.g. `std::vector v = ...; v[i] = 5;` which invalidates neither iterators nor references but changes the element). There are containers that provide pointer/reference stability without iterator stability on certain operations (e.g. `deque` on `push_front()` or `push_back()`), and, theoretically, one could write a container with converse guarantees. – Arne Vogel Mar 06 '19 at 12:39
2

It could be safe if several conditions are met.

First of all, according to the cppreference page for std::basic_string::c_str():

The pointer obtained from c_str() may be invalidated by:

  1. Passing a non-const reference to the string to any standard library function, or
  2. Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().

So the pointer is safe to use as long as neither of these happens. Invalidation can also come from assignment, destruction, or any other operation that invalidates a reference to the std::set<std::string> element itself.
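
As a comment on the other answer points out, a std::set only hands out constant iterators, so the "non-const member function" route is closed when the string is reached through the set. The following sketch checks that at compile time; `StringSet`, `elementsAreReadOnly` and `firstLength` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <type_traits>

using StringSet = std::set<std::string>;

// The reference type of a std::set iterator is a reference to const.
using ElementRef = std::iterator_traits<StringSet::iterator>::reference;
static_assert(std::is_same<ElementRef, const std::string&>::value,
              "std::set elements are exposed as const");

// Same check as a runtime-callable function.
bool elementsAreReadOnly() {
    return std::is_const<std::remove_reference<ElementRef>::type>::value;
}

// Const member functions such as size() and c_str() are still available.
std::size_t firstLength(const StringSet& strings) {
    assert(!strings.empty());
    auto it = strings.begin();
    // it->append("x");  // would not compile: the element is const
    return it->size();
}
```

This guards only against modification through the set's interface. Erasing the element, or destroying the set, still ends the lifetime of the string and of any pointer taken from it.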

Operations that do not invalidate these references, or that invalidate them only in specific scenarios:

  1. std::set::insert() as explained in cppreference

    No iterators or references are invalidated.

    but there is a more fine-grained statement about elements obtained through node handles (C++17), which makes sense:

    If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid. (since C++17)

  2. std::set::erase from cppreference

    References and iterators to the erased elements are invalidated. Other references and iterators are not affected

  3. Both std::set::emplace and std::set::emplace_hint say

    No iterators or references are invalidated.

  4. std::set::extract:

    Pointers and references to the extracted element remain valid, but cannot be used while element is owned by a node handle: they become usable if the element is inserted into a container.

    which means that after reinsertion the pointer from c_str() is usable again. However, that page does not say anything about references to the other elements. This is possibly an omission in cppreference and/or in the standard. I'd like to see a comment regarding the standard.

  5. std::set::merge:

    all pointers and references to the transferred elements remain valid
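
The extract and merge cases can be sketched as follows, assuming a C++17 compiler. `pointerSurvivesExtract` and `pointerSurvivesMerge` are hypothetical helper names made up for this example. The pointer value is only compared, never dereferenced, while the element is owned by the node handle.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Moves the element equal to `key` from `from` into `to` through a node
// handle and reports whether its c_str() pointer is unchanged afterwards.
bool pointerSurvivesExtract(std::set<std::string>& from,
                            std::set<std::string>& to,
                            const std::string& key) {
    auto it = from.find(key);
    if (it == from.end()) {
        return false;
    }
    const char* before = it->c_str();
    auto node = from.extract(it);  // element is now owned by the node handle
    auto result = to.insert(std::move(node));
    return result.inserted && result.position->c_str() == before;
}

// merge() transfers nodes rather than copying the strings.
bool pointerSurvivesMerge() {
    std::set<std::string> a{"alpha"};
    std::set<std::string> b{"beta"};
    assert(b.count("beta") == 1);
    const char* before = b.find("beta")->c_str();
    a.merge(b);  // "beta" moves from b to a
    return b.empty() && a.find("beta")->c_str() == before;
}
```

If the destination already holds an equal string, the insertion fails and the node handle destroys the extracted element, so a pointer taken from it would dangle. That is one more way such a pointer can go stale.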

So, as long as nothing modifies or removes the objects in the set, you should be safe. Check every operation your code performs on the set against the list above.

Michael Veksler