0

The common way to add lines extracted from a text file into a std::vector< std::string > where every element of vector correspond to a file's line is something like these example:

https://stackoverflow.com/a/8365024/7030542

std::string line;
std::vector<std::string> myLines;
while (std::getline(myfile, line))
{
    myLines.push_back(line);
}

or also

https://stackoverflow.com/a/12506764/7030542

std::vector<std::string> lines;
for (std::string line; std::getline( ifs, line ); /**/ )
    lines.push_back( line );

Does exist a most efficient way to do that like avoid the auxiliary string?

TheArchitect
  • 975
  • 2
  • 14
  • 23

3 Answers3

6

Don't overthink it:

std::vector<std::string> lines;
std::string line;
while(std::getline( ifs, line ))
    lines.push_back(std::move(line));

Note that the moved from line is in a valid, but indeterminate state, so calling std::getline is fine because that will replace the std::string's contents (whatever they may be) completely, eradicating any indeterminate state that was left behind by the move.

rubenvb
  • 69,525
  • 30
  • 173
  • 306
  • Maybe about move semantics you right, but you are missing the point. It doesn't need another string to do the same work. – TheArchitect May 16 '17 at 10:47
  • Another thing, in your code, when the loop run for the second time you try to add new data into string line that is in indeterminate state. – TheArchitect May 16 '17 at 10:59
  • @Menashe Yes, the second loop iteration will set the content of a valid `std::string` object with indeterminate contents. Since `std::getline` replaces the `std::string`'s contents, there is nothing wrong here. – rubenvb May 16 '17 at 11:26
  • Then you create a new string every cycle. I don't see any advantage to use a move semantic here. – TheArchitect May 16 '17 at 12:07
  • 1
    The fact that the string contents don't need to be copied on `push_back` is where the move makes a difference. – rubenvb May 16 '17 at 12:12
  • But you lose the advantage of move when you recreate the string line every cycle with getline. – TheArchitect May 16 '17 at 12:35
  • @MenasheRosemberg You appear to be confused. Assuming lines long enough to avoid SBO, each line a buffer is allocated. This buffer is then moved into the vector. No "advantage" is lost. – Yakk - Adam Nevraumont May 16 '17 at 13:37
  • I really don't understand how to avoid an auxiliary string cannot be better. The codes used has 4-5 steps (1-create auxiliary string, 2-write data on it, 3-create new element into vector, 4-copy/move auxiliary string to this new element, 5-for move semantic the getline need to rebuild the auxiliary string) instead avoid auxiliary string with 3 step (1-create new element into vector, 2-write data on it, 3-when the loop finished erase last element). – TheArchitect May 16 '17 at 13:56
  • 2
    Huh, all my years using this "pattern" and I never thought to move out of the string variable. heh – Lightness Races in Orbit May 16 '17 at 19:45
  • @yakk I read the specification of getline in http://www.cplusplus.com/reference/string/string/getline/ and getline doesn't use buffer instead "Each extracted character is appended to the string as if its member push_back was called. Move semantic here make no sense because you need to recreate line every loop. Better use a old copy. – TheArchitect May 17 '17 at 16:55
  • @MenasheRosemberg [Better reference](http://en.cppreference.com/w/cpp/string/basic_string/getline) -- it clears the passed in string, then appends. I have no idea why you think this has to do with "better use a old copy" rather than move-from. – Yakk - Adam Nevraumont May 17 '17 at 17:44
1

@rubenvb's answer is great.

As an alternative

bool get_line_into_vector( std::istream& is, std::vector<std::string>& v ) {
  std::string tmp;
  if (!std::getline(is, tmp))
     return false;
  v.push_back(std::move(tmp));
  return true;
}

std::vector<std::string> lines;
while(get_line_into_vector( ifs, lines ))
{} // do nothing

This is rubenvb's solution with the temporary moved into a helper function.

We can avoid the small buffer optimization sized copies of characters with this:

bool get_line_into_vector( std::istream& is, std::vector<std::string>& v ) {
  v.emplace_back();
  if (std::getline(is, v.back()))
    return true;
  v.pop_back();
  return false;
}

this can (in an edge case) cause an extra massive reallocation, but that is asymptotically rare.

Unlike @pschill's answer, here the invalid states are isolated to within a helper function, and all the flow control is centered around avoiding those invalid states from leaking.

The nice thing is that

std::vector<std::string> lines;
while(get_line_into_vector( ifs, lines ))
{} // do nothing

is how you use it; which of these two implementations you use is isolated to within the get_line_into_vector function. That lets you swap between them and determine which is better.

Yakk - Adam Nevraumont
  • 235,777
  • 25
  • 285
  • 465
  • For the move semantic solution, the getline’s specifications [cplusplus](http://www.cplusplus.com/reference/string/string/getline/) and [cpprefference](http://en.cppreference.com/w/cpp/string/basic_string/getline) says it doesn't use buffer but "Each extracted character is appended to the string as if its member push_back was called". – TheArchitect May 18 '17 at 09:55
  • And also getline expect a reference to a string then pass to it a string in an undefined state is not a good programming practice. Also, ‘recreated’ tmp string every cycle to move it again doesn’t seem more efficient then keep tmp alive during all cycle like the old fashion. – TheArchitect May 18 '17 at 10:12
  • Then avoid an auxiliary string is a great solution when you need to make a memory reservation (vector::reserve) before read the data (vectors those will get new element frequently for example), otherwise use an auxiliary string is better. – TheArchitect May 18 '17 at 10:12
-1

If you want to avoid temporary variables, you can use the last vector element as buffer:

std::vector<std::string> lines(1);
while (std::getline(ifs, lines.back())
    lines.emplace_back();
lines.erase(--lines.end());  // remove the buffer element
pschill
  • 4,030
  • 1
  • 10
  • 32
  • This is not recommendable: it's much harder to comprehend and may not be any faster than moving a temporary `string` (as in rubenvb's answer). – Walter May 16 '17 at 12:15
  • Yakk, thanks for your response. Can you explain a little more how it may cause extra memory allocation and when lines is in invalid state? And furthermore, how to wrap the emplace back and back to make a more efficient code? – TheArchitect May 16 '17 at 14:36
  • 1
    @menashe: in The worst case, your last real line fills up The vector' storageoplossingen, causing the extra emplaced line to reallocate the whole vector, and doubling its memory use. With The inkt result of The last element being deleted again. Granted, that's an edge case, bit it can happen. – rubenvb May 17 '17 at 05:23
  • It may happen, but if your vector will get data frequently or it is a big vector and also will receive new data, it is a good practice make a good reservation to this vector. In this cases (maybe others too ) that solution seems pretty nice. – TheArchitect May 17 '17 at 07:44
  • @MenasheRosemberg To ping someone, place an @ before their name. Simply stating their name won't do that (unless it is in response to a post by them, as opposed to a comment). `lines` is in an "invalid" state on every line except after the `while` condition (but before the body) and after `lines.erase`; by "invalid" I mean not in a sensible intermediate state between the initial state and the final one (it has an extra empty string at the end). Such states in my experience cause bugs. The extra memory allocation is because we make `lines` 1 larger than it has to be, then discard the extra. – Yakk - Adam Nevraumont May 17 '17 at 17:47