Many other posts, like " Read whole ASCII file into C++ std::string " explain what some of the options are but do not describe pro and cons of various methods in any depth. I want to know why one method is preferable over another?
All of these use std::fstream
to read the file into a std::string
. I am unsure what the costs and benefits of each method. Lets assume this is for the common case where the read files are known to be of some smallish size memory can easily accommodate, clearly reading a multi-terrabyte file into an memory is a bad idea no matter how you do it.
The most common way after a few googles searches to read a whole file into an std::string involves using std::getline
and appending a newline character to it after each line. This seems needless to me, but is there some performance or compatibility reason that this is ideal?
std::string Results;
std::ifstream ResultReader("file.txt");
while(ResultReader)
{
std::getline(ResultReader, Results);
Results.push_back('\n');
}
Another way I pieced together is to change the getline delimiter so it is something not in the file. The EOF char is seems unlikely to be in the middle of the file so that seems a likely candidate. This includes a cast so there is at least one reason not to do it, but this does read a file at once with no string concatenation. Presumably there is still some cost for the delimiter checks. Are there any other good reasons not to do this?
std::string Results;
std::ifstream ResultReader("file.txt");
std::getline(ResultReader, Results, (char)std::char_traits<char>::eof());
The cast means that on systems that define std::char_traits::eof() as something other than -1 might have problems. Is this a practical reason to not choose this over other methods that use std::getline
and string::push_pack('\n')
.
How does these compare to other ways of reading the file at once like in this question: Read whole ASCII file into C++ std::string
std::ifstream ResultReader("file.txt");
std::string Results((std::istreambuf_iterator<char>(ResultReader)),
std::istreambuf_iterator<char>());
It would seem this would be best. It offloads almost all the work onto the standard library which ought to be heavily optimized for the given platform. I see no reason for checks other than stream validity and the end of the file. Is this ideal or are there problems with this that are unseen.
Does the standard or do details of some implementation provide reasons to prefer some method over another? Have I missed some method that might prove ideal in a wide variety of circumstances?
What is a simplest, most idiomatic, best performing and standard compliant way of reading a whole file into an std::string
?
EDIT - 2 This question has prompted me to write a small suite of benchmarks. They are MIT license and available on github at: https://github.com/Sqeaky/CppFileToStringExperiments
Fastest - TellSeekRead and CTellSeekRead- These have the system provide an easy to get the size and reads the file in one go.
Faster - Getline Appending and Eof - The checking of chars does not seem to impose any cost.
Fast - RdbufMove and Rdbuf - The std::move seems to make no difference in release.
Slow - Iterator, BackInsertIterator and AssignIterator - Something is wrong with iterators and input streams. The work great in memory, but not here. That said some of these are faster than others.
I have added every method suggested so far, including those in links. I would appreciate if someone could run this on windows and with other compilers. I currently do not have access to a machine with NTFS and it has been noted that this and compiler details could be important.
As for measuring simplicity and idiomatic-ness how do we measure these objectively? Simplicity seems doable, perhaps use something line LOCs and Cyclomatic complexity, but how idiomatic something is seems purely subjective.