13

I'm doing something really simple: slurping an entire text file from disk into a std::string. My current code basically does this:

std::ifstream f(filename);
return std::string(std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>());

It's very unlikely that this will ever have any kind of performance impact on the program, but I still got curious whether this is a slow way of doing it.

Is there a risk that the construction of the string will involve a lot of reallocations? Would it be better (that is, faster) to use seekg()/tellg() to calculate the size of the file and reserve() that much space in the string before doing the reading?
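
Roughly what I have in mind for the seekg()/tellg() version, as an untested sketch:

std::ifstream f(filename);

// find out how big the file is, so the string can allocate once up front
f.seekg(0, std::ios::end);
std::string s;
s.reserve(f.tellg());
f.seekg(0, std::ios::beg);

s.assign(std::istreambuf_iterator<char>(f),
         std::istreambuf_iterator<char>());
return s;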

CAdaker

3 Answers

36

I benchmarked your implementation (1), mine (2), and two others (3 and 4) that I found on Stack Overflow.

Results (Average of 100 runs; timed using gettimeofday, file was 40 paragraphs of lorem ipsum):

  • readFile1: 764
  • readFile2: 104
  • readFile3: 129
  • readFile4: 402

The implementations (all four assume the usual headers and using namespace std):

// the question's approach: build the string straight from stream iterators
string readFile1(const string &fileName)
{
    ifstream f(fileName.c_str());
    return string(std::istreambuf_iterator<char>(f),
            std::istreambuf_iterator<char>());
}

// open at the end (ios::ate) to learn the file size, then read it into a buffer in one go
string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], fileSize);
}

// getline with EOF converted to a char as the delimiter slurps the whole file in one call
string readFile3(const string &fileName)
{
    string data;
    ifstream in(fileName.c_str());
    getline(in, data, string::traits_type::to_char_type(
                      string::traits_type::eof()));
    return data;
}

// reserve the file size up front, then append through stream iterators
string readFile4(const std::string& filename)
{
    ifstream file(filename.c_str(), ios::in | ios::binary | ios::ate);

    string data;
    data.reserve(file.tellg());
    file.seekg(0, ios::beg);
    data.append(istreambuf_iterator<char>(file.rdbuf()),
                istreambuf_iterator<char>());
    return data;
}
Frank Krueger
CTT
  • @CTT: I also benchmarked these, averaging 100 runs against the text of Moby Dick (1.3M). I tested the four functions shown above plus an additional one I found on SO. readFile2 executed around 20% faster. – paxos1977 Feb 08 '09 at 03:22
  • Thanks. A sloppy benchmark gives about the same result on my machine. – CAdaker Feb 08 '09 at 03:28
  • 2
    On Windows at least, readFile1 and 3 will not return the same thing as readFile2 and 4: the former convert CRLF to LF, the latter do not. – Éric Malenfant Feb 09 '09 at 14:56
  • 1
    readFile2 can be improved by replacing the vector with: string file(filesize, '\0'); ifs.read(&file[0], filesize); return file; // uses NRVO to avoid the copy constructor (see the sketch after these comments) – velcrow Oct 29 '12 at 17:21
  • @velcrow - in C++11 that is a valid improvement. In C++03 std::string was not guaranteed to be contiguous. – CTT Oct 29 '12 at 23:59
  • Presumably the benchmark times are in microseconds and, therefore, lower is better? – mpb Jul 29 '17 at 23:05
  • Note that using tellg may not give you the right value: https://stackoverflow.com/a/22986486/2924421 – Phylliida Apr 25 '18 at 17:40
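
A sketch of the variant velcrow's comment describes, reading straight into the string's own buffer (same headers and using namespace std as the functions above; it relies on std::string storage being contiguous, which is only guaranteed from C++11 on):

string readFile2b(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    string data(static_cast<size_t>(fileSize), '\0');  // allocate once, up front
    ifs.read(&data[0], fileSize);                      // fill the string's buffer directly
    return data;                                       // NRVO avoids a copy
}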
2

What happens to the performance if you try doing that? Instead of asking "which way is faster?" you can think "hey, I can measure this."

Set up a loop that reads a file of a given size 10000 times or something, and time it. Then do it with the reserve() method and time that. Try it with a few different file sizes (from small to enormous) and see what you get.
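
For example, a minimal sketch of such a loop (assuming C++11 <chrono>; the file name and run count are just placeholders):

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// the function under test; substitute whichever implementation you want to measure
std::string readFile(const std::string &fileName)
{
    std::ifstream f(fileName.c_str());
    return std::string(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
}

int main()
{
    const std::string fileName = "test.txt";   // placeholder file name
    const int runs = 10000;

    auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (int i = 0; i < runs; ++i)
        total += readFile(fileName).size();    // use the result so it isn't optimized away
    auto stop = std::chrono::steady_clock::now();

    auto micros = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::cout << "average: " << micros / runs << " microseconds per read ("
              << total << " bytes read in total)\n";
}

Swap in the reserve() version, rerun, and compare the averages for each file size.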

Greg Hewgill
0

To be honest I am not certain, but from what I have read it really depends on the iterators. With iterators from a file stream, the string constructor has no built-in way to measure the distance between the begin and end iterators, so it cannot know the file's length up front.

If this is correct, the string will operate by something like doubling its internal storage size every time it runs out of space. In that case, for n characters in the file there will be roughly log2(n) memory allocations and deallocations, and the characters copied during those reallocations add up to on the order of n in total (each reallocation copies everything written so far), on top of the n copies of the characters into the string.
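
If you want to see that growth directly, here is a small sketch (the character count is just a placeholder, and the exact growth factor depends on your standard library) that counts how often the capacity changes while appending n characters one at a time:

#include <iostream>
#include <string>

int main()
{
    const std::size_t n = 1000000;             // pretend the "file" is a million characters
    std::string s;
    std::size_t reallocations = 0;
    std::size_t lastCapacity = s.capacity();

    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != lastCapacity) {    // capacity changed, so the buffer was reallocated
            ++reallocations;
            lastCapacity = s.capacity();
        }
    }

    std::cout << n << " characters appended, " << reallocations
              << " reallocations\n";           // roughly log2(n) with geometric growth
}

Calling reserve() with the file size first should bring that down to a single up-front allocation.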

As Greg pointed out, though, you might as well test it. As he said, try it for a variety of file sizes with both techniques. Additionally, you can use the following to get some quantitative timings.

#include <time.h>
#include <iostream>

...

clock_t time1 = 0, time2 = 0, delta;
float seconds;

time1 = clock();

// Put the code to be timed here

time2 = clock();

delta = time2 - time1;
seconds = ((float)delta) / ((float)CLOCKS_PER_SEC);

std::cout << "The operation took: " << seconds << " seconds." << std::endl;

...

This should do the trick for the timing.

James Matta