
I'm creating an IOManager class in which I have a function that reads a file and stores it in a buffer. What is the most efficient way of doing that?

I currently have 2 pieces of code:

bool IOManager::readFileToBuffer(std::string filePath, std::vector<unsigned char>& buffer) {
    std::ifstream file(filePath, std::ios::binary);
    if (file.fail()) {
        perror(filePath.c_str());
        return false;
    }

    //seek to the end
    file.seekg(0, std::ios::end);

    //Get the file size
    std::streamsize fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    //Reduce the file size by any header bytes that might be present
    fileSize -= file.tellg();

    buffer.resize(fileSize);
    file.read(reinterpret_cast<char*>(buffer.data()), fileSize);
    file.close();

    return true;
}

and

bool IOManager::readFileToBuffer(std::string filePath, std::vector<char>& buffer) {

    std::ifstream file(filePath, std::ios::binary);

    if (file.fail()) {
        perror(filePath.c_str());
        return false;
    }

    // read all data into a temporary vector, then copy it into buffer
    std::vector<char> prov(
        (std::istreambuf_iterator<char>(file)),
        (std::istreambuf_iterator<char>()));

    buffer = prov;

    file.close();

    return true;
}

Which one is better? Is either of them the fastest and most efficient way of doing this in C++11/14?

asked by Tiago Ferreira; edited by edmz
  • This related question suggests your second piece of code is slow: http://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring – JDiMatteo Aug 25 '15 at 19:29
  • You should time it yourself. In my tests the first version is much faster than the second. – Galik Aug 25 '15 at 19:30
  • benchmarks here suggest first version is the way to go if you are concerned about speed: http://insanecoding.blogspot.com/2011/11/how-to-read-in-file-in-c.html – JDiMatteo Aug 25 '15 at 19:31
  • 2
    This is a legitimate question, but... If your files are small, it does not matter what you do; anything will be fast enough. If your files are large, and you care about speed, slurping the whole thing into memory is a design error. – Nemo Aug 25 '15 at 19:31
  • @Nemo, not necessarily. If your files are large, and you have a lot of memory, and you know you will need the whole file, loading everything in one shot is going to be (marginally) faster. – SergeyA Aug 25 '15 at 19:48
  • 1
    @SergeyA: Processing files in blocks has better memory locality. More importantly, it allows for concurrency. Even for fully serial tasks, you want to read some bytes from disk at the same time you are processing others, not have the one completely wait on the other. Although I usually deal with files that do not fit in RAM (~1000 GB), even if they did, slurping them in before processing would always be slower. – Nemo Aug 25 '15 at 21:20
  • I wonder what the best practice is, instead, if the files are very large. It sounds like splitting those files into chunks is recommended. Is there a best-practice example that covers most cases? (See the chunked-reading sketch below.) – icbytes Aug 26 '15 at 14:39
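
As a concrete illustration of the block-wise approach Nemo describes, here is a minimal sketch; the processFileInChunks name, the 64 KiB chunk size, and the processChunk callback are placeholders invented for this example, not part of the question's code:

#include <fstream>
#include <string>
#include <vector>

// Minimal sketch of block-wise reading: pull a fixed-size chunk,
// hand it to a processing callback, repeat until end of file.
bool processFileInChunks(const std::string& filePath,
                         void (*processChunk)(const char*, std::streamsize)) {
    std::ifstream file(filePath, std::ios::binary);
    if (!file) {
        return false;
    }

    std::vector<char> chunk(64 * 1024); // 64 KiB; tune for your workload
    while (file) {
        file.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        // gcount() reports how many bytes the last read() extracted;
        // the final chunk is usually shorter than the buffer.
        if (file.gcount() > 0) {
            processChunk(chunk.data(), file.gcount());
        }
    }
    return true;
}

This keeps memory usage constant regardless of file size and lets the OS overlap readahead with your processing.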

1 Answer


I would expect the first version to be faster than the second. It performs a single stream read, which would translate into a single kernel read() call (unless interrupted by signals).

The second version currently has the problem of potentially multiple reallocations of the vector as it grows; this can be solved by first reserving the appropriate size in the vector and then copying into it from the iterators. But the bigger issue is that it translates into multiple calls to the read() function.
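
A minimal sketch of that fix, keeping the signature of the question's second version; reserve() sets the capacity up front, and since assign() reuses existing capacity, the element-by-element copy from the stream iterators no longer triggers reallocations:

#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

bool readFileToBuffer(const std::string& filePath, std::vector<char>& buffer) {
    std::ifstream file(filePath, std::ios::binary);
    if (file.fail()) {
        return false;
    }

    // Find the file size so the vector's capacity can be set up front.
    file.seekg(0, std::ios::end);
    std::streamsize fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // reserve() keeps the capacity across assign(), avoiding repeated growth.
    buffer.reserve(static_cast<std::size_t>(fileSize));
    buffer.assign(std::istreambuf_iterator<char>(file),
                  std::istreambuf_iterator<char>());
    return true;
}

Even with the reallocations removed, the iterator version still extracts characters one at a time from the stream buffer, which is why the single bulk read() of the first version tends to win.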

answered by SergeyA