
I'm trying to read and write a few megabytes of data, stored in files as 8 floats converted to strings per line, to my SSD. Looking up C++ code and implementing some of the answers here for reading and writing files yielded this code for reading a file:

std::stringstream file;
std::fstream stream;
stream.open("file.txt", std::fstream::in);
file << stream.rdbuf(); // pull the entire file into the stringstream in one go
stream.close();

And this code for writing files:

// note: file.tellg() is the stringstream's current read position;
// file.str().size() would be a more reliable byte count
stream.write(file.str().data(), file.tellg());

The problem is that this code is very slow compared to the speed of my SSD. My SSD has a read speed of 2400 MB/s and a write speed of 1800 MB/s, but my program reaches only a read speed of 180.6 MB/s and a write speed of 25.11 MB/s.

Because some asked how I measure the speed: I obtain a std::chrono::steady_clock::time_point using std::chrono::steady_clock::now() before and after the operation and compute the difference with a std::chrono::duration_cast. Using the same 5.6 MB file and dividing the file size by the measured time, I get the megabytes per second.
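Roughly, the measurement looks like this (a sketch built around the reading code above; the hard-coded 5.6 is the file size in MB):

#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>

int main() {
    auto begin = std::chrono::steady_clock::now();

    // the operation being timed, here the reading code from above
    std::stringstream file;
    std::fstream stream("file.txt", std::fstream::in);
    file << stream.rdbuf();
    stream.close();

    auto end = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();

    // dividing the file size (5.6 MB) by the elapsed seconds gives MB/s
    std::cout << 5.6 / (ms / 1000.0) << " MB/s\n";
}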

How can I increase the speed of reading and writing to files, while using only standard C++ and STL?

user11914177
  • Serialize your data and write all of it in one command in binary mode. – sweenish Feb 23 '20 at 16:01
  • Does this answer your question? [Fast textfile reading in c++](https://stackoverflow.com/questions/17925051/fast-textfile-reading-in-c) – Ayxan Haqverdili Feb 23 '20 at 16:02
  • @Ayxan this is using boost or system libraries. I want STL only – user11914177 Feb 23 '20 at 16:04
  • @user11914177 Not all the answers. Besides, you should probably start a bounty on that question and specify your constraints. – Ayxan Haqverdili Feb 23 '20 at 16:06
  • "large amounts" but how large? 1GB or 1TB? And do you write to multiple files or just a single file? What are the properties of the input file? Are the lines long or short? There are a lot of factors that affect read/write speed that you didn't show. But the first thing to do is to increase the buffer size – phuclv Feb 23 '20 at 16:16
  • @phuclv probably using "quite large amounts" made this sound too big (I'm not an expert in naming sizes of data). I updated the question to add some more details. – user11914177 Feb 23 '20 at 16:27
  • Would it be more accurate to say that you copy text files, since your reading and writing examples read/write by line and ignore the contents? On Linux you probably cannot beat [`sendfile`](http://man7.org/linux/man-pages/man2/sendfile.2.html). – Maxim Egorushkin Feb 23 '20 at 17:32
  • How do you measure the speeds? Post the complete code of your benchmark, see https://stackoverflow.com/help/minimal-reproducible-example – Maxim Egorushkin Feb 23 '20 at 17:53
  • @MaximEgorushkin why is there always that one guy who asks completely irrelevant questions? But anyways, there you have your answer – user11914177 Feb 23 '20 at 18:32
  • @user11914177 That guy is there to help that other guy who thinks he knows everything but makes silly mistakes. – Maxim Egorushkin Feb 23 '20 at 18:54

4 Answers


In your sample, the slow part is likely the repeated calls to getline(). While this is somewhat implementation-dependent, a call to getline typically boils down to an OS call to retrieve the next line of text from an open file. OS calls are expensive and should be avoided in tight loops.

Consider a getline implementation that incurs ~1ms of overhead. If you call it 1000 times, each reading ~80 characters, you've incurred a full second of overhead. If, on the other hand, you call it once and read 80,000 characters, you've removed 999ms of overhead and the function will likely return nearly instantaneously.

(This is also one reason games and the like implement custom memory management rather than calling malloc and new all over the place.)

For reading: Read the entire file at once, if it'll fit in memory.

See: How do I read an entire file into a std::string in C++?

Specifically, see the slurp answer towards the bottom. (And take to heart the comment about using a std::vector instead of a char[] array.)

If it won't all fit in memory, manage it in large chunks.

For writing: build your output in a stringstream or similar buffer, and then write it in one step, or in large chunks, to minimize the number of OS round trips.
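A minimal sketch of both ideas, assuming the data fits in memory (file names and the dummy output loop are placeholders):

#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

int main() {
    // Reading: slurp the whole file into one string with a single bulk read
    std::ifstream in("input.txt", std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    // Writing: format everything into an in-memory buffer first...
    std::ostringstream buffer;
    for (int i = 0; i < 1000; ++i)
        buffer << i << ',' << i * 2 << '\n';

    // ...then hand it to the OS with one write call
    std::ofstream out("output.txt", std::ios::binary);
    const std::string s = buffer.str();
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

Either way, the point is one OS round trip (or a few large ones) instead of one per line.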

3Dave
  • @user11914177 [Read whole ASCII file into C++ std::string](https://stackoverflow.com/q/2602013/10147399) – Ayxan Haqverdili Feb 23 '20 at 16:04
  • So for reading I have quite an improvement, I get 180.6 MB/s. For writing, I already use a string stream for writing to the file – user11914177 Feb 23 '20 at 16:18
  • @user11914177 Add your updated code to your question. – 3Dave Feb 23 '20 at 16:31
  • @user11914177 I'll think about that one. Getting ready for a flight at the moment. I'm wondering if using a `std::vector` rather than a stringstream would be faster. I'm not sure if `stream.str()` performs any conversion, or if it just returns a reference to the stream's internal buffer(s). If it's converting, that will generate some overhead that could be substantial. Good luck! – 3Dave Feb 23 '20 at 16:38
  • @3Dave I tried the vector example and I only got 26.3 MB/s, so this seems to be much slower – user11914177 Feb 23 '20 at 16:47
  • @3Dave can you help to open the question? – user11914177 Feb 24 '20 at 13:25

You can try to copy the whole file at once and see if that improves the speed:

#include <algorithm>
#include <fstream>
#include <iterator>

int main() {
    std::ifstream is("infile");
    std::ofstream os("outfile");

    // bulk-copy every character from the input stream to the output stream
    std::copy(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>{},
              std::ostreambuf_iterator<char>(os));

    // or simply: os << is.rdbuf()
}
Ted Lyngmo

I made a short evaluation for you.

I have written a test program that first creates a test file.

Then I applied several improvements:

  1. Switch on all compiler optimizations
  2. For the string, use resize to avoid reallocations
  3. Reading from the stream is drastically improved by setting a bigger input buffer

Please check whether you can implement one of these ideas in your solution.


Edit

Stripped the test program down to pure reading:

#include <string>
#include <iterator>
#include <iostream>
#include <fstream>
#include <chrono>
#include <algorithm>

constexpr size_t NumberOfExpectedBytes = 80'000'000;
constexpr size_t SizeOfIOStreamBuffer = 1'000'000;
static char ioBuffer[SizeOfIOStreamBuffer];

const std::string fileName{ "r:\\log.txt" };

void writeTestFile() {
    if (std::ofstream ofs(fileName); ofs) {
        for (size_t i = 0; i < 2'000'000; ++i)
            ofs << "text,text,text,text,text,text," << i << "\n";
    }
}


int main() {

    //writeTestFile();

    // Make string with big buffer
    std::string completeFile{};
    completeFile.resize(NumberOfExpectedBytes);

    if (std::ifstream ifs(fileName); ifs) {

        // Increase buffer size for buffered input (this must be done before
        // the first read; otherwise the effect is implementation-defined)
        ifs.rdbuf()->pubsetbuf(ioBuffer, SizeOfIOStreamBuffer);

        // Time measurement start
        auto start = std::chrono::system_clock::now();

        // Read complete file (the resize above must be at least
        // the file size, or this copy overruns the string)
        std::copy(std::istreambuf_iterator<char>(ifs), {}, completeFile.begin());

        // Time measurement evaluation
        auto end = std::chrono::system_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
        // How long did it take?
        std::cout << "Elapsed time:       " << elapsed.count() << " ms\n";
    }
    else std::cerr << "\n*** Error.  Could not open source file\n";

    return 0;
}

With that I achieve 123.2 MB/s.

Armin Montigny
  • If I haven't made some mistakes, you are reading at 37.5 MB/s. This is an improvement, but it is still quite slow. – user11914177 Feb 23 '20 at 16:11
  • Hm, 52.4 MB/s on my machine. But I'm doing a lot of other stuff in my example code... But I understand. I will edit and strip down my example code. – Armin Montigny Feb 23 '20 at 16:18
  • Thanks for your shortened code; as far as I have read from other answers, `std::copy(std::istreambuf_iterator(ifs), {}, completeFile.begin());` seems to be the same as the `rdbuf()` function from `ifstream`. The speeds are similar. Can you help reopening the question? – user11914177 Feb 27 '20 at 11:27

Looks like you are outputting formatted numbers to a file. There are two bottlenecks already: formatting the numbers into human-readable form, and the file I/O.

The best performance you can achieve comes from keeping the data flowing; starting and stopping incurs a penalty each time.

I recommend double buffering with two or more threads.

One thread formats the data into one or more buffers. Another thread writes the buffers to the file. You'll need to adjust the size and quantity of buffers to keep the data flowing. When one thread finishes a buffer, it starts processing another. For example, you could have the writing thread use fstream::write() to write each entire buffer in one call.

The double buffering with threads can also be adapted for reading: one thread reads the data from the file into one or more buffers, and another thread converts the data (from the buffers) into the internal format.
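A minimal sketch of the writing side, assuming a mutex-protected queue that hands roughly 1 MB chunks from the formatting thread to the writing thread (file name, chunk size, and the number formatting are arbitrary):

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main() {
    std::queue<std::string> full;   // buffers ready to be written
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Writing thread: drains filled buffers to the file in large chunks
    std::thread writer([&] {
        std::ofstream os("outfile.txt", std::ios::binary);
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !full.empty() || done; });
            if (full.empty()) break;   // done and nothing left to write
            std::string buf = std::move(full.front());
            full.pop();
            lock.unlock();             // write without holding the lock
            os.write(buf.data(), static_cast<std::streamsize>(buf.size()));
        }
    });

    // Formatting thread (here: main): fills a buffer, then hands it off
    std::string buf;
    for (int i = 0; i < 2'000'000; ++i) {
        buf += std::to_string(i);
        buf += '\n';
        if (buf.size() >= 1'000'000) {  // hand off ~1 MB chunks
            {
                std::lock_guard<std::mutex> lock(m);
                full.push(std::move(buf));
            }
            cv.notify_one();
            buf.clear();
        }
    }
    {
        std::lock_guard<std::mutex> lock(m);
        if (!buf.empty()) full.push(std::move(buf));
        done = true;
    }
    cv.notify_one();
    writer.join();
}

Recycling emptied buffers through a second queue would avoid the repeated allocations and give true double buffering.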

Thomas Matthews
  • Seems to me that this is IO bound. – 3Dave Feb 25 '20 at 06:31
  • @3Dave Yes, writing to files is I/O bound, by its very nature. I/O is one of the slower bottlenecks for a program. The objective is to keep the data streaming. – Thomas Matthews Feb 25 '20 at 16:00
  • My point was that, if the IO is the bottleneck, threads wouldn't provide any perf benefit. If the CPU is starved then it'd be a different matter. – 3Dave Feb 25 '20 at 16:02