45

I was working on a simple parser and, when profiling, I observed that the bottleneck is in... file read! I extracted a very simple test to compare the performance of fstreams and FILE* when reading a big blob of data:

#include <stdio.h>
#include <string.h> // memset
#include <chrono>
#include <fstream>
#include <iostream>
#include <functional>
#include <string>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)

int main(int argc, const char * argv[])
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 123, BUFFER_SIZE);

    measure("FILE* write", [buffer]()
    {
        FILE* file = fopen("test_file_write", "wb");
        fwrite(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("FILE* read", [buffer]()
    {
        FILE* file = fopen("test_file_read", "rb");
        fread(buffer, 1, BUFFER_SIZE, file);
        fclose(file);
    });
    measure("fstream write", [buffer]()
    {
        std::ofstream stream("test_stream_write", std::ios::binary);
        stream.write(buffer, BUFFER_SIZE);
    });
    measure("fstream read", [buffer]()
    {
        std::ifstream stream("test_stream_read", std::ios::binary);
        stream.read(buffer, BUFFER_SIZE);
    });

    delete[] buffer;
}

The results of running this code on my machine are:

FILE* write 1388.59 ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream read 3319.82 ms

fstream write/read are about 2 times slower than FILE* write/read! And this while reading a big blob of data, without any parsing or other features of fstreams. I'm running the code on Mac OS, Intel i7 2.6GHz, 16GB 1600 MHz RAM, SSD drive. Please note that on a second run of the same code the time for FILE* read is very low (about 200 ms), probably because the file gets cached... This is why the files opened for reading are not created by this code.

Why is reading just a blob of binary data so slow with fstream compared to FILE*?

EDIT 1: I updated the code and the times. Sorry for the delay!

EDIT 2: I added command line and new results (very similar to previous ones!)

$ clang++  main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9 ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream read 3052.56 ms

Following the results for the second run:

$ ./a.out
FILE* write 1428.98 ms
FILE* read 196.902 ms
fstream write 3343.69 ms
fstream read 2285.93 ms

It looks like the file gets cached when reading, for both FILE* and fstream, as the time drops by roughly the same amount for both of them.

EDIT 3: I reduced the code to this:

FILE* file = fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE, file);
fclose(file);

std::ofstream stream("test_stream_write", std::ios::binary);
stream.write(buffer, BUFFER_SIZE);

And started the profiler. It seems like the stream spends a lot of time in the xsputn function, while the actual write calls have the same duration (as they should - it's the same function...)

Running    Time     Self       Symbol Name
3266.0ms   66.9%    0.0        std::__1::basic_ostream<char, std::__1::char_traits<char> >::write(char const*, long)
3265.0ms   66.9%    2145.0          std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsputn(char const*, long)
1120.0ms   22.9%    7.0                 std::__1::basic_filebuf<char, std::__1::char_traits<char> >::overflow(int)
1112.0ms   22.7%    2.0                      fwrite
1127.0ms   23.0%    0.0        fwrite
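
If I model that extra copy by hand, the shape of the overhead is easy to reproduce. A simplified sketch (just a model, not the actual libc++ code - the 4KB chunk size is a guess, not the real filebuf buffer size):

#include <cstddef>
#include <cstdio>
#include <cstring>

const size_t CHUNK = 4096; // guess at a filebuf-style internal buffer size

// Model of a streambuf-style write: copy the data into a small internal
// buffer first (the xsputn self time?), then hand it to the same fwrite.
size_t copy_then_write(FILE* f, const char* data, size_t size)
{
    char chunk[CHUNK];
    size_t total = 0;
    while (total < size)
    {
        size_t n = (size - total < CHUNK) ? (size - total) : CHUNK;
        memcpy(chunk, data + total, n);        // the extra copy
        size_t written = fwrite(chunk, 1, n, f);
        if (written == 0)                      // write error
            break;
        total += written;
    }
    return total;
}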

EDIT 4: For some reason this question was marked as a duplicate. I want to point out that I don't use printf at all; I use only std::cout to write the time. The files used in the read part are the output from the write part, copied with a different name to avoid caching.

Mircea Ispas
  • What OS, what compiler settings? – Mats Petersson Sep 29 '14 at 08:04
  • This takes into account construction time. Which is pretty slow for `iostream`/`fstream` components. Nearly non-existent for `FILE*`. – Rapptz Sep 29 '14 at 08:05
  • This doesn't answer your question, but bear in mind that on an OS with a sane disk cache (e.g. Linux) your results will be heavily skewed (there's no way you read 1GB straight from an HDD in 200ms, for example). – Oliver Charlesworth Sep 29 '14 at 08:06
  • @rapptz - multiple *seconds* to construct an object? – Oliver Charlesworth Sep 29 '14 at 08:09
  • @OliverCharlesworth Well, no. But they tend to be an issue of ~100ms or so. I know it happens with `std::stringstream`. – Rapptz Sep 29 '14 at 08:10
  • Shouldn't your first two `measure`s at least `fclose` the `FILE *`? – T.C. Sep 29 '14 at 08:10
  • Well, I can reproduce something similar on my Linux machine. My FILE write is a little better, fstream write is a fair bit worse. Possibly due to hardware differences or OS differences, but the symptom is definitely there. – Mats Petersson Sep 29 '14 at 08:12
  • @rapptz - if you can only construct 10 stringstreams per second, you have a very sad time! – Oliver Charlesworth Sep 29 '14 at 08:12
  • I also tried running the fstream one first, and the result remains. – Mats Petersson Sep 29 '14 at 08:12
  • @TC yes, posted the code before a meeting, sorry. Can somebody edit this please? Thank you! – Mircea Ispas Sep 29 '14 at 08:12
  • @Felics: Is this on Linux? If so, I think I know the answer. Will write a quick answer, but have to go to work - will be back later on for comments etc. – Mats Petersson Sep 29 '14 at 08:14
  • Probably still the same problem, I expect. – Mats Petersson Sep 29 '14 at 08:18
  • Mine are about the same: FILE* write 15751.8 ms | FILE* read 17202.8 ms | fstream write 16008.7 ms | fstream read 16987.9 ms – Galik Sep 29 '14 at 08:24
  • http://stackoverflow.com/questions/477225/is-stdifstream-significantly-slower-than-file – DumbCoder Sep 29 '14 at 08:26
  • @Galik if you have a slow hdd the time is given by hdd speed, not by stream implementation – Mircea Ispas Sep 29 '14 at 08:27
  • Your tests are not showing realistic times: first you write in 1.6s (1GB/1.6s = a very good SSD), but then you read the same data right after at 1GB/0.2s = 5GB/s; unless you have 10 SSDs in RAID 0 this must be the disk cache. Switch the read and the write, place at least an `fflush(NULL)` between them, and use different files for each. – Surt Sep 29 '14 at 08:28
  • @Galik, you must have turned off writeback on your OS (Windows?). – Surt Sep 29 '14 at 08:30
  • @Surt I try this in about one hour, sorry for the delay... – Mircea Ispas Sep 29 '14 at 08:30
  • @Surt: Doesn't make a difference. `fflush` doesn't mean "clear the disk cache". – Oliver Charlesworth Sep 29 '14 at 08:30
  • I don't think the purpose of this test is really to test the hard disk performance, but rather understand why two different tests that supposedly do the same thing take different amounts of time. I think I understand what the difference is (on Unix-style OS's at least). – Mats Petersson Sep 29 '14 at 08:33
  • I have the same results for `fstream` and `FILE*` (even `fstream` is a little bit faster). OS - Windows. Compiler - MinGW 4.8.0 – borisbn Sep 29 '14 at 08:41
  • @OliverCharlesworth, you're right, so the test must use different files that are known not to be in the disk cache at the time of the test. – Surt Sep 29 '14 at 08:48
  • @MatsPetersson, if the tests are flawed you can't test anything; first we have to be sure that we are testing under the same conditions. – Surt Sep 29 '14 at 08:49
  • @Surt I updated the code and the measurements. Sorry for the delay - I was in a meeting and posted the code in a hurry. – Mircea Ispas Sep 29 '14 at 09:15
  • Besides not including the relevant header (cstring), your example causes a seg fault – BЈовић Sep 29 '14 at 09:15
  • @BЈовић Do you have all the files created before running the tests? – Mircea Ispas Sep 29 '14 at 09:17
  • @Felics No. Isn't the example going to create the file? – BЈовић Sep 29 '14 at 09:27
  • It looks like your implementation of fstream is not using the disk cache but the FILE* version is??? As the 2nd time you run it, FILE* gets read at cache speed while the fstream is read at SSD speed. – Surt Sep 29 '14 at 09:31
  • @BЈовић I removed file creation because the files get cached and the read measurements are not correct. – Mircea Ispas Sep 29 '14 at 09:39
  • @Surt it looks like fstream uses the disk cache. In the second run the time reduces from ~3 seconds to ~2 seconds – Mircea Ispas Sep 29 '14 at 10:04
  • On two of my systems (Windows, Linux) the times are pretty much the same; moreover they are nearly identical to the times needed to copy one file to another with `cp`. No clang+libc++ combination though, only libstdc++. I think it might be libc++'s fault. – n. 'pronouns' m. Sep 29 '14 at 10:12
  • Try adding larger buffers to both stream and FILE, see answer. – Surt Sep 29 '14 at 11:14
  • Did you try disabling stdio synchonization? See http://stackoverflow.com/a/6821165/10077 – Fred Larson Sep 29 '14 at 14:46
  • @FredLarson Yes, same results (tested only with write) – Mircea Ispas Sep 29 '14 at 14:50
  • That std::__1 shouts TR1 at me which means C++03? – Surt Sep 29 '14 at 17:31
  • Yes, this is most definitely a libc++ fault. Same machine, same compiler, same SSD drive. Identical times with libstdc++, fstream almost twice as slow with libc++. – n. 'pronouns' m. Sep 29 '14 at 21:21
  • You did not unsync the C and C++ streams. This means the C++ streams are doing a lot of unnecessary work. Use `std::cout.sync_with_stdio(false);` to make sure the C++ streams are used correctly. – Martin York Sep 29 '14 at 21:25
  • @LokiAstari Somebody suggested this and I tried to set sync to false. I had same results... – Mircea Ispas Sep 29 '14 at 21:27
  • @LokiAstari He's not using `std::cout`. User-opened fstreams are not affected by `sync_with_stdio`. – n. 'pronouns' m. Sep 29 '14 at 21:29
  • @LokiAstari I don't use printf at all, only std::cout. The input files are the ones generated as output, but copied with different name to avoid caching. – Mircea Ispas Sep 29 '14 at 21:31
  • I reopened the question, as it has nothing to do with `sync_with_stdio`. It is likely that it's a bug(gette) in `libc++` from `clang++`, perhaps. – Mats Petersson Sep 29 '14 at 22:03
  • @MatsPetersson Thank you! I would really like to find the answer. I'm a very strong advocate for C++ std library at work and this doesn't help me:) thanks for your time! – Mircea Ispas Sep 29 '14 at 22:07
  • I'm trying to build a debug version of the library. If that succeeds, I'll be able to step into the functions and see what it does... Not had that much success yet... – Mats Petersson Sep 29 '14 at 22:20
  • Got an idea of using a `std::filebuf`, but it's not working yet, and it's past my bedtime... Will look again... – Mats Petersson Sep 29 '14 at 23:53
  • possible duplicate of [Why is reading lines from stdin much slower in C++ than Python?](http://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python) – 7hi4g0 Mar 07 '15 at 19:04

5 Answers

20

It would seem that, on Linux, for this large set of data, the implementation of fwrite is much more efficient, since it uses write rather than writev.

I'm not sure WHY writev is so much slower than write, but that appears to be where the difference is. And I see absolutely no real reason as to why the fstream needs to use that construct in this case.

This can easily be seen by using strace ./a.out (where a.out is the program testing this).

Output:

Fstream:

clock_gettime(CLOCK_REALTIME, {1411978373, 114560081}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
writev(3, [{NULL, 0}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824}], 2) = 1073741824
close(3)                                = 0
clock_gettime(CLOCK_REALTIME, {1411978386, 376353883}) = 0
write(1, "fstream write 13261.8 ms\n", 25fstream write 13261.8 ms) = 25

FILE*:

clock_gettime(CLOCK_REALTIME, {1411978386, 930326134}) = 0
open("test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1073741824) = 1073741824
clock_gettime(CLOCK_REALTIME, {1411978388, 584197782}) = 0
write(1, "FILE* write 1653.87 ms\n", 23FILE* write 1653.87 ms) = 23

I don't have them fancy SSD drives, so my machine will be a bit slower on that - or something else is slower in my case.

As pointed out by Jan Hudec, I'm misinterpreting the results. I just wrote this:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>
#include <string>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)


int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        struct iovec vec[] = 
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}

And the result is pretty much identical for both cases, and faster than both the fstream and FILE* variants in the question.

It is the actual fstream implementation that does something daft - probably copying the whole data in small chunks, somewhere and somehow, or something like that. I will try to find out further.

Edit:

It would seem like, on my machine, right now, if you add fclose(file) after the write, it takes approximately the same amount of time for both fstream and FILE* - on my system, around 13 seconds to write 1GB - with old style spinning disk type drives, not SSD.

I can however write MUCH faster using this code:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <chrono>
#include <string>

void measure(const std::string& test, std::function<void()> function)
{
    auto start_time = std::chrono::high_resolution_clock::now();

    function();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now() - start_time);
    std::cout<<test<<" "<<static_cast<double>(duration.count()) * 0.000001<<" ms"<<std::endl;
}

#define BUFFER_SIZE (1024 * 1024 * 1024)


int main()
{
    auto buffer = new char[BUFFER_SIZE];
    memset(buffer, 0, BUFFER_SIZE);

    measure("writev", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        struct iovec vec[] = 
        {
            { NULL, 0 },
            { (void *)buffer, BUFFER_SIZE }
        };
        writev(fd, vec, sizeof(vec)/sizeof(vec[0]));
        close(fd);
    });

    measure("write", [buffer]()
    {
        int fd = open("test", O_CREAT|O_WRONLY, 0660);
        write(fd, buffer, BUFFER_SIZE);
        close(fd);
    });
}

gives times of about 650-900 ms.

I can also edit the original program to give a time of approximately 1000ms for fwrite - simply remove the fclose.

I also added this method:

measure("fstream write (new)", [buffer]()
{
    std::ofstream* stream = new std::ofstream("test", std::ios::binary);
    stream->write(buffer, BUFFER_SIZE);
    // Intentionally no delete.
});

and then it takes about 1000 ms here too.

So, my conclusion is that, somehow, sometimes, closing the file makes it flush to disk. In other cases, it doesn't. I still don't understand why...
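
A sketch to isolate the flush cost (untested; it reuses the measure() helper and buffer from the code above, and times the write and the close separately):

// Time fwrite and fclose separately to see where the flush happens.
FILE* file = fopen("test", "wb");
measure("fwrite only", [&]() { fwrite(buffer, 1, BUFFER_SIZE, file); });
measure("fclose only", [&]() { fclose(file); });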

Mats Petersson
  • In conclusion, artifact of a bad implementation... that's libstdc++ I guess, what about libc++? – Deduplicator Sep 29 '14 at 08:42
  • I remember being surprised several years ago that `writev()` on Linux was much slower than `write()`. It wasn't even worth using `writev()` for its stated purpose, i.e. writing several buffers with a single call. – John Zwinck Sep 29 '14 at 08:45
  • Have you tried changing the filebuffer on the fstream? – Surt Sep 29 '14 at 08:46
  • I don't see how this answers the question. On my PC, the times are comparable (almost the same) – BЈовић Sep 29 '14 at 08:52
  • @B: what OS are you running? (Although to be fair, this answer shows *that* they are different, not *why*.) – Oliver Charlesworth Sep 29 '14 at 09:00
  • @BЈовић, have you copied the OPs example exactly? – Surt Sep 29 '14 at 09:07
  • I got 64 bit ubuntu 14.04. I copy&pasted the code. Compiled with `g++ -std=c++11 -Wall -Wextra`. And the times in both cases are very similar – BЈовић Sep 29 '14 at 09:15
  • @BЈовић, which is the expected result (fstream should be a couple of ms slower, not 1000s of ms, due to the overhead of more functionality). Now we would just have to figure out what the difference is to the OP's system. – Surt Sep 29 '14 at 09:27
  • @Surt He is not getting few ms difference. He is getting 2-3 times slower read/write with fstream – BЈовић Sep 29 '14 at 09:30
  • @BЈовић, slight confusion, I was talking about yours BJ, not Mats times. – Surt Sep 29 '14 at 09:33
  • @BJ: Do you mean your times are similar to Matt's? Or that your times are similar to *each other* (e.g. your fstream time is similar to your FILE* time)? – deworde Sep 29 '14 at 10:52
  • @Mats Petersson Please check my latest update - it seems like ofstream uses fwrite in the end, but it adds some overhead... – Mircea Ispas Sep 29 '14 at 12:41
  • OP is using Mac OSX, not Linux. If there is any difference between `write` and `writev` performance on Linux, it would be covered up by measurement noise for such a large data transfer. I suspect this is just an OSX bug. – R.. GitHub STOP HELPING ICE Sep 29 '14 at 16:28
  • -1. You are jumping to a conclusion that does not follow from the experiment you've shown. You failed to distinguish the actual system calls from the library overhead, and assuming it would be the system calls is unsubstantiated, as the system calls end up calling the same file methods kernel-side (`writev` would only be less efficient if the request was chunked into many bits, but here it isn't). – Jan Hudec Sep 29 '14 at 16:58
  • Re: updates to answer: If you're writing 1GB to an HDD in 650-900ms, you must actually be measuring the time to write to the disk cache... – Oliver Charlesworth Sep 30 '14 at 20:19
  • @OliverCharlesworth: Yes, absolutely. There is no way that data goes all the way to the disk. And I don't think, for most intents and purposes, it matters, as long as it isn't "lost". – Mats Petersson Sep 30 '14 at 20:28
  • @MatsPetersson: Yeah, I guess it depends on the use-case. Of course, for *really* large operations, the disk-cache rate isn't sustainable (because you fill up RAM)... – Oliver Charlesworth Sep 30 '14 at 20:29
  • Oh, of course. Eventually, the disk speed itself is severely limiting. That's clear. – Mats Petersson Sep 30 '14 at 20:30
7

TL;DR: Try adding this to your code before doing the writing:

const size_t bufsize = 256*1024;
char buf[bufsize];                          // must outlive the stream
mystream.rdbuf()->pubsetbuf(buf, bufsize);  // set before the first read/write

When working with large files with fstream, make sure to use a stream buffer.

Counterintuitively, disabling stream buffering dramatically reduces performance. At least the MSVC implementation copies one char at a time to the filebuf when no buffer is set (see streambuf::xsputn()), which can make your application CPU-bound and result in lower I/O rates.

NB: You can find a complete sample application here.
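
A minimal self-contained sketch of the same idea (the output file name is a placeholder; note that the buffer must outlive the stream, and pubsetbuf should be called before the file is opened):

#include <cstddef>
#include <fstream>

int main()
{
    const std::size_t bufsize = 256 * 1024;
    static char buf[bufsize]; // static: must outlive the stream, and 256KB
                              // is a lot to put on the stack

    std::ofstream stream;
    // Set the buffer before opening - some implementations ignore
    // pubsetbuf once I/O has started.
    stream.rdbuf()->pubsetbuf(buf, bufsize);
    stream.open("output.bin", std::ios::binary);
    stream.write("hello", 5);
}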

rustyx
  • I can confirm. When I set the bufsize to 256*1024, my time reading one 25MB file dropped from 0.15s to 0.02s, on Windows 7 with VS2015 update 3. – flm8620 Sep 11 '18 at 12:16
2

A side note for anyone interested. The key words are: Windows Server 2016 / CloseHandle.

In our app we discovered a NASTY bug on Windows Server 2016.

Our std code under EVERY other Windows version takes (ms):

CreateFile/SetFilePointer: 1, WriteFile: 0, CloseHandle: 0

On Windows Server 2016 we got:

CreateFile/SetFilePointer: 1, WriteFile: 0, CloseHandle: 275

And the time grows with the size of the file, which is ABSURD.

After a LOT of investigation (we first found that CloseHandle is the culprit...) we discovered that on Windows Server 2016 MS attached a hook to the close function that triggers Windows Defender to scan the ENTIRE file, and does not return until the scan is done (in other words, scanning is synchronous - PURE MADNESS).

When we added an exclusion in Defender for our file, everything worked fine. I think this is BAD design; no other antivirus blocks normal file activity inside a program to scan files. (MS can do it as they have the power to do so.)
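
A sketch of the kind of measurement that exposes this (the Win32 calls are standard; "test.bin" and the tiny 4KB payload are placeholders):

#include <windows.h>
#include <chrono>
#include <iostream>

int main()
{
    HANDLE h = CreateFileA("test.bin", GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    char data[4096] = {};
    DWORD written = 0;
    WriteFile(h, data, sizeof(data), &written, nullptr);

    auto start = std::chrono::steady_clock::now();
    CloseHandle(h); // on the affected servers this blocks until the scan finishes
    auto end = std::chrono::steady_clock::now();

    std::cout << "CloseHandle: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
}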

ingconti
1

The stream is somehow broken on the Mac - an old implementation or setup.

An old setup could cause the FILE to be written in the exe directory and the stream in the user directory; this shouldn't make any difference unless you have 2 disks or some other different setting.

On my lousy Vista I get Normal buffer+Uncached:
C++ 201103
FILE* write 4756 ms
FILE* read 5007 ms
fstream write 5526 ms
fstream read 5728 ms

Normal buffer+Cached:
C++ 201103
FILE* write 4747 ms
FILE* read 454 ms
fstream write 5490 ms
fstream read 396 ms

Large Buffer+cached:
C++ 201103
5th run:
FILE* write 4760 ms
FILE* read 446 ms
fstream write 5278 ms
fstream read 369 ms

This shows that the FILE* write is faster than the fstream write, but the FILE* read is slower than the fstream read ... still, all numbers are within ~10% of each other.

Try adding some more buffering to your stream to see if that helps.

const int MySize = 1024*1024;
static char MrBuf[MySize]; // 1MB is too big for the stack; give it static storage
stream.rdbuf()->pubsetbuf(MrBuf, MySize);

The equivalent for FILE* is:

const int MySize = 1024*1024;
// setvbuf returns 0 on success and nonzero on failure
if (setvbuf(file, NULL, _IOFBF, MySize) != 0)
    DieInDisgrace();
Surt
  • I get the `FILE* read()` results speeding up a lot after the first run of the application. Looks like the OS is doing some caching there. – Martin York Sep 29 '14 at 23:46
1

Contrary to other answers, a big issue with large file reads comes from buffering by the C standard library. Try using low-level read/write calls in large chunks (1024KB) and see the performance jump.

File buffering by the C library is useful for reading or writing small chunks of data (smaller than the disk block size).
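
A sketch of the large-chunk, low-level approach on POSIX (the file name is a placeholder, and 1024KB matches the chunk size suggested above):

#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

int main()
{
    const std::size_t chunk = 1024 * 1024; // 1024KB chunks
    std::vector<char> buf(chunk);

    int fd = open("test_file_read", O_RDONLY);
    if (fd < 0)
        return 1;

    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0)
    {
        // consume n bytes of buf here
    }

    close(fd);
}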

On Windows I got almost a 3x performance boost dropping file buffering when reading and writing raw video streams.

I also opened the file using native OS (Win32) API calls and told the OS not to cache the file, since that involves yet another copy.

egur