
Note: This is not a duplicate of existing std::ios::sync_with_stdio(false) questions. I have gone through all of them and yet I am unable to make cout behave as fast as printf. Example code and evidence shown below.

I have three source code files:

// ex1.cpp

#include <cstdio>
#include <chrono>

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000000; i++) {
        printf("%d\n", i);
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    // cast: the rep type of std::chrono::milliseconds is implementation-defined
    fprintf(stderr, "%lld\n", static_cast<long long>(duration.count()));
}

// ex2.cpp

#include <iostream>
#include <chrono>

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000000; i++) {
        std::cout << i << '\n';
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    std::cerr << duration.count() << '\n';
}

// ex3.cpp

#include <iostream>
#include <chrono>

int main()
{
    std::ios::sync_with_stdio(false); // stop synchronizing C++ streams with C stdio
    std::cin.tie(nullptr);            // don't flush std::cout before each read from std::cin

    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000000; i++) {
        std::cout << i << '\n';
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    std::cerr << duration.count() << '\n';
}

I am never going to mix cstdio and iostream in my code, so the hacks used in ex3.cpp are acceptable for me.
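
To be clear about why mixing would be a problem: once synchronization is off, the iostream and stdio families buffer independently, so interleaved output is no longer guaranteed to appear in order. A tiny illustration (hypothetical, not one of the benchmarks below):

// not a benchmark: why mixing stdio and iostream is risky after the hack

#include <cstdio>
#include <iostream>

int main()
{
    std::ios::sync_with_stdio(false);
    std::cout << "first (iostream)\n";  // buffered by the C++ stream
    std::printf("second (stdio)\n");    // buffered separately by C stdio
    // With synchronization disabled, these two lines may appear in either
    // order in the final output, depending on when each buffer flushes.
}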

I compile all three with clang++ on macOS; the machine has a solid-state drive.

clang++ -std=c++11 -O2 -Wall -Wextra -pedantic ex1.cpp -o ex1
clang++ -std=c++11 -O2 -Wall -Wextra -pedantic ex2.cpp -o ex2
clang++ -std=c++11 -O2 -Wall -Wextra -pedantic ex3.cpp -o ex3

Now I run them and time them.

$ time ./ex1 > out.txt
1282

real    0m1.294s
user    0m1.217s
sys     0m0.071s

$ time ./ex1 > out.txt
1299

real    0m1.333s
user    0m1.221s
sys     0m0.072s

$ time ./ex1 > out.txt
1277

real    0m1.295s
user    0m1.214s
sys     0m0.070s

$ time ./ex2 > out.txt
3102

real    0m3.371s
user    0m3.037s
sys     0m0.075s

$ time ./ex2 > out.txt
3153

real    0m3.164s
user    0m3.073s
sys     0m0.075s

$ time ./ex2 > out.txt
3136

real    0m3.150s
user    0m3.051s
sys     0m0.077s

$ time ./ex3 > out.txt
3118

real    0m3.513s
user    0m3.045s
sys     0m0.080s

$ time ./ex3 > out.txt
3113

real    0m3.124s
user    0m3.042s
sys     0m0.077s

$ time ./ex3 > out.txt
3095

real    0m3.107s
user    0m3.029s
sys     0m0.073s

The results are quite similar even if I redirect the output to /dev/null, and they are quite similar at the -O3 optimization level too.
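
Since /dev/null makes no difference, the time seems to be spent formatting rather than writing to disk. One way I could double-check would be to time the formatting alone into an in-memory stream; a minimal sketch along those lines (hypothetical, not one of the programs measured above):

#include <iostream>
#include <sstream>
#include <chrono>

int main()
{
    // Format into an in-memory buffer only; nothing is written to stdout,
    // so any time measured here is pure formatting overhead.
    std::ostringstream oss;
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000000; i++) {
        oss << i << '\n';
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    std::cerr << duration.count() << '\n';
}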

Why are both ex2 and ex3 slower than ex1? Is it possible to use std::cout in any way that gives speed comparable to printf?
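
One idea I have not benchmarked yet is to keep printf-style formatting but send all output through std::cout's stream buffer, so everything still goes through iostream. A rough sketch (ex4.cpp is hypothetical and not among the timed programs above):

// ex4.cpp (hypothetical; not timed)

#include <cstdio>
#include <iostream>
#include <chrono>

int main()
{
    std::ios::sync_with_stdio(false);

    auto t1 = std::chrono::high_resolution_clock::now();
    char buf[16];
    for (int i = 0; i < 10000000; i++) {
        // std::snprintf only formats into a local buffer (it touches no
        // stdio stream); the bytes then go out through std::cout's stream
        // buffer, bypassing operator<< formatting entirely.
        int len = std::snprintf(buf, sizeof buf, "%d\n", i);
        std::cout.rdbuf()->sputn(buf, len);
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
    std::cerr << duration.count() << '\n';
}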

Lone Learner
  • My question for you is: Why is this a problem? What are you doing that leads to output being such a major bottleneck in your program? – Some programmer dude Mar 22 '21 at 09:16
  • @Someprogrammerdude This is a problem because I don't understand why the results I am getting are inconsistent with other answers on this topic. I am not doing anything that leads to output being a bottleneck. I am trying to understand if disabling synchronization with `stdio` does indeed lead to the side effect of better performance or not, and if not, why not? – Lone Learner Mar 22 '21 at 09:21
  • It may be worth timing inside the program itself. It may be that (for example) a lot more code gets put into the binary, so the start-up time is longer? – Galik Mar 22 '21 at 09:25
  • I'm just not getting the same results as you tbh. In my tests they get progressively faster. `0m1.309s`, `0m1.274s` and `0m1.041s` respectively. (GCC v10 though). – Galik Mar 22 '21 at 09:32
  • Although my results are rather variable on an HDD rather than an SSD. I would have thought the majority of the time spent on this is disk I/O rather than the C++ libraries. – Galik Mar 22 '21 at 09:43
  • Maybe you could try outputting to `/dev/null`? – Galik Mar 22 '21 at 09:45
  • With my setup (g++ 9.3.0 via WSL, SSD, compiled with -O3, each test run multiple times but separately) the first and second versions take roughly the _same time_ whereas the third one takes _half_ the time (so it's actually faster than printf on my platform). So, maybe your disk speed is skewing the measurement? Or the implementation? Going to test this with Visual Studio next. – churill Mar 22 '21 at 09:46
  • @Galik Thank you for the comments. I have added timing inside the program now in the tests and updated the question too to include internal timing data. Redirecting the output to `/dev/null` also produces similar timing results. For example, `time ./ex1 > /dev/null` consumed 1204 ms, `time ./ex2 > /dev/null` consumed 3368 ms and `time ./ex3 > /dev/null` consumed 3311 ms in one of the trial runs. – Lone Learner Mar 22 '21 at 09:57
  • @Blastfurnace Thanks for the comment! I had that in the original question and then forgot it after an update. I have added the `-O2` option back to the tests now and updated the results as well. – Lone Learner Mar 22 '21 at 10:07
  • @Galik @churill Confirmed that with `g++ 8.3.0` on Linux, I get similar results to yours. `ex1` takes about 1100 ms, `ex2` takes about 1300 ms and `ex3` takes about 1000 ms. So the issue in my question looks specific to `Apple clang version 12.0.0 (clang-1200.0.32.27)` on my macOS system. – Lone Learner Mar 22 '21 at 10:13
