Reading into a fixed-size buffer in a loop
To my surprise, old-fashioned, almost C-like code seems to be the fastest with both clang and gcc:
    {
        vector<char> cin_str;
        // 64k buffer seems sufficient
        std::streamsize buffer_sz = 65536;
        vector<char> buffer(buffer_sz);
        cin_str.reserve(buffer_sz);
        auto rdbuf = cin.rdbuf();
        // sgetn() returns the number of characters read; 0 means end of input
        while (auto cnt_char = rdbuf->sgetn(buffer.data(), buffer_sz))
            cin_str.insert(cin_str.end(), buffer.data(), buffer.data() + cnt_char);
    }
Using istream::read() and istream::gcount() was as fast but required a little extra code...
C++ iterators
Surprisingly, using istreambuf_iterator (the iterator for unformatted input) turned out to be much slower: more than 3x slower for some test files, even after switching off sync with stdio.
    {
        std::ios_base::sync_with_stdio(false);
        vector<char> cin_str;
        // 64k
        std::streamsize buffer_sz = 65536;
        cin_str.reserve(buffer_sz);
        std::istreambuf_iterator<char> iit(std::cin.rdbuf()); // stdin iterator
        std::istreambuf_iterator<char> eos;                   // end-of-range iterator
        std::copy(iit, eos, std::back_inserter(cin_str));
        return cin_str;
    }
This is true even after reserving space for the vector buffer (rather than just assigning to it).
The other surprise is that I see (near) maximum speed even with a very modest buffer size (64 KB). vector just has a very efficient reallocation strategy.
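One rough way to check the reallocation claim is to count how often the vector's capacity changes while appending 64 KB chunks. The helper below (`count_reallocations`, a name made up for this sketch) does that without any I/O. The growth policy is implementation-defined, so the exact count will differ between standard libraries. With the geometric growth that common implementations use, I would expect it to grow roughly with the logarithm of the total size rather than with the number of chunks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustrative only): appends at least `total_bytes`
// to a vector in pieces of `chunk_sz` bytes and counts how many times
// capacity() changes, i.e. how many reallocations took place.
std::size_t count_reallocations(std::size_t total_bytes,
                                std::size_t chunk_sz = 65536)
{
    const std::vector<char> chunk(chunk_sz, 'x');
    std::vector<char> out;
    out.reserve(chunk_sz);  // same initial reservation as in the snippets above

    std::size_t reallocations = 0;
    std::size_t last_capacity = out.capacity();
    for (std::size_t written = 0; written < total_bytes; written += chunk_sz) {
        out.insert(out.end(), chunk.begin(), chunk.end());
        if (out.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = out.capacity();
        }
    }
    return reallocations;
}
```

For example, `count_reallocations(16u * 1024 * 1024)` appends 256 chunks. If the vector grew by exactly one chunk each time, that would mean 255 reallocations.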
Addendum:
Googling turns up this blog post (http://insanecoding.blogspot.in/2011/11/reading-in-entire-file-at-once-in-c.html) from 2011, which seems to show that this approach is about as fast as you can go in C++ (with gcc/clang), and that switching to cstdio does not provide further gains (but obviously makes the code even uglier!).
Avoiding copies
@BenVoigt points out that sgetn() / istream::read() can write the data directly into its final location if we judiciously preallocate the requisite space:
    {
        std::ios_base::sync_with_stdio(false);
        // 64k
        std::streamsize buffer_sz = 65536;
        vector<char> cin_str(buffer_sz);
        std::streamsize cin_str_data_end = 0;
        auto rdbuf = cin.rdbuf();
        // read straight into the unused tail of the vector
        while (auto cnt_char = rdbuf->sgetn(cin_str_data_end + cin_str.data(), buffer_sz))
        {
            cin_str_data_end += cnt_char;
            // keep buffer_sz bytes of free space after the data read so far
            cin_str.resize(cin_str_data_end + buffer_sz);
        }
        // trim the unused tail
        cin_str.resize(cin_str_data_end);
        return cin_str;
    }
In testing, this gave no further speedup, probably because the code is dominated by (1) I/O, (2) system call overhead, and (3) vector memory allocation.
Is there a faster way to do this? Memory-mapped files from Boost?