
I need to copy a file into a string. Is there some way to preallocate memory for that string object, and a way to read the file content directly into that string's memory?

Amro
Ramadheer Singh
  • Possible duplicate of [Read whole ASCII file into C++ std::string](http://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring). My accepted answer to this question also explains how to pre-allocate all of the memory so that the string does not expand itself repeatedly during the read. – Tyler McHenry Jul 21 '10 at 21:05

7 Answers


std::string has a .reserve() method for pre-allocation.

std::string s;
s.reserve(1048576); // reserve 1 MB (1024 * 1024 chars)
read_file_into(s);  // placeholder for whatever code fills s with the file's content
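The `read_file_into()` call above is a placeholder. Below is a minimal sketch of one possible body, assuming a hypothetical signature that also takes the file path. It appends the file in fixed-size chunks, so in practice a string reserved in advance only has to grow if the file turns out to be larger than the reservation.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical implementation of the placeholder read_file_into().
// Appends the file's contents to s; returns false if the file cannot be opened.
bool read_file_into(std::string& s, const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);  // binary: no newline translation
    if (!in) return false;

    char chunk[4096];
    while (in) {
        in.read(chunk, sizeof chunk);
        // gcount() is how many characters the last read() actually delivered;
        // the final chunk is usually short and sets eofbit/failbit, ending the loop.
        s.append(chunk, static_cast<std::string::size_type>(in.gcount()));
    }
    return true;
}
```

Called after `s.reserve(1048576)` as in the answer, the appends should fit in the reserved storage for files up to 1 MB.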
kennytm

This isn't so much an answer in itself as a comment on, summary of, and comparison between a couple of other answers (as well as a quick demonstration of why I've recommended the style of code @Johannes - litb gives in his answer). Since @sbi posted an alternative that looked pretty good and, especially, avoided the extra copy involved in reading into a stringstream and then using the .str() member to get a string, I decided to write up a quick comparison of the two:

[ Edit: I've added a third test case using @Tyler McHenry's istreambuf_iterator-based code, and added a line to print out the length of each string that was read to ensure that the optimizer didn't optimize away the reading because the result was never used.]

[ Edit2: And now, code from Martin York has been added as well...]

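#include <fstream>
#include <sstream>
#include <string>
#include <iostream>
#include <iterator>
#include <time.h>

int main() {
    std::ostringstream os;
    std::ifstream file("equivs2.txt");

    // Test 1 (Johannes - litb): copy the stream buffer into an ostringstream, then copy out with str().
    clock_t start1 = clock();
    os << file.rdbuf();
    std::string s = os.str();
    clock_t stop1 = clock();

    std::cout << "\ns.length() = " << s.length();

    // Test 2 (sbi): reserve the file size, then assign through istream_iterator.
    // istream_iterator<char> reads with operator>>, which skips whitespace;
    // that is why s2 ends up shorter than the other strings.
    std::string s2;

    clock_t start2 = clock();
    file.seekg(0, std::ios_base::end);
    const std::streampos pos = file.tellg();
    file.seekg(0, std::ios_base::beg);

    if (pos != std::streampos(-1))
        s2.reserve(static_cast<std::string::size_type>(pos));
    s2.assign(std::istream_iterator<char>(file), std::istream_iterator<char>());
    clock_t stop2 = clock();

    std::cout << "\ns2.length = " << s2.length();

    // Reading to end-of-file through the istream set eofbit/failbit; clear them before seeking again.
    file.clear();

    // Test 3 (Tyler McHenry): reserve, then assign through istreambuf_iterator (no whitespace skipping).
    std::string s3;

    clock_t start3 = clock();
    file.seekg(0, std::ios::end);
    s3.reserve(static_cast<std::string::size_type>(file.tellg()));
    file.seekg(0, std::ios::beg);

    s3.assign((std::istreambuf_iterator<char>(file)),
              std::istreambuf_iterator<char>());
    clock_t stop3 = clock();

    std::cout << "\ns3.length = " << s3.length();

    // Test 4 (Martin York): resize to the file size and read() straight into the string's buffer.
    std::string s4;

    clock_t start4 = clock();
    file.seekg(0, std::ios::end);
    s4.resize(static_cast<std::string::size_type>(file.tellg()));
    file.seekg(0, std::ios::beg);

    file.read(&s4[0], s4.length());
    // In text mode, newline translation can deliver fewer characters than tellg() reported,
    // so keep only what was actually read.
    s4.resize(static_cast<std::string::size_type>(file.gcount()));
    clock_t stop4 = clock();

    std::cout << "\ns4.length = " << s4.length();

    // Times are in clock() ticks (CLOCKS_PER_SEC ticks per second).
    std::cout << "\nTime using rdbuf: " << stop1 - start1;
    std::cout << "\nTime using istream_iterator: " << stop2 - start2;
    std::cout << "\nTime using istreambuf_iterator: " << stop3 - start3;
    std::cout << "\nTime using read: " << stop4 - start4 << "\n";
    return 0;
}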

Now the impressive part: the results (times are in clock() ticks). First with VC++ (in case somebody cares, Martin's code is fast enough that I increased the file size to get a meaningful time for it):

s.length() = 7669436
s2.length = 6390688
s3.length = 7669436
s4.length = 7669436
Time using rdbuf: 184
Time using istream_iterator: 1332
Time using istreambuf_iterator: 249
Time using read: 48

Then with gcc (cygwin):

s.length() = 8278035
s2.length = 6390689
s3.length = 8278035
s4.length = 8278035
Time using rdbuf: 62
Time using istream_iterator: 2199
Time using istreambuf_iterator: 156
Time using read: 16

[ end of edit -- the conclusions remain, though the winner has changed -- Martin's code is clearly the fastest. ]

The results are quite consistent with respect to which is fastest and slowest. The only inconsistency is with how much faster or slower one is than another. Though the placements are the same, the speed differences are much larger with gcc than with VC++.
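The shorter `s2` in both runs comes from `istream_iterator<char>`, which extracts with `operator>>` and therefore skips whitespace, while `istreambuf_iterator<char>` copies raw characters from the stream buffer. A small sketch with hypothetical helper names that makes the difference visible on an in-memory stream:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Reads through operator>>: every whitespace character is skipped.
std::string read_with_istream_iterator(std::istream& in) {
    return std::string(std::istream_iterator<char>(in),
                       std::istream_iterator<char>());
}

// Reads raw characters from the stream buffer: nothing is skipped.
std::string read_with_istreambuf_iterator(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Turning off `skipws` on the stream (for example with `std::noskipws`) before using `istream_iterator` keeps the whitespace too, but each character still goes through a formatted extraction, which is consistent with that variant being the slowest in the timings above.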

Hossein
Jerry Coffin
  • Sort of what I thought initially: It's much easier to optimize the char-by-char read of op<< into a block read (or inline appropriate parts) than the char-by-char read of istream_iterator (though such code has to use `istreambuf_iterator` to avoid skipping whitespace for each character read - maybe that will speed things up since it's happening on a lower level?), which goes over multiple steps with op++, op* etc. But I didn't expect it would make *that* much of a difference. Thanks for timing it! – Johannes Schaub - litb Jul 21 '10 at 21:49
  • Could you write which compilation flags were used for the test cases? – LookAheadAtYourTypes Feb 06 '16 at 15:06

This should be all you need:

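#include <fstream>
#include <sstream>
#include <string>

std::ostringstream os;
std::ifstream file("name.txt");
os << file.rdbuf();  // copy everything from the file's buffer into the stringstream

std::string s = os.str();  // copy the accumulated text out into a std::string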

This reads characters from the file and inserts them into the stringstream. Afterwards it gets the string created behind the scenes. Notice that I fell into the following trap: using the extraction operator will skip initial whitespace. You have to use the insertion operator as above, or use the noskipws manipulator:

// Beware, skips initial whitespace!
file >> os.rdbuf();

// This does not skip it
file >> std::noskipws >> os.rdbuf();

These functions are described as reading the stream character by character, though (I'm not sure what optimizations are possible here), and I haven't timed them to determine their speed.
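The insertion-operator version can be wrapped in a function. This is a minimal sketch with a hypothetical name and signature. One detail worth knowing: inserting from an empty stream buffer (an empty file) inserts nothing and sets failbit on the ostringstream, so the sketch does not treat that case as an error.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `out` via rdbuf().
// Returns false only if the file could not be opened.
bool read_via_rdbuf(const std::string& path, std::string& out) {
    std::ifstream file(path.c_str());
    if (!file) return false;

    std::ostringstream os;
    os << file.rdbuf();  // copies every character, leading whitespace included;
                         // an empty file inserts nothing and sets failbit on os
    out = os.str();      // the second copy: stringstream buffer -> std::string
    return true;
}
```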

Johannes Schaub - litb
  • This copies twice: once into the `ostringstream` buffer and a second time into `s`. – Ramadheer Singh Jul 21 '10 at 20:35
  • @Johannes, I was assuming the string memory to be a contiguous buffer, but after reading @GMan's answer I realized that there is no way around the copying. – Ramadheer Singh Jul 21 '10 at 20:41
  • @Gollum: As I pointed out in an answer yesterday, I've used code like the above a lot without a problem, but if the extra copy causes a real problem, consider a previous answer Martin York pointed out yesterday: http://stackoverflow.com/questions/132358/how-to-read-file-content-into-istringstream/138645#138645. – Jerry Coffin Jul 21 '10 at 20:42
  • @Jerry, checked that answer, but that uses `vector`, which again needs to be copied into a `string`. – Ramadheer Singh Jul 21 '10 at 20:45
  • @Gollum: yes, but given the (soon to be official) requirement that `string` use a contiguous buffer, you could resize the string and use its buffer instead of the `vector`. – Jerry Coffin Jul 21 '10 at 20:50
  • @Gollum: Do you really need a string, even? In any case, I would use this answer. If performance becomes a problem, then you can either: 1) replace your string usage with a vector, removing a copy or 2) Make an implementation assumption (a rather safe one) and read directly into the string, removing a copy. 3) [Copy char-by-char into the string from the file.](http://stackoverflow.com/questions/3303527/how-to-pre-allocate-memory-for-a-stdstring-object/3303764#3303764) – GManNickG Jul 21 '10 at 20:53
  • @Gollum: See [my answer](http://stackoverflow.com/questions/3303527/how-to-pre-allocate-memory-for-a-stdstring-object/3303764#3303764) for how to do away with the copy. – sbi Jul 21 '10 at 20:54
  • @Jerry: Interesting, Martin's answer is similar to [mine](http://stackoverflow.com/questions/3303527/how-to-pre-allocate-memory-for-a-stdstring-object/3303764#3303764). However, he reads the data as binary (not translating platform-specific newlines), while I read textually. Also I get away without the extra copy. – sbi Jul 21 '10 at 20:57
  • @Gollum my comment about block-reads was a bit too ambitious i guess :) Looks like even op<< has to read it char-by-char :( – Johannes Schaub - litb Jul 21 '10 at 21:01
  • @Johannes, `ifstream.read (&my_str[0], length)` is a block read, but it assumes that string objects live in a contiguous block of memory. Is that too dangerous an assumption, or a pragmatic one? – Ramadheer Singh Jul 21 '10 at 21:06
  • @Gollum: I'm sure in practice that's Ok. (There haven't been any non-contiguous string implementations around.) But the difference is that it performs binary read. – sbi Jul 21 '10 at 21:08
  • @sbi, right, that might be a good reason to not use it on formatted strings. – Ramadheer Singh Jul 21 '10 at 21:10
  • @Gollum Herb Sutter teaches his fellows that `&my_str[0]` points to a contiguous area of memory. He says at http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/ : "However, current ISO C++ does require &str[0] to cough up a pointer to contiguous string data (but not necessarily null-terminated!), so there wasn’t much leeway for implementers to have non-contiguous strings, anyway." However, string's wording in the current Std is broken in exactly that regard (it refers to a non-const `data()`, but that doesn't exist!), so it can't be relied on. – Johannes Schaub - litb Jul 21 '10 at 21:10
  • Like @sbi says i think in practice that's fine :) – Johannes Schaub - litb Jul 21 '10 at 21:11
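The `read(&my_str[0], length)` idea from these comments can be packaged as shown below. This is a sketch under two assumptions: the string's storage is contiguous (guaranteed since C++11), and the file is opened in binary mode, so no newline translation happens. The helper name is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: one pre-sized allocation, one block read, no intermediate buffer.
bool read_whole_file(const std::string& path, std::string& out) {
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file) return false;

    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0) return false;  // tellg() failed: size unknown
    file.seekg(0, std::ios::beg);

    out.resize(static_cast<std::string::size_type>(size));  // size() == file size
    if (size > 0)
        file.read(&out[0], static_cast<std::streamsize>(size));
    // Trim in case fewer bytes arrived than tellg() promised.
    out.resize(static_cast<std::string::size_type>(file.gcount()));
    return true;
}
```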

Just for fun, here's another way to do this:

// Beware, brain-compiled code ahead!
#include <fstream>
#include <iterator>
#include <string>

std::ifstream ifs( /* ... */ );
if( !ifs.good() ) return; // whatever

std::string str;

ifs.seekg( 0, std::ios_base::end );
const std::streampos pos = ifs.tellg();
ifs.seekg( 0, std::ios_base::beg );
if( pos!=std::streampos(-1) ) // can get stream size?
  str.reserve(static_cast<std::string::size_type>(pos));

// istream_iterator<char> reads with operator>>, so turn off whitespace skipping first
ifs.unsetf( std::ios_base::skipws );
str.assign( std::istream_iterator<char>(ifs)
          , std::istream_iterator<char>() );

I hope I didn't blow it too badly.

sbi
  • +1, was waiting for someone to elaborate stream-iterator-based code :) – bobah Jul 21 '10 at 20:56
  • +1, for flexible `Brain-Compiler` is okay with missing `}` ;) – Ramadheer Singh Jul 21 '10 at 21:02
  • @Gollum: I freely admit I copied those `seekg()` lines straight out of some code of mine (which fills a string with a file's content) and overlooked the `{`. I fixed it, but that's what that disclaimer is for, anyway. – sbi Jul 21 '10 at 21:05

std::string::reserve()

std::getline()
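The answer only names the two functions. One way they might be combined is sketched below; this is an interpretation, not the answerer's code, and the helper name is hypothetical. Because getline() drops the newline, the sketch appends '\n' after every line, so a file whose last line has no trailing newline gains one.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: pre-allocate once, then append the input line by line.
std::string read_lines(std::istream& in, std::string::size_type expected_size) {
    std::string result;
    result.reserve(expected_size);   // e.g. the file size, if known
    std::string line;
    while (std::getline(in, line)) {
        result += line;
        result += '\n';              // getline() consumed the newline; put one back
    }
    return result;
}
```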

bobah

It seems that you are asking how to do a CString::GetBuffer, ReleaseBuffer type operation with std::string.

I don't know of any way to do this directly. An easy way would be to just create a raw C-style buffer, read into the buffer, then copy the buffer to a std::string using assign or similar. Of course you would have to worry about buffer overrun issues etc. Also, I would use a std::auto_ptr to manage the raw buffer pointer, to ensure deallocation on exception etc. This is a bit simpler than using stringstream etc. I can provide an example if needed.
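Here is a minimal sketch of that approach. It uses a `std::vector<char>` as the owning buffer (the alternative GMan suggests in the comments) rather than `auto_ptr`, and the helper name is hypothetical. Sizing the buffer from the file and reading at most that many bytes avoids the overrun concern. The final `assign()` is the extra copy the other answers try to avoid.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read into a separately owned buffer, then copy into a std::string.
bool read_via_buffer(const std::string& path, std::string& out) {
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file) return false;

    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0) return false;
    file.seekg(0, std::ios::beg);

    // The vector frees its memory automatically, even if an exception is thrown.
    std::vector<char> buffer(static_cast<std::vector<char>::size_type>(size));
    if (size > 0)
        file.read(&buffer[0], static_cast<std::streamsize>(size));

    // Copy only the bytes that were actually read into the string.
    out.assign(buffer.begin(), buffer.begin() + file.gcount());
    return true;
}
```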

Devin Ellingson

  • `auto_ptr` doesn't handle array types correctly. Just use `std::vector`. (Also, signing posts is frowned upon; your name is under everything you do.) – GManNickG Jul 22 '10 at 03:53
  • Thanks GMan, I forgot about that problem: auto_ptr always calls delete instead of delete [], which is what would be needed in this case. You could just create a simple array_auto_ptr class or, as you say, use std::vector. – DevinEllingson Jul 22 '10 at 18:58

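std::string::resize() actually allocates the required space and makes those characters part of the string (size() becomes n), so you can write into them.

std::string::reserve() only sets aside capacity (capacity() >= n afterwards). size() is unchanged, so you can't write into the reserved space directly, and asking for less than the current capacity is only a request that doesn't guarantee any shrinking.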
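The following small sketch contrasts the two calls on empty strings. The function name is hypothetical, and the checks use `assert`.

```cpp
#include <cassert>
#include <string>

// Hypothetical demo: what resize(n) and reserve(n) each guarantee.
void resize_vs_reserve(std::string::size_type n) {
    std::string a;
    a.resize(n);
    assert(a.size() == n);                                   // n characters now exist, all '\0'...
    assert(a.find_first_not_of('\0') == std::string::npos); // ...so read(&a[0], n) may overwrite them

    std::string b;
    b.reserve(n);
    assert(b.capacity() >= n);  // storage for n characters is set aside...
    assert(b.empty());          // ...but size() is still 0, so contents must be added
                                // with append(), push_back(), assign() and the like
}
```

Handing `&b[0]` to `read()` with a length of `n` would write past `size()`, which is undefined behavior; that is why the read-directly-into-the-string technique uses `resize()`.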

Vi_real