3

I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!

std::string
file2string(std::string filename)
{
    std::ifstream file(filename.c_str());
    if(!file.is_open()){
        // If they passed a bad file name, or one we have no read access to,
        // we pass back an empty string.
        return "";
    }
    // find out how much data there is
    file.seekg(0,std::ios::end);
    std::streampos length = file.tellg();
    file.seekg(0,std::ios::beg);
    // Get a vector of that size
    std::vector<char> buf(length);
    // Fill the buffer with the file contents
    file.read(&buf[0],length);
    file.close();
    // return buffer as string
    std::string s(buf.begin(),buf.end());
    return s;
}
jww
phorgan1
  • why don't you just use a `char*` for reading and the `string(const char * s, size_t n)` constructor? – akappa Jan 05 '12 at 02:25
  • You might want to take a look at [mmap](http://linux.die.net/man/2/mmap) if you want really efficient access to a large file as a string (in the case of mmap a `char*`). – Aaron McDaid Jan 05 '12 at 02:30
  • possible duplicate of [Read whole ASCII file into C++ std::string](http://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring) – Luc Touraille Jan 05 '12 at 08:38
  • I added yet another version, give it a shot in your benchmark :-) It's the same as the accepted answer in Luc's link. – Kerrek SB Jan 07 '12 at 02:53
  • Possible duplicate of [What is the best way to read an entire file into a std::string in C++?](https://stackoverflow.com/questions/116038/what-is-the-best-way-to-read-an-entire-file-into-a-stdstring-in-c) and [Read whole ASCII file into C++ std::string](https://stackoverflow.com/q/2602013/608639). – jww Dec 16 '17 at 04:31

4 Answers

6

Being a big fan of the C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
    std::string s(std::istreambuf_iterator<char>(std::ifstream("file")
                                                 >> std::skipws),
                  std::istreambuf_iterator<char>());
    std::cout << "file='" << s << "'\n";
}

This certainly is fast for my own implementation of IOStreams, but it requires a lot of trickery to actually get it fast. Primarily, it requires optimizing algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I'm not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get a reference to the just-created stream: the std::istreambuf_iterator<char> expects a reference, to which the temporary file stream wouldn't bind.

Since this probably isn't the fastest approach, I would be inclined to use std::getline() with a particular "newline" character, i.e. one which isn't in the file:

std::string s;
// optionally reserve space although I wouldn't be too fussed about the
// reallocations because the reads probably dominate the performance
// the delimiter must be a char: a plain 0 would not match getline's char_type parameter
std::getline(std::ifstream("file") >> std::skipws, s, '\0');

This assumes that the file doesn't contain a null character; any other character that is known not to occur would do as well. Unfortunately, std::getline() takes a char_type as its delimiter rather than an int_type (as std::istream::ignore() does): with an int_type you could pass eof(), a value which never occurs as a character (char_type, int_type, and eof() refer to the respective members of char_traits<char>). The member std::istream::getline() doesn't help either: it also takes a char_type delimiter, and you would need to know ahead of time how many characters are in the file.

BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn't the case when using the default C locale, and it is possible to detect that this doesn't do any conversion. Otherwise the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think that this is what would need to be done when the code conversion could do something interesting, although I don't think it actually is done. However, none of the examples explicitly set up the C locale, using e.g. std::locale::global(std::locale("C"));. Even with this, it is also necessary to open the file in std::ios_base::binary mode, because otherwise end-of-line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.
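To illustrate the caveats above, here is a minimal sketch of a seek-based read that opens the file in binary mode, treats the seeked size only as an upper bound, and trims the string to what was actually read. The name read_file_binary and its bool-returning interface are my own assumptions for the example, not something from the question.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (the name and signature are assumptions): reads a
// whole file into `out`. Returns false if the file cannot be opened or if
// its size cannot be determined by seeking.
bool read_file_binary(const std::string& filename, std::string& out)
{
    // Binary mode: no end-of-line translation, so bytes map 1:1 to chars.
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    if (!file) {
        return false;
    }

    file.seekg(0, std::ios::end);
    const std::streamoff length = file.tellg();   // -1 if seeking failed
    if (length < 0) {
        return false;
    }
    file.seekg(0, std::ios::beg);

    out.resize(static_cast<std::size_t>(length));
    std::streamsize got = 0;
    if (length > 0) {
        file.read(&out[0], static_cast<std::streamsize>(length));
        got = file.gcount();                      // characters actually read
    }
    // Treat the seeked size as an upper bound and keep only what was read.
    out.resize(static_cast<std::size_t>(got));
    return true;
}
```

A caller would use it as `std::string s; if (read_file_binary("file", s)) { ... }`. Whether this beats the other approaches still has to be measured.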

The other approaches using the extraction from std::streambuf* (i.e. those involving rdbuf()) all require that the resulting content is copied at some point. Given that the file may actually be very large this may not be an option. Without the copy this could very well be the fastest approach, however. To avoid the copy, it would be possible to create a simple custom stream buffer which takes a reference to a std::string as constructor argument and directly appends to this std::string:

#include <fstream>
#include <iostream>
#include <string>

class custombuf:
    public std::streambuf
{
public:
    custombuf(std::string& target): target_(target) {
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
    }

private:
    std::string& target_;
    enum { bufsize = 8192 };
    char buffer_[bufsize];
    int overflow(int c) {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
        {
            *this->pptr() = traits_type::to_char_type(c);
            this->pbump(1);
        }
        this->target_.append(this->pbase(), this->pptr() - this->pbase());
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
        return traits_type::not_eof(c);
    }
    int sync() { this->overflow(traits_type::eof()); return 0; }
};

int main()
{
    std::string s;
    custombuf   sbuf(s);
    if (std::ostream(&sbuf)
        << std::ifstream("readfile.cpp").rdbuf()
        << std::flush) {
        std::cout << "file='" << s << "'\n";
    }
    else {
        std::cout << "failed to read file\n";
    }
}

At least with a suitably chosen buffer size I would expect this version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, so you will want to measure the performance.

Dietmar Kühl
5

You can try this:

#include <fstream>
#include <sstream>
#include <string>

int main()
{
  std::ostringstream oss;
  std::string s;
  std::string filename = get_file_name();  // placeholder: assumed to be defined elsewhere

  if (oss << std::ifstream(filename, std::ios::binary).rdbuf())
  {
    s = oss.str();
  }
  else
  {
    // error
  }

  // now s contains your file     
}

You can also just use oss.str() directly if you like; just make sure you have some sort of error check somewhere.

No guarantee that it's the most efficient; you probably can't beat <cstdio> and fread. As @Benjamin pointed out, the string stream only exposes the data by copy, so you could instead read directly into the target string:

#include <string>
#include <cstdio>

std::FILE * fp = std::fopen("file.bin", "rb");
if (!fp)
{
   // error: the file could not be opened; do not continue
}

std::fseek(fp, 0L, SEEK_END);
long fsize = std::ftell(fp);   // ftell returns long, -1L on failure
std::rewind(fp);

std::string s(fsize, 0);
if (s.size() != std::fread(&s[0], 1, s.size(), fp))
{
   // error
}

std::fclose(fp);

(You might like to use a RAII wrapper for the FILE*.)
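One way such a RAII wrapper could look is sketched below, using std::unique_ptr with a custom deleter (this needs C++11). The names FileCloser, FilePtr and read_file_cstdio are made up for the example, and the error handling is deliberately minimal.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <string>

// Deleter so that std::unique_ptr calls fclose on every exit path.
struct FileCloser {
    void operator()(std::FILE* fp) const { if (fp) std::fclose(fp); }
};
typedef std::unique_ptr<std::FILE, FileCloser> FilePtr;  // hypothetical alias

// Hypothetical helper: reads the whole file into `out`, false on any error.
bool read_file_cstdio(const char* filename, std::string& out)
{
    FilePtr fp(std::fopen(filename, "rb"));
    if (!fp) {
        return false;                         // could not open the file
    }
    if (std::fseek(fp.get(), 0L, SEEK_END) != 0) {
        return false;
    }
    const long fsize = std::ftell(fp.get());  // -1L on failure
    if (fsize < 0) {
        return false;
    }
    std::rewind(fp.get());

    out.assign(static_cast<std::size_t>(fsize), '\0');
    if (fsize == 0) {
        return true;                          // empty file, nothing to read
    }
    // Read straight into the string's storage; no intermediate buffer.
    return std::fread(&out[0], 1, out.size(), fp.get()) == out.size();
}
```

The file is closed when fp goes out of scope, including on the early returns, so there is no explicit fclose call to forget.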


Edit: The fstream-analogue of the second version goes like this:

#include <string>
#include <fstream>

std::ifstream infile("file.bin", std::ios::binary);
infile.seekg(0, std::ios::end);
unsigned int fsize = infile.tellg();
infile.seekg(0, std::ios::beg);

std::string s(fsize, 0);

if (!infile.read(&s[0], fsize))
{
   // error
}

Edit: Yet another version, using streambuf-iterators:

std::ifstream thefile(filename, std::ios::binary);
std::string s((std::istreambuf_iterator<char>(thefile)), std::istreambuf_iterator<char>());

(Mind the additional parentheses to get the correct parsing.)
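For reference, the same streambuf-iterator idea can be wrapped in a function that accepts any std::istream, which makes it easy to try out on a string stream. The name slurp is an assumption for this sketch.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: returns everything left in `in` as one string.
std::string slurp(std::istream& in)
{
    // Without the extra parentheses around the first argument, this line
    // would be parsed as the declaration of a function named `s`.
    std::string s((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
    return s;
}
```

Because istreambuf_iterator reads from the stream buffer directly, whitespace and newlines are preserved; for a file you would pass a std::ifstream opened with std::ios::binary.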

Kerrek SB
  • I'm pretty sure that move doesn't buy you anything. `ostringstream::str()` returns by value. – Benjamin Lindley Jan 05 '12 at 02:37
  • @BenjaminLindley: oh, good point. Never mind then, just use `oss.str()` directly. – Kerrek SB Jan 05 '12 at 02:38
  • I made a framework to call each of these as a function 1000 times to read a 1.3M jpeg. Kerrek's first took 19s, the second 6s. Mine took 14s and David's took 2m21s. Can C++ do file I/O efficiently with elements of the standard template library? – phorgan1 Jan 05 '12 at 06:02
  • @Patrick: If it's any consolation, the C library is *part* of the C++ standard library, so don't be ashamed to use `<cstdio>`. But do let me post another version using `<fstream>`. Stay tuned. – Kerrek SB Jan 05 '12 at 13:50
1

I don't know how efficient it is, but here is a simple (to read) way: use the DOS end-of-file marker (Ctrl-Z, '\x1A') as the delimiter. This assumes the file doesn't contain that byte:

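string buffer;

ifstream fin;
fin.open("filename.txt");

if(fin.is_open()) {
    // '\x1A' never matches if the file doesn't contain it,
    // so getline reads up to the end of the file
    getline(fin,buffer,'\x1A');
    fin.close();
}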

The efficiency of this obviously depends on what's going on internally in the getline algorithm, so you could take a look at the code in the standard libraries to see how it works.
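Apart from efficiency, the delimiter choice affects correctness: getline stops at the first occurrence of the delimiter, so a file that happens to contain that byte is silently truncated. A small sketch showing this on an in-memory stream (read_until is a made-up helper mirroring the getline call above):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads from `in` up to, but not including, the first
// `delim` character, or up to the end of the stream if `delim` never occurs.
std::string read_until(std::istream& in, char delim)
{
    std::string buffer;
    std::getline(in, buffer, delim);
    return buffer;
}
```

With a stream holding the seven bytes of `"abc\x1A" "def"`, `read_until(stream, '\x1A')` yields only `abc`; with a stream holding `"abc\ndef"` it yields the whole content. For arbitrary binary data such as a JPEG, no single delimiter byte is safe to assume.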

derpface
1

Ironically, the example for string::reserve is reading a file into a string. You don't want to read the file into one buffer and then have to allocate/copy into another one.

Here's the example code:

// string::reserve
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main ()
{
  string str;
  size_t filesize;

  ifstream file ("test.txt",ios::in|ios::ate);
  filesize=file.tellg();

  str.reserve(filesize); // allocate space in the string

  file.seekg(0);
  for (char c; file.get(c); )
  {
    str += c;
  }
  cout << str;
  return 0;
}
Benjamin Lindley
David Schwartz
  • I agree. I'm not sure why they chose to do it that way. The important point is the `str.reserve` to make only a single allocation and then reading into the string. – David Schwartz Jan 05 '12 at 02:31
  • Having a correct example is important too, no? Hope you don't mind the edit. – Benjamin Lindley Jan 05 '12 at 02:57
  • I made a framework to call each of these as a function 1000 times to read a 1.3M jpeg. Kerrek's first took 19s, the second 6s. Mine took 14s and David's took 2m21s. Can C++ do file I/O efficiently with elements of the standard template library? – phorgan1 Jan 05 '12 at 06:03
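The framework used for the timings in that comment was not posted. A minimal sketch of how such a comparison might be set up with std::chrono (C++11) is shown below; the name time_reader and its parameters are assumptions, and each candidate is expected to be a callable taking a file name and returning a std::string.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical harness: calls `reader(filename)` `iterations` times and
// returns the elapsed wall-clock time in milliseconds. The total number of
// bytes read is accumulated so the calls have an observable result.
template <typename Reader>
double time_reader(Reader reader, const std::string& filename,
                   int iterations, std::size_t& total_bytes)
{
    total_bytes = 0;
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        total_bytes += reader(filename).size();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it once per candidate with the same file and iteration count gives comparable numbers, though repeated reads of one file mostly measure the operating system's cache rather than the disk.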