-2

Perhaps there is an answer to this somewhere on this site, but I can't find it for the life of me.

What I need is to get ALL the ASCII characters from a file in C++. This includes things like \n (ASCII 10 in decimal) and the mysterious SUB (ASCII 26 in decimal), which seems to act as an EOF when I try to read the file.

The issue is that I don't know of a way to read a file that doesn't get tripped up by things like newlines.

I'd love to read all of these into a vector of chars or uint8_ts.

I tried several approaches, including some found here: Read whole ASCII file into C++ std::string

No luck.

MaxStrange

2 Answers

3

If SUB (0x1a) gave you problems, that is most likely because you opened the file in text mode, not binary mode, in Windows (*). Text-mode streams are allowed several implementation-defined things binary-mode streams are not, like changing the format of end-of-line (\n vs. \r\n), truncating trailing whitespace before a newline, or -- in your case -- considering 0x1a to mean end-of-file. So make sure that you use binary mode for reading binary data.
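
To see the effect in isolation, here is a minimal round-trip sketch: it writes a few troublesome bytes (including \r, \n and 0x1a) in binary mode and reads them back in binary mode. The function name `round_trip_binary` and the scratch file path passed to it are hypothetical, chosen only for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: writes `bytes` to `path` in binary mode, then reads
// the file back in binary mode and returns what was read.
std::vector<char> round_trip_binary(const std::string& path,
                                    const std::vector<char>& bytes)
{
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // `out` is closed here, so the data is flushed to disk

    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

With std::ios::binary on both streams, the returned vector should match the input byte for byte. Dropping the flag from the ifstream on Windows is what would typically cut the read short at 0x1a.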

Note that the C standard allows an implementation to append an implementation-defined number of null bytes to the end of a binary-mode stream.

This kind of stream behaviour is specified for C99 in chapter 7.19.2 "Streams", paragraphs 2-3. I am sure similar specs are given for C11 and C++, but I cannot give you chapter and verse on those.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// ...

std::string filename( "foo.txt" );
std::stringstream sstr;

// It's the std::ios::binary that is making all the difference
std::ifstream in( filename.c_str(), std::ios::binary );

// Copy the whole stream buffer into the stringstream in one go
sstr << in.rdbuf();

Congratulations, you have just read the whole file into the stringstream sstr. You can get a string out of that with sstr.str() -- and a string has many of the same features as a std::vector< char > -- but djf's solution for directly reading into a vector<char> is more efficient (and would also work for a std::string by the way).
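
Wrapped into a function, the same approach might look like the sketch below. The name `read_file_binary` and the choice to throw `std::runtime_error` when the file cannot be opened are assumptions made for illustration, not part of the snippet above.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the raw bytes of `filename` as a std::string.
std::string read_file_binary(const std::string& filename)
{
    std::ifstream in(filename.c_str(), std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + filename);
    }

    std::ostringstream sstr;
    sstr << in.rdbuf();   // copies every byte, including '\n' and 0x1a
    return sstr.str();
}
```

A std::string can hold embedded zero bytes, so `size()` on the result reports the true byte count. Just avoid passing `c_str()` to C functions that stop at the first '\0'.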


(*): Linux makes no difference between text and binary mode.


All that being said, there is no guarantee whatsoever that a file actually contains ASCII. Assuming you are working on Windows, the default ("ANSI") encoding for text files on Western European systems is typically CP1252, which extends ASCII and differs from both ISO 8859-1 (Latin-1) and ISO 8859-15 (Latin-9) in a number of code points. Welcome to the world of text encodings. My suggestion is to use UTF-8; it's the only sane choice...
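
If you want to verify the ASCII assumption rather than trust it, one simple check is that every byte is below 0x80. A minimal sketch, assuming the file contents are already in a std::string; the function name `is_pure_ascii` is hypothetical:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if every byte in `data` is 7-bit ASCII (0x00-0x7F).
bool is_pure_ascii(const std::string& data)
{
    return std::all_of(data.begin(), data.end(), [](char c) {
        // Convert to unsigned first: plain char may be signed, in which case
        // bytes >= 0x80 would show up as negative values.
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

A byte such as 0xE9 (é in both CP1252 and Latin-1) makes the check fail, which is a hint that the file needs an explicit encoding decision.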

DevSolar
  • +1 for "Linux makes no difference between text and binary mode". In Windows, `0x1A` is the Ctrl+Z character, which signals end-of-file (EOF) for text mode streams. – ubuntugod Feb 09 '16 at 09:41
  • @ubuntugod: Yea... not too long ago I was looking at some terribly convoluted code that jumped through various hoops to "encode" data so it didn't have any `0x1A` in it, and wondered, "why did the author do this?". When I eventually realized this was his "solution" for a problem he did not understand -- text vs. binary mode in Windows -- *and that I could not replace this mess with a `std::ios::binary` because the "encoding" had become part of the external ABI* -- I had to take a looong walk... – DevSolar Feb 09 '16 at 09:46
  • The whole `0x1A` business came into the picture because of the legendary CP/M operating system. CP/M only recorded a file's size in whole **disk blocks** of 128 bytes, so a file containing just `Hello` would still be reported as one full block. `0x1A` was therefore used to mark where the real content of a text file ended. The EOF concept was carried over to MS-DOS as well. – ubuntugod Feb 09 '16 at 09:57
  • @ubuntugod: The long and short of it is, basically, "if it's binary, *open* it as binary, d'uh." ;-) – DevSolar Feb 09 '16 at 09:59
  • Yup. Completely forgot about text/binary distinction in Windows. Thanks! – MaxStrange Feb 09 '16 at 15:55
2

I agree with everything DevSolar said. I usually do something along the lines of:

#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
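   // Open in binary mode so that no byte is translated or treated as EOF
   ifstream  f("foo.txt", ios::in | ios::binary);
   // The extra parentheses around the second argument keep the compiler from
   // parsing this line as a function declaration (the "most vexing parse")
   vector<char> contents(istreambuf_iterator<char>(f), (istreambuf_iterator<char>()));

   // process contents ...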

}
djf
  • One up for the iterator constructor. I had completely forgotten about that one. If you actually *do* need the whole file in a vector, that's the way to go. – DevSolar Feb 09 '16 at 09:55
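
Since the question also mentions uint8_t, the same iterator constructor can fill a `std::vector<std::uint8_t>`. The sketch below assumes a hypothetical function name `read_bytes` and returns an empty vector if the file cannot be opened; neither detail comes from the answers above.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads all bytes of `path` into a vector of uint8_t.
// Returns an empty vector if the file cannot be opened.
std::vector<std::uint8_t> read_bytes(const std::string& path)
{
    std::ifstream f(path.c_str(), std::ios::in | std::ios::binary);
    if (!f) {
        return {};
    }
    // Each char from the stream buffer is converted to uint8_t on insertion.
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(f),
                                     std::istreambuf_iterator<char>());
}
```

Working with unsigned bytes avoids surprises when comparing values above 0x7F, which show up as negative numbers wherever plain char is signed.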