1

I'm using this code to read a string from a file.

pbuf = infile.rdbuf();
size = pbuf->pubseekoff(0, ios::end, ios::in);
pbuf->pubseekpos (0,ios::in);
buf = new char[size];
pbuf->sgetn(buf, size);
str.assign(buf, buf+size);

I have to read the data into a temporary char* buffer, buf, because sgetn needs a char*, not a string.
Before asking my actual question: if anyone knows a better way to read a string from a file that may contain whitespace characters (without looping until EOF), please tell me.

The content of the file is:
blah blah blah
blah blah in a new line

But what I get is:
blah blah blah
blah blah in a new line═

Playing around with the code, I noticed that the number of strange characters increases as I add more \n characters. It seems that when I get the size of the file, each \n character takes 2 bytes, but in the string it takes only 1 byte, so the buffer is larger than the data actually read and the string ends with garbage. How do I avoid this?

atoMerz
  • 1
    [Read whole ASCII file into C++ std::string](http://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring) – ipc Nov 29 '12 at 20:37
  • 1
    It's probably because of CRLF Windows-style line endings, while strings hold only LFs... – Andrejs Cainikovs Nov 29 '12 at 20:44
  • Since C++11 you can read directly into a `string` using `&s[0]`, and it will work in practice in some pre-C++11 implementations (like MSVC). – Yakov Galka Nov 29 '12 at 20:44
  • @ybungalobill is `s` a string? – atoMerz Nov 29 '12 at 20:48
  • @AtoMerZ: yes. Assuming you `resized` it properly of course... The point is that it is now guaranteed to be contiguous in memory and copy-on-write is prohibited. So now it is safe to do this way. **EDIT** Ah, also MSVC10 debug CRT introduced a bug that triggers an assertion when indexing into the past-the-end element of the string (although it is allowed by the standard). So it will work there only if `size() > 0`. – Yakov Galka Nov 29 '12 at 20:50
  • @ybungalobill It'd have been a nice answer if it didn't involve resizing. Since I have no way of correctly estimating the size. I can only guess which means reserving more than needed. – atoMerz Nov 29 '12 at 20:53
  • @AtoMerZ: You can always estimate as above and then resize again based on the value that `sgetn` returned (it returns the number of characters read). Although, in my opinion, one should always open files in binary mode and handle line endings on a higher level. – Yakov Galka Nov 29 '12 at 20:56

2 Answers

2

On Windows, the representation of end-of-line in a text file is two bytes: 0x0d, 0x0a. When you use text mode to read from such a file, those two bytes get translated into the single character '\n'. When you use binary mode you're reading raw bytes, and they don't get translated for you. If you don't want them, you'll have to do the translation yourself.
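As a rough illustration of the text-mode case: the size obtained by seeking counts bytes in the file, so after translation it is only an upper bound on the number of characters delivered. The sketch below sizes the string from the seek and then shrinks it to the count that `sgetn` returns, as suggested in the comments on the question. The function name `read_file_text` is hypothetical, and reading into `&str[0]` assumes C++11 contiguous string storage.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original posts): reads a whole file in
// text mode and trims the string to the number of characters actually read.
std::string read_file_text(const std::string& fileName) {
    std::ifstream in(fileName.c_str());  // text mode: "\r\n" may become '\n'
    if (!in) throw std::runtime_error("cannot open " + fileName);

    std::streambuf* pbuf = in.rdbuf();
    // Byte count of the file: an upper bound on the translated length.
    const std::streamoff size =
        pbuf->pubseekoff(0, std::ios_base::end, std::ios_base::in);
    pbuf->pubseekpos(0, std::ios_base::in);
    if (size <= 0) return std::string();

    std::string str(static_cast<std::string::size_type>(size), '\0');
    // sgetn returns how many characters were really stored in the buffer.
    const std::streamsize got =
        pbuf->sgetn(&str[0], static_cast<std::streamsize>(size));
    str.resize(static_cast<std::string::size_type>(got));
    return str;
}
```

The temporary over-allocation remains, but the resulting string no longer carries the uninitialized tail that showed up as strange characters.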

Pete Becker
  • The OP does not use formatted input... The translation is done on a lower level based on the way you open the file (with or without ios_base::binary flag). – Yakov Galka Nov 29 '12 at 20:47
1

This is due to the standard library implementation translating the standard Windows line ending \r\n into the standard C++ line ending \n when the stream is opened in text mode.

As @ipc says, you can use the answer to the question linked in the comments above to do what you want. (Note: According to the comments, the accepted answer on that question is not actually the best way to do it.)

Alternatively, you can disable the line ending translation by opening the stream in binary mode, like so:

std::ifstream t(fileName, std::ios_base::in | std::ios_base::binary);
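A minimal sketch of how the binary-mode stream might be used to read the whole file into a string. Because no translation happens, the size reported by `tellg()` matches the number of bytes read, and any \r characters stay in the result. The helper name `read_file_binary` is hypothetical, and writing through `&contents[0]` assumes C++11 contiguous string storage.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original posts): returns the raw bytes of
// the file, with no line-ending translation.
std::string read_file_binary(const std::string& fileName) {
    std::ifstream in(fileName.c_str(),
                     std::ios_base::in | std::ios_base::binary);
    if (!in) throw std::runtime_error("cannot open " + fileName);

    // In binary mode the end position is the exact number of bytes to read.
    in.seekg(0, std::ios_base::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios_base::beg);
    if (size <= 0) return std::string();

    std::string contents(static_cast<std::string::size_type>(size), '\0');
    in.read(&contents[0], static_cast<std::streamsize>(size));
    // Guard against a short read by keeping only what was extracted.
    contents.resize(static_cast<std::string::size_type>(in.gcount()));
    return contents;
}
```

If the \r characters are unwanted, they would have to be removed in a separate step after reading.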
Dan
  • Although this does solve the problem of odd characters, it still has one small problem: it allocates space for extra characters, since `tellg()` returns the number of bytes from the start of the file. – atoMerz Nov 29 '12 at 20:58
  • Hahaha! This doesn't exactly solve the problem, but now that I read both characters there's no space wasted. – atoMerz Nov 29 '12 at 21:04
  • @AtoMerZ The important thing is knowing exactly how long the text is, for example if you need to insert a null terminator. – Dan Nov 29 '12 at 21:06