1

I wrote a simple program that reads a whole file into a buffer.

#include <iostream>
#include <fstream>
int main()
{
    std::ios_base::sync_with_stdio(0);
    std::ifstream t;
    t.open("C:\\Users\\sufal\\Desktop\\test.txt");
    t.seekg(0, std::ios::end);    
    long length = t.tellg();           
    t.seekg(0, std::ios::beg);  
    std::cout << "file size: " << length << std::endl;
    char* buffer = new char[length+1];    
    t.read(buffer, length);       
    t.close();
    buffer[length] = 0;
    std::cout << buffer << std::endl;

    
    return 0; 
}

And this is test.txt:

1
2
3

The output that the program produces looks like this: [screenshot of the program output, image not available]

The file size should be 5 bytes. Why does my program show the wrong file size? Windows Explorer also seems to show the wrong file size, 7 bytes.

BartoszKP
olaf
    This doesn’t address the question, but get in the habit of initializing objects with meaningful values rather than default-initializing them and immediately overwriting the default values. In this case, that means changing `std::ifstream t; t.open("C:\\Users\\sufal\\Desktop\\test.txt");` to `std::ifstream t("C:\\Users\\sufal\\Desktop\\test.txt");`. Also, you don’t have to call `t.close();`. The destructor will do that. – Pete Becker Nov 29 '20 at 23:05

3 Answers

4

On Windows, a line break is encoded as "\r\n" (CRLF), which consists of two bytes. So, if your file does not end with a newline, its size is indeed 7 bytes:

1     <-- 1 byte for '1', 2 bytes for CRLF
2     <-- 1 byte for '2', 2 bytes for CRLF
3     <-- 1 byte for '3'
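The byte count above can be checked with a small sketch. The helper names `write_bytes` and `file_size_bytes` are made up for illustration; both open the file in binary mode so that nothing is translated on the way in or out.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes exactly the given bytes to `path`.
// Binary mode prevents any newline translation on Windows.
void write_bytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: returns the on-disk size in bytes,
// or -1 if the file cannot be opened.
long long file_size_bytes(const std::string& path) {
    // std::ios::ate positions the stream at the end right after opening.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}
```

Writing "1\r\n2\r\n3" this way gives a 7-byte file, while "1\n2\n3" gives 5 bytes, which matches the count in the listing above.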

To read the file correctly on a byte level you need to open it in binary mode:

t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary);

(you can read about the details of this behavior in the documentation).

You can also see other options to read the whole file into a string in C++:
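One commonly used option, given here as a minimal sketch rather than as the content of the linked questions, is to let `std::istreambuf_iterator` pull every character into a `std::string`, so no size has to be computed up front. The function name `read_file` is an assumption made for this example.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads every byte of `path` into a std::string.
// Returns an empty string if the file cannot be opened.
std::string read_file(const std::string& path) {
    // Binary mode: bytes arrive exactly as stored, CRLF included.
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Because the string tracks its own length, there is no manual null terminator and no unused tail in the buffer. Dropping `std::ios::binary` should give the text-mode view of the same file.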

BartoszKP
  • So binary mode is also applicable for reading text files? – olaf Nov 29 '20 at 23:00
  • 1
    @olaf No, but your code is written in a way that assumes reading a binary file - byte by byte. Without this flag, `ifstream` interprets the newline characters and modifies them, hence your artefacts. See the linked questions and their answers for ways to read the file taking advantage of it being a text file. – BartoszKP Nov 29 '20 at 23:01
2

Your file is 7 bytes in size, because it uses CRLF line breaks.

1[cr][lf]
2[cr][lf]
3

But you are opening the file in text mode, which on Windows normalizes CRLF line breaks to LF. You size the buffer for 7 chars of file data (plus 1 for the null terminator), but read() stores only 5 chars:

1[lf]
2[lf]
3

That is why you see the two extra = characters at the end of the printed output: read() never filled the last 2 data bytes of the buffer and you didn’t zero them out, so you are seeing random garbage from uninitialized memory.

To do what you are attempting, open the file in binary mode instead.

t.open("C:\\Users\\sufal\\Desktop\\test.txt", std::ios_base::binary);

See Binary and text modes on cppreference.com for more details.
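If you do want to stay in text mode, a possible alternative is to stop trusting the size from tellg() and instead keep only the number of characters that read() actually stored, which gcount() reports. This is a sketch under that assumption; `read_text` is a hypothetical name, and the tellg() value is treated only as an upper bound on the character count.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a file in text mode and keeps only the
// characters that read() really delivered.
std::string read_text(const std::string& path) {
    std::ifstream in(path);  // text mode: CRLF may be translated to LF
    if (!in) return {};

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();  // treated as an upper bound
    in.seekg(0, std::ios::beg);
    if (size <= 0) return {};

    std::string buffer(static_cast<std::size_t>(size), '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(size));

    // gcount() is the number of chars the last read() stored;
    // shrinking to it discards the unused tail of the buffer.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With the file from the question, this should yield the 5 translated characters on Windows, with no garbage after them.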

Remy Lebeau
1

On Windows this file is indeed 7 bytes: 1 \r\n 2 \r\n 3

Windows encodes a new line as two bytes: CR + LF (or \r + \n in other notation).

So the reported size is correct.
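The gap between 7 bytes on disk and the 5 characters the asker expected can be illustrated by collapsing each CR + LF pair to a single LF, which is roughly what text mode does on Windows when reading. The function below is only an illustration with a made-up name, `collapse_crlf`, not the runtime's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `raw` with every "\r\n" pair
// replaced by a single '\n'. A lone '\r' is left untouched.
std::string collapse_crlf(const std::string& raw) {
    std::string text;
    text.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        // Skip a '\r' only when it is immediately followed by '\n'.
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;
        }
        text.push_back(raw[i]);
    }
    return text;
}
```

Applied to the 7 bytes "1\r\n2\r\n3", this leaves the 5 characters "1\n2\n3".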

loa_in_
  • So if I just want to read the whole file, how should I handle these double EOL characters? – olaf Nov 29 '20 at 22:52
  • 1
    You will read the file just fine. You can easily assume that all `\r` characters end the line and skip next byte. It's ALWAYS `\r\n` on Windows and `\r` isn't used anywhere else (basically). – loa_in_ Nov 29 '20 at 22:54
  • `\r` = 13 in decimal, `\n` = 10 decimal – loa_in_ Nov 29 '20 at 22:55
  • So that is the reason for these two equals signs at the end of the output? – olaf Nov 29 '20 at 22:55
  • I'm not a C++ guy, sorry. To my taste the `+ 1` in `length + 1` is not necessary, but I'm going by experience in a different language. – loa_in_ Nov 29 '20 at 22:58
  • 1
    @loa_in_ `+ 1` is necessary to store the `0` at the end of the buffer, to have a correct null-terminated string, for `cout` to print it correctly. – BartoszKP Nov 29 '20 at 23:00
  • @BartoszKP only because the `buffer` is being printed as a null-terminated string. It is possible to print the `buffer` as-is without using a null terminator, by using `cout.write(buffer, length)` instead of `cout << buffer` – Remy Lebeau Nov 29 '20 at 23:13
  • @RemyLebeau Yes I know. I haven't been claiming it's the only possibility - just saying it makes sense. – BartoszKP Nov 30 '20 at 07:18
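The `cout.write(buffer, length)` approach mentioned in the comments can be sketched as follows. The names `print_exact` and `capture_exact` are assumptions for this example; the second one exists only so the result can be inspected without printing to the console.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes exactly `length` chars from `buffer`.
// No null terminator is needed, so the buffer needs no extra byte.
void print_exact(std::ostream& os, const char* buffer, std::size_t length) {
    os.write(buffer, static_cast<std::streamsize>(length));
}

// Hypothetical helper for demonstration: captures what print_exact emits.
std::string capture_exact(const char* buffer, std::size_t length) {
    std::ostringstream os;
    print_exact(os, buffer, length);
    return os.str();
}
```

A call such as `print_exact(std::cout, buffer, count)` prints `count` chars and nothing more. In text mode, `count` should be the value reported by `t.gcount()` after the read rather than the size from `tellg()`.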