29

I wrote a sample project that reads a file into a buffer. When I use the tellg() function, it gives me a larger value than the number of bytes the read function actually reads from the file. I think there is a bug.

here is my code:

EDIT:

void read_file (const char* name, int *size , char*& buffer)
{
  ifstream file;

  file.open(name,ios::in|ios::binary);
  *size = 0;
  if (file.is_open())
  {
    // get length of file
    file.seekg(0,std::ios_base::end);
    int length = *size = file.tellg();
    file.seekg(0,std::ios_base::beg);

    // allocate buffer in size of file
    buffer = new char[length];

    // read
    file.read(buffer,length);
    cout << file.gcount() << endl;
   }
   file.close();
}

main:

int main()
{
  int size = 0;
  char* buffer = NULL;
  read_file("File.txt",&size,buffer);

  for (int i = 0; i < size; i++)
    cout << buffer[i];
  cout << endl; 
}
JHobern
Elior
  • Is tellg() returning -1? Did you try opening the file in character mode? – Prabhu Apr 10 '14 at 10:11
  • tellg() returns a larger number. when i debug i see for example that i is equal to 60 and then the while loop is ending (means that we reached to eof) but tellg returns 65.. – Elior Apr 10 '14 at 10:16
  • ^opening file in text mode helps instead of ios::binary? – Prabhu Apr 10 '14 at 10:17
  • ^I have no idea. Just trying to help. BTW, is the difference consistent. Increase file size and still tellg() = file.gcount() + 5?? If so, possible that tellg() takes into account the file EOF characters too, and file.gcount() doesn't.. – Prabhu Apr 10 '14 at 10:25
  • I edited the file several times and all the time I get that tellg() returns a larger number than gcount().. maybe it really because gcount doesn't read EOF characters, but as i know, tellg() shouldn't read the EOF characters. right? – Elior Apr 10 '14 at 10:42
  • @Prabhu I set a new file which tellg() returns 633 as the result of file size, and when I summing the i variable with gcount it returns 602 and finish the while loop. i don't understand why.. – Elior Apr 10 '14 at 10:54
  • DUPLICATE of http://stackoverflow.com/questions/2641639/fstreams-tellg-seekg-returning-higher-value-than-expected – Sven Apr 10 '14 at 11:02
  • @Elior There is no such thing as an EOF character, at least inside C++. (On some systems, like Windows, there _is_ an EOF character in the file. If the first byte of the file is 0x1A, you will not be able to read any bytes from it, regardless of how big it is, at least in text mode.) – James Kanze Apr 10 '14 at 11:20
  • `ios::binary` is the correct mode. It is text mode in which `tellg` is unreliable. – M.M Apr 10 '14 at 12:15
  • @Sven, that link is one where the problem was text-mode instead of binary mode, but he is using binary mode here – M.M Apr 10 '14 at 12:18
  • Matt, you are right. I just copied the piece of code to my machine (ubuntu 12.04 gcc 4.6.3) and it works as expected. I have the same file.gcount() and length. Maybe this depends on the implementation. – Sven Apr 10 '14 at 12:56
  • Re. the updated code, OP, can you post the numbers you are getting? (There might be a clue...) and what does the OS say the file size is? – M.M Apr 10 '14 at 13:33
  • i checked the posted code and now it returns the right results. maybe there was something wrong in my previous posted code. anyway thanks for the help. @MattMcNabb why did you delete your answer? it was the right answer.. – Elior Apr 10 '14 at 13:41
  • My answer was that your code had a compilation error, so if that was your real code you wouldn't have got so far as running it... – M.M Apr 10 '14 at 13:45
  • yes but you told me to use read(buffer,length) instead the while loop :) – Elior Apr 10 '14 at 13:49

4 Answers

71

tellg does not report the size of the file, nor the offset from the beginning in bytes. It reports a token value which can later be used to seek to the same place, and nothing more. (It's not even guaranteed that you can convert the type to an integral type.)

At least according to the language specification: in practice, on Unix systems, the value returned will be the offset in bytes from the beginning of the file, and under Windows, it will be the offset from the beginning of the file for files opened in binary mode. For Windows (and most non-Unix systems), in text mode, there is no direct and immediate mapping between what tellg returns and the number of bytes you must read to get to that position. Under Windows, all you can really count on is that the value will be no less than the number of bytes you have to read (and in most real cases, won't be too much greater, although it can be up to two times more).

If it is important to know exactly how many bytes you can read, the only way of reliably doing so is by reading. You should be able to do this with something like:

#include <limits>

file.ignore( std::numeric_limits<std::streamsize>::max() );
std::streamsize length = file.gcount();
file.clear();   //  Since ignore will have set eof.
file.seekg( 0, std::ios_base::beg );
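
As a rough sketch, the snippet above can be wrapped in a small helper and checked on an in-memory stream. The names count_readable_bytes and demo_count are hypothetical (they do not appear in the question), and the string stream is only an assumption made so the example needs no file on disk:

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <limits>
#include <sstream>

// Hypothetical helper: counts how many characters can actually be extracted
// from `in`, starting at the current position, then rewinds the stream to
// its beginning.
std::streamsize count_readable_bytes(std::istream& in)
{
    // With the maximum count, ignore() discards characters until end of file.
    in.ignore(std::numeric_limits<std::streamsize>::max());
    const std::streamsize length = in.gcount();

    in.clear();                        // ignore() set eofbit; clear it before seeking
    in.seekg(0, std::ios_base::beg);   // go back to the start for the real read
    return length;
}

// Self-check on an in-memory stream.
void demo_count()
{
    std::istringstream in("hello world");      // 11 characters
    assert(count_readable_bytes(in) == 11);

    char c = 0;
    assert(in.get(c) && c == 'h');             // the stream was rewound
}
```

For a file, the same helper would be called on an already opened std::ifstream; the count it returns is what a subsequent read can deliver in the mode the file was opened in.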

Finally, two other remarks concerning your code:

First, the line:

*buffer = new char[length];

shouldn't compile: you have declared buffer to be a char*, so *buffer has type char, and is not a pointer. Given what you seem to be doing, you probably want to declare buffer as a char**. But a much better solution would be to declare it as a std::vector<char>& or a std::string&. (That way, you don't have to return the size as well, and you won't leak memory if there is an exception.)

Second, the loop condition at the end is wrong. If you really want to read one character at a time,

while ( file.get( buffer[i] ) ) {
    ++ i;
}

should do the trick. A better solution would probably be to read blocks of data:

while ( file.read( buffer + i, N ) || file.gcount() != 0 ) {
    i += file.gcount();
}

or even:

file.read( buffer, size );
size = file.gcount();

EDIT: I just noticed a third error: if you fail to open the file, you don't tell the caller. At the very least, you should set the size to 0 (but some sort of more precise error handling is probably better).
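
The three remarks above could be combined roughly as follows. This is only a sketch under assumptions: the signature (a std::vector<char>& output and a bool result), the 4096-byte block size, and the scratch file name in the self-check are all hypothetical choices, not something taken from the question.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <vector>

// Hypothetical rewrite of read_file: returns false if the file cannot be
// opened or a read error occurs; otherwise `data` holds exactly the bytes
// that were read, so no separate size needs to be returned.
bool read_file(const char* name, std::vector<char>& data)
{
    data.clear();
    std::ifstream file(name, std::ios::in | std::ios::binary);
    if (!file.is_open())
        return false;                  // tell the caller about the failure

    char block[4096];                  // arbitrary block size (assumption)
    // read() reports failure on the final short block, so also check gcount().
    while (file.read(block, sizeof block) || file.gcount() != 0)
        data.insert(data.end(), block, block + file.gcount());

    return !file.bad();                // reaching eof is expected, badbit is not
}

// Self-check: write a small file in binary mode, then read it back.
void demo_read_file()
{
    const char* path = "read_file_demo.bin";   // hypothetical scratch file
    {
        std::ofstream out(path, std::ios::binary);
        out << "line one\r\nline two\n";       // 19 bytes, written untranslated
    }

    std::vector<char> data;
    assert(read_file(path, data));
    assert(data.size() == 19);
    std::remove(path);

    assert(!read_file(path, data));            // the file no longer exists
    assert(data.empty());
}
```

Because the vector owns the memory, nothing leaks if an exception is thrown, and the caller gets the byte count from data.size().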

parsley72
James Kanze
  • thanks, buffer was declared as char** , it's a typo.. and those missing code you have noticed and mentioned as errors are exist in my code.. i just put a sample of code in my post, because i'm too lazy to press ctrl+c :) i also just read one character just to see if it works fine.. actually now i'm reading a block of data – Elior Apr 10 '14 at 11:48
  • `tellg()` returns a `streampos` object, and [here](http://www.cplusplus.com/reference/ios/streampos/) it states that «*Objects of this class support construction and conversion from int*», so at least the statement *"It's not even guaranteed that you can convert the type to an integral type"* doesn't seem to be truthful. – Fabio A. May 29 '16 at 12:35
  • If all you need is the file size there is now [std::filesystem::file_size](https://en.cppreference.com/w/cpp/filesystem/file_size), instead of `tellg` or `ftell`. – Zitrax Nov 03 '18 at 17:31
  • @FabioA. *`tellg()` returns a streampos object, and here it states that «Objects of this class support construction and conversion from int»,* That doesn't mean the integral value resulting from such a conversion is the size of the file. – Andrew Henle Jul 02 '19 at 13:29
  • @AndrewHenle, I stated that «the statement _"It's not even guaranteed that you can convert the type to an integral type"_ doesn't seem to be truthful», not that «the integral value resulting from such a conversion is the size of the file», as you say. – Fabio A. Jul 03 '19 at 14:19
  • @AndrewHenle that being said, [here](http://www.cplusplus.com/reference/ios/streampos/) it states that _"Objects of this class [...] allow consistent conversions to/from **streamoff** values"_ and [here](http://www.cplusplus.com/reference/ios/streamsize/) it states that `std::streamsize` is a _"type to represent sizes and character counts in streams"_ and that _"It is convertible to/from streamoff"_. Therefore, you can do `std::streampos` -> `std::streamoff` -> `std::streamsize`. – Fabio A. Jul 03 '19 at 14:24
  • "It's not even guaranteed that you can convert the type to an integral type." Wasted so much time cuz of this. Ended up finding that's not true. [Here](http://www.cplusplus.com/reference/istream/istream/tellg/) its mentioned that "it can be converted to/from integral types" – Mihir May 29 '20 at 16:54
  • @FabioA. First, [using cplusplus.com as a source? Seriously?!?](https://stackoverflow.com/questions/6520052/whats-wrong-with-cplusplus-com) Second, in none of those links does it state that the value from `tellg()` is **required by the C++ standard to be the number of bytes that can be read from a stream**. If that were true and `tellg()` was required to return a byte count, there would have been no need to create `std::filesystem file_size`. – Andrew Henle Feb 08 '21 at 14:33
  • @AndrewHenle yeah, seriously. If anything, I provided a source for my statement. I'd be very happy if you provided the source of yours. – Fabio A. Feb 08 '21 at 15:01
  • @FabioA. [C++14 27.9.1.1,p2](https://port70.net/~nsz/c/c%2B%2B/c%2B%2B14_n3936.txt): "The restrictions on reading and writing a sequence controlled by an object of class `basic_filebuf` are the same as for reading and writing with the Standard C library FILEs." – Andrew Henle Feb 08 '21 at 15:09
  • (cont) [C11, 7.21.9.4p2](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): "For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read." – Andrew Henle Feb 08 '21 at 15:09
  • (cont) [C11 7.21.9.2p3](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.2p3): "A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END." [C11, footnote 268](https://port70.net/~nsz/c/c11/n1570.html#note268): "Setting the file position indicator to end-of-file, as with `fseek(file, 0, SEEK_END)`, has undefined behavior for a binary stream ..." – Andrew Henle Feb 08 '21 at 15:12
  • (cont) So you can't use `tellg()` to get the byte count of a text stream, and you can't use `seekg()` to get to the end of a binary stream. I guess today is a good day for you - you learned something you didn't know before – Andrew Henle Feb 08 '21 at 15:13
  • @AndrewHenle thanks, now you'll learn something too. [N4296](https://isocpp.org/files/papers/n4296.pdf), §27.5.4.2 [fpos.operations]. [Picture for reference](https://imgur.com/a/DRm7w14): streampos -> streamoff -> streamsize. – Fabio A. Feb 09 '21 at 08:11
19

Since C++17 there is std::filesystem::file_size, both as a free function and as a member of std::filesystem::directory_entry, which can streamline the whole task.

With these, the file does not have to be opened at all, and the size may come from cached data (especially with the std::filesystem::directory_entry::file_size method).

They also need only permission to query the file's metadata through its directory, not permission to read the file itself (which opening a stream in order to call tellg() does require).
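
A minimal sketch of this approach, assuming a C++17 compiler and standard library. The helper name size_on_disk, the -1 error convention, and the scratch file name are hypothetical; only std::filesystem::file_size itself comes from the standard. Note that the value is the size the file system reports, which is not necessarily the number of characters a text-mode stream will deliver.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the size reported by the file system, or -1
// if it cannot be determined (for example, the file does not exist).
std::intmax_t size_on_disk(const fs::path& p)
{
    std::error_code ec;
    const std::uintmax_t n = fs::file_size(p, ec);   // non-throwing overload
    return ec ? -1 : static_cast<std::intmax_t>(n);
}

// Self-check: create a five-byte file, query its size, then remove it.
void demo_file_size()
{
    const fs::path p = "file_size_demo.bin";         // hypothetical scratch file
    {
        std::ofstream out(p, std::ios::binary);
        out << "12345";
    }
    assert(size_on_disk(p) == 5);

    fs::remove(p);
    assert(size_on_disk(p) == -1);                   // file no longer exists
}
```

The error_code overload is used here so that a missing file is reported through the return value instead of an exception; the throwing overload fs::file_size(p) would work as well if exceptions are preferred.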

fen
  • Not that these functions necessarily give the number of bytes you can read, either. At least the boost versions don't -- for the simple reason that that value isn't known, at least on some systems, until you actually read the bytes, and it depends on how you open the file (text or binary). The fact is that, at least on Windows (and doubtlessly on a lot of other systems as well), you cannot obtain the number of bytes you can read without actually reading them. – James Kanze Feb 09 '21 at 16:20
1
void read_file (int *size, char* name,char* buffer)
*buffer = new char[length];

These lines do look like a bug: you create a char array and try to store its address in buffer[0], which is a single char. Then you read the file into buffer, which is still uninitialized.

You need to pass buffer by pointer:

void read_file (int *size, char* name,char** buffer)
*buffer = new char[length];

Or by reference, which is the C++ way and is less error-prone:

void read_file (int *size, char* name,char*& buffer)
buffer = new char[length];
...
Arks
-1
fseek(fptr, 0L, SEEK_END);
filesz = ftell(fptr);

will give the file size if the file was opened through fopen.

Using ifstream,

in.seekg(0,ifstream::end);
filesz = in.tellg();

would do something similar.

Dr. Debasish Jana
  • On what systems? It will probably work under Unix (provided the file isn't too big), but it won't work on most other systems. – James Kanze Apr 10 '14 at 11:18