27

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like read() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the I/O system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should. So I wrote the following test program:

#include <iostream>
#include <fstream>

int main() {
    typedef unsigned char uint8;
    std::ifstream source("test.dat", std::ios_base::binary);
    while (source) {
        std::ios::pos_type before = source.tellg();
        uint8 x;
        source >> x;
        std::ios::pos_type after = source.tellg();
        std::cout << before << ' ' << static_cast<int>(x) << ' '
                  << after << std::endl;
    }
    return 0;
}

This dumps the contents of test.dat, one byte per line, showing the file position before and after.

Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.

  • I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
  • Do extraction operators always use text mode?
  • What's the right way to read byte by byte from a binary istream?

MSVC++ 2008 on Windows.

Adrian McCarthy

5 Answers

24

The >> extractors are for formatted input; they skip white space (by default). For single-character unformatted input, you can use istream::get() (returns an int: either EOF if the read fails, or a value in the range [0, UCHAR_MAX]) or istream::get(char&) (puts the character read in the argument and returns something that converts to bool: true if the read succeeds, false if it fails).

James Kanze
  • 10
    Wow, it boggles my mind that I can't read a byte from a binary file without a cast of some sort. – Adrian McCarthy Apr 01 '11 at 21:28
  • 1
    That's because streams are designed for text (even when opened in binary mode). Generally, when reading real binary data, I'll use the system level routines (open/read/write/close under Unix), rather than bother with iostream. – James Kanze Apr 04 '11 at 08:29
  • One can still use std::skipws so that streams skip white space(and other formatting) even when used with stream operators – Ghita Nov 14 '12 at 15:51
  • 2
    @Ghita I think you mean `std::noskipws`. – James Kanze Nov 14 '12 at 17:26
  • Right sorry you don't want to skip spaces in that case – Ghita Nov 14 '12 at 22:59
5

There is a read() member function, which lets you specify the number of bytes to read.

stefaanv
4

Why are you using formatted extraction, rather than .read()?

Lightness Races in Orbit
4
source.get()

will give you a single byte. It is an unformatted input function. operator>> is a formatted input function, which by default skips leading whitespace characters.

Serge Dundich
2

As others mentioned, you should use istream::read(). But, if you must use formatted extraction, consider std::noskipws.

Robᵩ