12

The idiomatic loop to read from an istream is

while (thestream >> value)
{
  // do something with value
}

Now this loop has one problem: It will not distinguish if the loop terminated due to end of file, or due to an error. For example, take the following test program:

#include <iostream>
#include <sstream>

void readbools(std::istream& is)
{
  bool b;
  while (is >> b)
  {
    std::cout << (b ? "T" : "F");
  }
  std::cout << " - " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}

void testread(std::string s)
{
  std::istringstream is(s);
  is >> std::boolalpha;
  readbools(is);
}

int main()
{
  testread("true false");
  testread("true false tr");
}

The first call to testread contains two valid bools, and therefore is not an error. The second call ends with a third, incomplete bool, and therefore is an error. Nevertheless, the behaviour of both is the same. In the first case, reading the boolean value fails because there is none, while in the second case it fails because it is incomplete, and in both cases EOF is hit. Indeed, the program above outputs twice the same line:

TF - 0110
TF - 0110

To solve this problem, I thought of the following solution:

while (thestream >> std::ws && !thestream.eof() && thestream >> value)
{
  // do something with value
}

The idea is to detect regular EOF before actually trying to extract the value. Because there might be whitespace at the end of the file (which would not be an error, but cause read of the last item to not hit EOF), I first discard any whitespace (which cannot fail) and then test for EOF. Only if I'm not at the end of file, I try to read the value.

For my example program, it indeed seems to work, and I get

TF - 0100
TF - 0110

So in the first case (correct input), fail() returns false.

Now my question: Is this solution guaranteed to work, or was I just (un-)lucky that it happened to give the desired result? Also: Is there a simpler (or, if my solution is wrong, a correct) way to get the desired result?

celtschk
  • 18,046
  • 2
  • 34
  • 61
  • what is the desired result? to check, also, if the file is valid? you have same result in both cases... – neagoegab Nov 11 '11 at 23:13
  • @neagoegab: The desired result is to detect whether the loop was terminated just due to reaching the end of file, or due to an erroneous entry. And at least in my experiment the results are *not* the same, see the third digit of the four-digit block: For the non-error case it reads 0100, for the error case it reads 0110. Since the third bit is the value of `fail()`, it means that at least for this test, `fail()` can distinguish between both cases. – celtschk Nov 11 '11 at 23:17
  • Then your answer is correct. Also please note that you are validating and processing the stream in the same time. If this is not a problem for you then it is fine... – neagoegab Nov 11 '11 at 23:27
  • @neagoegab: The idiomatic loop also validates and processes the stream at the same time. – celtschk Nov 12 '11 at 10:35
  • I think the idiomatic loop make the assumption that the data in stream is "valid"... whater valid means for the example application. Your second string is not a valid input. – neagoegab Nov 12 '11 at 10:54

1 Answers1

8

It is very easy to differentiate between EOF and other errors, as long as you don't configure the stream to use exceptions.

Simply check stream.eof() at the end.

Before that only check for failure/non-failure, e.g. stream.fail() or !stream. Note that good is not the opposite of fail. So in general never even look at the good, only at the fail.


Edit:

Some example code, namely your example modified to distinguish an ungood bool specification in the data:

#include <iostream>
#include <sstream>
#include <string>
#include <stdexcept>
using namespace std;

bool throwX( string const& s )  { throw runtime_error( s ); }
bool hopefully( bool v )        { return v; }

bool boolFrom( string const& s )
{
    istringstream stream( s );
    (stream >> boolalpha)
        || throwX( "boolFrom: failed to set boolalpha mode." );

    bool result;
    (stream >> result)
        || throwX( "boolFrom: failed to extract 'bool' value." );
        
    char c;  stream >> c;
    hopefully( stream.eof() )
        || throwX( "boolFrom: found extra characters at end." );
    
    return result;
}

void readbools( istream& is )
{
    string word;
    while( is >> word )
    {
        try
        {
            bool const b = boolFrom( word );
            cout << (b ? "T" : "F") << endl;
        }
        catch( exception const& x )
        {
            cerr << "!" << x.what() << endl;
        }
    }
    cout << "- " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}

void testread( string const& s )
{
    istringstream is( s );
    readbools( is );
}

int main()
{
  cout << string( 60, '-' ) << endl;
  testread( "true false" );

  cout << string( 60, '-' ) << endl;
  testread( "true false tr" );

  cout << string( 60, '-' ) << endl;
  testread( "true false truex" );
}

Example result:

------------------------------------------------------------
T
F
- 0110
------------------------------------------------------------
T
F
!boolFrom: failed to extract 'bool' value.
- 0110
------------------------------------------------------------
T
F
!boolFrom: found extra characters at end.
- 0110

Edit 2: in the posted code and results, added example of using eof() checking, which I forgot.


Edit 3: The following corresponding example uses the OP’s proposed skip-whitespace-before-reading solution:

#include <iostream>
#include <sstream>
#include <string>
using namespace std;

void readbools( istream& is )
{
    bool b;
    while( is >> ws && !is.eof() && is >> b )       // <- Proposed scheme.
    {
        cout << (b ? "T" : "F") << endl;
    }
    if( is.fail() )
    {
        cerr << "!readbools: failed to extract 'bool' value." << endl;
    }
    cout << "- " << is.good() << is.eof() << is.fail() << is.bad() << "\n";
}

void testread( string const& s )
{
    istringstream is( s );
    is >> boolalpha;
    readbools( is );
}

int main()
{
  cout << string( 60, '-' ) << endl;
  testread( "true false" );

  cout << string( 60, '-' ) << endl;
  testread( "true false tr" );

  cout << string( 60, '-' ) << endl;
  testread( "true false truex" );
}

Example result:

------------------------------------------------------------
T
F
- 0100
------------------------------------------------------------
T
F
!readbools: failed to extract 'bool' value.
- 0110
------------------------------------------------------------
T
F
T
!readbools: failed to extract 'bool' value.
- 0010

The main difference is that this approach produces 3 successfully read values in the third case, even though the third value is incorrectly specified (as "truex").

I.e. it fails to recognize an incorrect specification as such.

Of course, my ability to write Code That Does Not Work™ is no proof that it can not work. But I am fairly good at coding up things, and I could not see any way to detect the "truex" as incorrect, with this approach (while it was easy to do with the read-words exception based approach). So at least for me, the read-words exception based approach is simpler, in the sense that it is easy to make it behave correctly.

halfer
  • 18,701
  • 13
  • 79
  • 158
Cheers and hth. - Alf
  • 135,616
  • 15
  • 192
  • 304
  • That's not sufficient because EOF can be hit when parsing an erroneous entry. That's exactly what my example code demonstrates: Both the correct and the incorrect input have eof and fail set at the same time because the error was at the end of file. – celtschk Nov 12 '11 at 01:23
  • @AlfPSteinbach:Look again at the output of my example program: For both the correct and the erroneous string the final bit pattern was 0110, which means both `eof()` and `fail()` returned true. That's because all `eof()` tells you is that the end of file was reached, not whether the last read, which reached the end of file, was successful. – celtschk Nov 12 '11 at 01:31
  • @celtschk: it is sufficient. i added some code that might help you. – Cheers and hth. - Alf Nov 12 '11 at 01:40
  • @AlfPSteinbach: That's not simpler, that's vastly more complex. You don't directly read the bool, but read a word into a string, then try to concet the string into a bool using a separate stringstream. Yes, that works, but is is a quite complicated machanism, far from "just checking `eof()`". Moreover, it doesn't respect flags passed to the orig. stream (note that in my orig. code the actual reading fn. did *not* contain `boolalpha`, if called on a stream without it, your solution would fail (by accepting the then-malformed "true" and "false" while rejecting the then-wellformed "1" and "0"). – celtschk Nov 12 '11 at 08:55
  • 1
    @celtshk: re just checking eof(), in the code above that's in the `boolFrom` function. you better ask a new question about the new issues you raise. for example, you raise the issue of whether the parser function or the stream dictates the input format. nowhere was that issue touched on in your question. if you are interested, ask a new SO question. or if you have ideas for improving the iostreams, like you didn't quite like the answer given here, then you can post to [comp.std.c++]. – Cheers and hth. - Alf Nov 12 '11 at 09:20
  • But the `boolFrom` function is only part of the whole mechanism, and the whole mechanism is more complex than just checking eof. If you re-read my question at the end of my post, I asked whether my solution is guaranteed to work, and whether there's a *simpler* method to do the same. Your method is not simpler (that's my main point), and besides doesn't do the same (note that my solution has, except for not setting the fail flag on non-erroneous data, exactly the behaviour of the idiomatic loop). If my solution is guaranteed to work, there is no need to improve the iostream interface either. – celtschk Nov 12 '11 at 10:32
  • 1
    @ApfPSteinbach: So now with this addition, the your answer basically changed to: My version is *not* correct, and your solution is easier to get right (as opposed to the previous: Your solution is simpler). So that answer I can accept. Thanks. – celtschk Nov 14 '11 at 20:24