4

I use the following set-up:

#include <bits/stdc++.h>
using namespace std;

class foo {
    public:
        // Reads whitespace-separated (x, y, a, b) int tuples from `in` until
        // extraction fails, then verifies that exactly `n` tuples were read.
        //
        // NOTE(review): if the stream contains values outside int's range,
        // `in >> x` sets failbit before EOF and fewer than n tuples are
        // collected — consistent with the premature-stop symptom described
        // in the surrounding text.
        void bar( std::istream &in, int n ) {
            std::vector<std::tuple<int,int,int,int>> q;
            int x, y, a, b;
            while ( in >> x >> y >> a >> b )
                q.push_back(std::make_tuple(x, y, a, b));
            // Compare like-signed types: the original `n == q.size()` mixed
            // int with size_t (-Wsign-compare; wrong for negative n).
            assert( n >= 0 && q.size() == static_cast<std::size_t>(n) );
        }
};

int main() {
    stringstream ss;
    for ( int i= 0; i < 100; ++i )
        ss << rand() << " " << rand() << " " << rand() << " " << rand() << endl;
    ss.clear(), ss.seekg(0,std::ios::beg);
    (new foo())->bar(ss,100);
}

In fact, my code is more complex than this, but the idea is that I put stuff (long long ints, to be exact) into a stringstream and call a function, supplying the created stringstream as an istream object. The above example works fine, but in my particular case I put in, say, 2 million tuples. The problem is that the numbers are not fully recovered at the other end, inside foo (I get fewer than 2,000,000 numbers). Can you envision a scenario in which this might happen? Can this in >> x >> y >> a >> b somehow stop before the input is exhausted?

EDIT: I have used this check:

// Fix: the original used `and` (logical &&), so the condition was
// "rdstate() != 0 && badbit != 0" — badbit is a nonzero constant, meaning
// the check fired on ANY state bit and never isolated badbit. Use the
// bitwise mask instead.
if ( ss.rdstate() & std::stringstream::badbit ) {
    std::cerr << "Problem in putting stuff into stringstream!\n";
    assert( false );
}

Somehow, everything was passing this check.

EDIT: As I said, I do a sanity check inside main() by recovering the input-numbers using the >>-method, and indeed get back the 2 mln (tuples of) numbers. It is just when the stringstream object gets passed to the foo, it recovers only fraction of the numbers, not all of them.

EDIT: For what it's worth, I am pasting the actual context here. Because of its dependencies, it won't compile, but at least we will be able to see the offending lines. It is the run() method that is not being able to recover the queries supplied by the main() method.

#include <iostream>
#include <algorithm>
#include <chrono>

const unsigned long long PERIOD= 0x1full;

class ExpRunnerJSONOutput : public ExperimentRunner {
    std::string answers;
    void set_name( std::string x ) {
        this->answers= "answers."+x+".txt";
    }
public:
    ExpRunnerJSONOutput( query_processor *p ) : ExperimentRunner(p) {
        set_name(p->method_name);
    }

    ExperimentRunner *setProcessor( query_processor *p) override {
        ExperimentRunner::setProcessor(p);
        set_name(p->method_name);
        return this;
    }

    // in: the stream of queries
    // out: where to write the results to
    virtual void run( std::istream &in, std::ostream &out ) override {

        node_type x,y;
        value_type a,b;
        unsigned long long i,j,rep_period= (16383+1)*2-1;

        auto n= tree->size();

        std::vector<std::tuple<node_type,node_type,value_type,value_type>> queries;
        for ( queries.clear(); in >> x >> y >> a >> b; queries.push_back(std::make_tuple(x,y,a,b)) ) ;

        value_type *results= new value_type[queries.size()], *ptr= results;

        /* results are stored in JSON */
        nlohmann::json sel;

        long double total_elapsed_time= 0.00;
        std::chrono::time_point<std::chrono::high_resolution_clock,std::chrono::nanoseconds> start, finish;
        long long int nq= 0, it= 0;

        start= std::chrono::high_resolution_clock::now();
        int batch= 0;
        for ( auto qr: queries ) {
            x= std::get<0>(qr), y= std::get<1>(qr);
            a= std::get<2>(qr), b= std::get<3>(qr);
            auto ans= processor->count(x,y,a,b); nq+= ans, nq-= ans, ++nq, *ptr++= ans;
        }
        finish = std::chrono::high_resolution_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(finish-start);
        total_elapsed_time= elapsed.count();
        sel["avgtime_microsec"]= total_elapsed_time/nq*(1e-3);

        out << sel << std::endl;
        out.flush();

        delete[] results;

    }
    ~ExpRunnerJSONOutput() final {}
};

// Rewind `in` and run the experiment, appending JSON results to res_file.
void runall( std::istream &in, char *res_file, ExpRunnerJSONOutput *er ) {
    // clear() any eof/fail state left by a previous full read before seeking.
    in.clear();
    in.seekg(0,std::ios::beg);
    // Fix: ofstream does not throw on open failure unless exceptions() is
    // enabled, so the original try { open } catch { throw e; } was dead code
    // (and `throw e;` would have sliced the exception anyway).
    std::ofstream out(std::string(res_file), std::ios::app);
    er->run(in,out);
    out.close();
}

using instant= std::chrono::time_point<std::chrono::steady_clock,std::chrono::nanoseconds>;

void sanity_check( std::istream &in, size_type nq ) {
    node_type x,y;
    value_type a,b;
    size_type r= 0;
    for ( ;in >> x >> y >> a >> b; ++r ) ;
    assert( r == nq );
}

int main( int argc, char **argv ) {
    // usage: <exe> <dataset_name> <num_queries> <result_file> K
    if ( argc < 5 ) {
        fprintf(stderr,"usage: ./<this_executable_name> <dataset_name> <num_queries> <result_file> K");
        fflush(stderr);
        return 1;
    }
    query_processor *processor;
    std::string dataset_name= std::string(argv[1]);
    auto num_queries= std::strtol(argv[2],nullptr,10);
    auto K= std::strtol(argv[4],nullptr,10);
    std::ifstream in;
    std::ofstream logs;
    // NOTE(review): ifstream/ofstream do not throw on open failure unless
    // exceptions() is enabled, so this try/catch is effectively dead code.
    try {
        in.open(dataset_name+".puu");
        logs.open(dataset_name+".log");
    } catch ( std::exception &e ) {
        throw e;
    }
    // First token of the .puu file: presumably a balanced-parentheses tree
    // encoding (s.size()/2 nodes below) — TODO confirm the file format.
    std::string s; in >> s;
    std::vector<pq_types::value_type> w;
    w.clear();
    pq_types::value_type maxw= 0;
    // Read one weight per node and track the maximum weight seen.
    // NOTE(review): `auto l= 0` is int, compared against size_t s.size()/2
    // (-Wsign-compare; overflows for very large inputs).
    for ( auto l= 0; l < s.size()/2; ++l ) {
        value_type entry;
        in >> entry;
        w.emplace_back(entry);
        maxw= std::max(maxw,entry);
    }
    in.close();

    // Raise the process stack limit proportionally to the input size —
    // presumably for deep recursion over the tree; TODO confirm.
    const rlim_t kStackSize= s.size()*2;
    struct rlimit r1{};
    int result= getrlimit(RLIMIT_STACK,&r1);
    if ( result == 0 ) {
        if ( r1.rlim_cur < kStackSize ) {
            r1.rlim_cur= kStackSize;
            result= setrlimit(RLIMIT_STACK,&r1);
            if ( result != 0 ) {
                logs << "setrlimit returned result = " << result << std::endl;
                assert( false );
            }
        }
    }
    logs << "stack limit successfully set" << std::endl;

    instant start, finish;

    // Start each run with a fresh results file.
    remove(argv[3]);

    auto sz= s.size()/2;
    random1d_interval_generator<> rig(0,sz-1), wrig(0,maxw);

    // Generate num_queries random node intervals and weight intervals.
    auto node_queries= rig(num_queries), weight_queries= wrig(num_queries,K);
    assert( node_queries.size() == num_queries );
    assert( weight_queries.size() == num_queries );
    std::stringstream ss;
    ss.clear(), ss.seekg(0,std::ios::beg);
    // Serialize the queries as "x y a b" lines into the stringstream.
    for ( int i= 0; i < num_queries; ++i )
        ss << node_queries[i].first << " " << node_queries[i].second << " " << weight_queries[i].first << " " << weight_queries[i].second << "\n";
    // Rewind before re-reading: clear() resets eof/fail bits, then seek.
    ss.clear(), ss.seekg(0,std::ios::beg);
    // Re-extract everything with operator>> to confirm round-tripping.
    sanity_check(ss,num_queries);

    start = std::chrono::steady_clock::now();
    auto *er= new ExpRunnerJSONOutput(processor= new my_processor(s,w,dataset_name));
    finish = std::chrono::steady_clock::now();
    logit(logs,processor,start,finish);
    // NOTE(review): sanity_check left ss at eof with failbit set; runall is
    // responsible for clear()+seekg before reading it again.
    runall(ss,argv[3],er), delete processor;

    logs.close();

    return 0;
}

EDIT: I was wondering if this has to do with the issue described in "ifstream.eof() — end of file is reached before the real end". Now, how do I confirm that hypothesis — that reading stops once we reach a byte with value 26 (the Ctrl-Z / DOS end-of-file character)?

EDIT: One more update. After reading things inside the foo, the rdstate() returned 4, fail() == 1 and eof() == 0. So, apparently end-of-file has not been reached.

Ilonpilaaja
  • 951
  • 1
  • 9
  • 17
  • I'm not able to reproduce the bug locally. Have you been able to produce the bug using the code you posted above? – Collin Jun 12 '19 at 15:45
  • 5
    Be cautious of `#include <bits/stdc++.h>` and `using namespace std;` as either can cause interesting problems. Using them together increases the likelihood of things going wrong. – user4581301 Jun 12 '19 at 15:50
  • 4
    **Recommended reading:** [Why should I not #include <bits/stdc++.h>?](https://stackoverflow.com/q/31816095/560648) – Lightness Races in Orbit Jun 12 '19 at 16:14
  • 2
    _"The above was the bare-bones example, and it does indeed work as expected"_ Then it's not a bare-bones example. Only post examples that actually reproduce the problem. Thanks. – Lightness Races in Orbit Jun 12 '19 at 16:14
  • @LightnessRacesinOrbit: I totally agree, but in my case it is a project involving several files, and it is hard to disentangle to present as an MWE. Thanks. – Ilonpilaaja Jun 12 '19 at 21:32
  • Yep it can be difficult but it's still something you have to do first. In fact, the more difficult it is, the further your problem description is from the actual cause, and the less likely we are to be able to guess at the solution. – Lightness Races in Orbit Jun 12 '19 at 22:28

2 Answers2

1

You are not checking the state of your stream. There is an upper limit on how much you can fit in there — basically the maximum string size. This is discussed in detail in this question.

Check for errors as you write to the stringstream?

stringstream ss;
for (int i = 0; i < 100000000; ++i) //or some other massive number?
{
    ss << rand() << " " << rand() << " " << rand() << "  " << rand() << endl;
    if (ss.rdstate() & stringstream::badbit)
        std::cerr << "Problem!\n";
}

You may want to check specific writes of numbers.

doctorlove
  • 17,477
  • 2
  • 41
  • 57
0

Ultimately, I've used good old FILE * instead of istreams, and everything worked as expected. For some reason, the latter was reading only a part of the file (namely, a prefix thereof), and stopping prematurely with a fail() being true. I have no idea why.

Ilonpilaaja
  • 951
  • 1
  • 9
  • 17