
The problem is:

I have code that operates on a fully functional istream. It uses methods like:

istream is;
is.seekg(...) // <--- going backwards at times
is.tellg()    // <--- to save the position before looking forward
etc.

These methods are only available for istreams from, say, a file. However, if I use cin in this fashion, it will not work--cin does not have the option of saving a position, reading forward, then returning to the saved position.

// So, I can't cat the file into the program
cat file | ./program

// I can only read the file from inside the program
./program -f input.txt

// Which is the problem with a very, very large zipped file
// ... that cannot coexist on the same raid-10 drive system 
// ... with the resulting output
zcat really_big_file.zip | ./program //<--- Doesn't work due to cin problem
./program -f really_big_file.zip //<--- not possible without unzipping

I can read cin into a deque and process the deque. A 1 MB deque buffer would be more than enough. However, this is problematic for three reasons:

  1. I have to rewrite everything to do this with a deque
  2. It won't be as bulletproof as just using an istream, for which the code has already been debugged
  3. It seems like, if I implement it as a deque with some difficulty, someone is going to come along and say, why didn't you just do it like ___

What is the proper/most efficient way to create a usable istream object, in the sense that all members are active, with a cin istream?

(Bearing in mind that performance is important)

Chris
  • While you can't do random access on `std::cin`, you can always "seek" forward on `std::cin` simply by reading from it (discarding what you read). Let's say you want to "seek" 1000 characters forward: then [read](http://en.cppreference.com/w/cpp/io/basic_istream/read) 1000 characters into a buffer that you then don't use. For larger files, read in chunks in a loop. – Some programmer dude Nov 11 '17 at 08:25
  • @Someprogrammerdude that is the problem--then I would have to operate on this chunk, which will be something other than an `istream`...and the current code actually seeks backward if it discovers something was wrong. So I would still have to rewrite. But if you are stumped, it sounds like I need to use the `deque`. – Chris Nov 11 '17 at 08:28
  • you may take a look at boost.iostream; more specifically, the [basic_array device](http://www.boost.org/doc/libs/1_65_1/libs/iostreams/doc/classes/array.html#array) offers a way to treat some contiguous array of chars as a seekable iostream ... – Massimiliano Janes Nov 11 '17 at 08:30
  • ... maybe used in conjunction with a back_insert_device and, say, a boost circular buffer – Massimiliano Janes Nov 11 '17 at 08:36
  • @MassimilianoJanes Ah, those sound like things that could work; boost is not installed on the machine in question yet. Do you think I can somehow adapt this answer to a `deque`, where, instead of a char array, I reference a `std::deque` in the membuf? https://stackoverflow.com/questions/7781898/get-an-istream-from-a-char If this is implemented, I think all I would need to do is to maintain a record of bytes read, and clear the front when I am done with a chunk.. – Chris Nov 11 '17 at 08:38
  • uhm, the proposed streambuf subclass implementation in that post looks incomplete to me ... but I could be wrong. That's why I'd use a ready-made solution, because getting all the iostream requirements right is boring and hence error-prone and time-consuming ... note that that part of boost iostream is header-only, so there should be no problem using it – Massimiliano Janes Nov 11 '17 at 08:50
  • @MassimilianoJanes ah, gotcha--thanks! (And yes, this is boring! :). – Chris Nov 11 '17 at 08:53
  • moreover, the author of the second answer in that post suggests, along with boost.iostream, boost interprocess streams. These give the best solution to your problem IMO, either via a bufferstream (which safely maps a C array without copies) or a vectorstream (which allows swapping a contiguous container, a very nice, elegant solution). These look header-only as well. – Massimiliano Janes Nov 11 '17 at 09:17

2 Answers

0

cin is user input and should be treated as unpredictable. If you want to use the functionality you mention and you are sure about your input, you can read the whole input into an istringstream and then operate on that.
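
A minimal sketch of that idea follows. The function name `slurp` is hypothetical, and the approach assumes the entire input fits in memory, so it is not suitable for inputs larger than RAM.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: copies everything remaining in `in` into memory and
// returns a seekable string stream over that copy.
std::istringstream slurp(std::istream& in) {
    std::ostringstream tmp;
    tmp << in.rdbuf();                     // drain the source stream
    return std::istringstream(tmp.str());  // seekable view of the copy
}
```

With this, `std::istringstream in = slurp(std::cin);` gives a stream on which `tellg()` and `seekg()` behave as they do for a file, at the cost of holding the whole input in a string.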

Gregorrr
  • this seems like it would not work for files larger than ram, though--doesn't the stringstream require everything to be read into memory? – Chris Nov 11 '17 at 09:06
  • Yes, it does. So are you trying to implement a kind of real-time input processing program? – Gregorrr Nov 11 '17 at 09:16
  • no, this is a typical use case. bear in mind, if you want to read the whole input to stringstream, you'll have to read it from `cin` – Chris Nov 11 '17 at 09:26
  • Apologies for my misunderstanding. If you have a file, why don't you read it in your program and use `fstream`, which supports `seekg` etc., instead of piping that file into `cin`? – Gregorrr Nov 11 '17 at 09:32
0

You could create a filtering stream buffer reading from std::cin when getting new data but buffering all received characters. You'd be able to implement seeking within the buffered range of the input. Seeking beyond the end of the already buffered input would imply reading corresponding amounts of data. Here is an example of a corresponding implementation:

#include <iostream>
#include <streambuf>
#include <string>
#include <vector>

class bufferbuf
    : public std::streambuf {
private:
    std::streambuf*   d_sbuf;
    std::vector<char> d_buffer;

    int_type underflow() {
        char buffer[1024];
        std::streamsize size = this->d_sbuf->sgetn(buffer, sizeof(buffer));
        if (size == 0) {
            return std::char_traits<char>::eof();
        }
        this->d_buffer.insert(this->d_buffer.end(), buffer, buffer + size);
        this->setg(this->d_buffer.data(),
                   this->d_buffer.data() + this->d_buffer.size() - size,
                   this->d_buffer.data() + this->d_buffer.size());
        return std::char_traits<char>::to_int_type(*this->gptr());
    }
    // Seeking is only supported within the range buffered so far; a target
    // outside that range fails instead of producing invalid pointers.
    pos_type seekoff(off_type off, std::ios_base::seekdir whence, std::ios_base::openmode) {
        off_type const size = this->egptr() - this->eback();
        off_type target;
        switch (whence) {
        case std::ios_base::beg:
            target = off;
            break;
        case std::ios_base::cur:
            target = (this->gptr() - this->eback()) + off;
            break;
        case std::ios_base::end: // end of the buffered data, not of the input
            target = size + off;
            break;
        default: return pos_type(off_type(-1));
        }
        if (target < 0 || size < target) {
            return pos_type(off_type(-1));
        }
        this->setg(this->eback(), this->eback() + target, this->egptr());
        return pos_type(target);
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode which) {
        return this->seekoff(off_type(pos), std::ios_base::beg, which);
    }
public:
    bufferbuf(std::streambuf* sbuf)
        : d_sbuf(sbuf)
        , d_buffer() {
        this->setg(0, 0, 0); // actually the default setting
    }
};

int main() {
    bufferbuf      sbuf(std::cin.rdbuf());
    std::istream   in(&sbuf);
    std::streampos pos(in.tellg());

    std::string line;
    while (std::getline(in, line)) {
        std::cout << "pass1: '" << line << "'\n";
    }
    in.clear();
    in.seekg(pos);
    while (std::getline(in, line)) {
        std::cout << "pass2: '" << line << "'\n";
    }
}

This implementation buffers input before passing it on to the reading step. You can read individual characters instead (e.g., change char buffer[1024]; to char buffer[1];, or replace the use of sgetn() with sbumpc()) to get a more immediate response: there is a trade-off between immediate response and performance for batch processing. Note that d_buffer only grows, so everything read so far stays in memory.

Dietmar Kühl
  • This looks very good, and is a great starting point for adjusting to my case. You write very good code, and of course it is much more than I was expecting. Thank you! – Chris Dec 15 '17 at 01:55