How to extract lines from a large file between identifier/strings using C++?

Question

My .txt file looks something like this:

...
...
Start 75
23 55 00 00 00 48 00 05 -10 -53
21 45 03 00 00 00 12 06 -08 09
03 55 00 00 00 88 10 75 -10 -53
51 35 00 34 50 00 12 06 100 09
Start 76
33 55 00 50 00 48 80 05 -10 -93
61 15 33 00 30 00 12 00 -08 19
91 35 10 32 50 00 12 06 -30 09
Start 77
...
...

The identifier 'Start' separates two bunches of data. The number of rows between identifiers is not constant: many rows of zeroes are rejected while the data is taken, so I cannot read a fixed number of lines between two occurrences of 'Start'.

For now, I use this code to read it line by line:

#include <fstream>
#include <iostream>
#include <string>
#include <stdio.h>
#include <vector>

using namespace std;

int main(int argc,char **argv) {

    string filename;
    ifstream file(filename.c_str());
    char word[BUFSIZ];
    char start, end;

    if (argc == 2) {
        filename = argv[1];}
    else { 
        cout<<"Usage: " << argv[0] << "filename" << endl;
        exit (-1);}

    if (file.is_open()) {

        string line;
        string numbers;
        vector<int> myNumbers;
        int count = 0;

        while(file.good()) {

            getline(file, line);

       //-------I am really lost here. 
       //-------I tried many approaches but the bunches seem to get mixed up

            //if (count < 2){
            //    if (line.find("Start") != string::npos){
            //        count++; }
            //     else {
            //        numbers.append(line);
            //        numbers.append("\n--");}
            //}
            //else {
            //    cout << endl << numbers << endl;
            //    count = 0;
            //    numbers.clear();}
        }
        file.close();}
}

I am trying to group the lines from the first occurrence of 'Start' (for now by appending them all to a string) until the next occurrence of 'Start' is encountered. After that, I will convert those values, in order, to a vector for storage. I am not that good at C++, so I am stuck here and unsure how to implement this step, or whether a better way exists. I should also mention that the .txt file has around a million lines in total. How can I group the rows into bunches between two 'Start' strings?

Unrelated to your problem, but please read [Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) Your `while(file.good())` is just another variant of that. — Some programmer dude, Jun 03 '19 at 13:54
As for your problem, just checking if the line starts with `"Start"` is a very good idea. If it doesn't then parse the numbers. For the number I recommend you learn about [`std::istringstream`](https://en.cppreference.com/w/cpp/io/basic_istringstream/basic_istringstream), [`std::istream_iterator`](https://en.cppreference.com/w/cpp/iterator/istream_iterator), and [the `std::vector` constructors](https://en.cppreference.com/w/cpp/container/vector/vector) (which will "automate" much for you). — Some programmer dude, Jun 03 '19 at 13:57
How large are your files? If they are not too large and you're not super focused on performance, you could think about using regular expressions. https://en.cppreference.com/w/cpp/regex — Felix Quehl, Jun 03 '19 at 14:01
@FelixQuehl They range between 2 and 5 GB and contain a few million lines. I have truncated the lines to 10 int values for convenience; there are usually a couple hundred of those. — rNov, Jun 03 '19 at 14:05
By the way, you have a bug: you open the file first and only then assign the filename to the variable. `string filename; ifstream file(filename.c_str());` ... ... `filename = argv[1];` — Lior, Apr 19 '20 at 06:07
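
Putting the comments above together, here is a minimal sketch of one way the grouping could be done. It is not a tested solution for the real files: `Block`, `read_block`, and `pending` are hypothetical names introduced only for illustration, and the code assumes that every identifier line begins with the literal text `Start` followed by a number. It loops on the result of `std::getline` rather than on `file.good()`, parses each row with `std::istringstream` and `std::istream_iterator`, and keeps only one bunch in memory at a time, which seems reasonable for files of several GB.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// One bunch of data: the number on the "Start" line and all values after it.
// (Hypothetical type, introduced for this sketch only.)
struct Block {
    int id = 0;
    std::vector<int> values;
};

// Reads the next bunch from `in` into `out`.
// Returns false once no further "Start" line can be found.
// `pending` holds a "Start" line that was already consumed while finishing
// the previous bunch; pass the same (initially empty) string on every call.
bool read_block(std::istream& in, std::string& pending, Block& out)
{
    std::string line = pending;
    pending.clear();

    // Skip everything up to the next line that begins with "Start".
    while (line.compare(0, 5, "Start") != 0) {
        if (!std::getline(in, line)) {
            return false;
        }
    }

    // Parse the number that follows "Start".
    out.values.clear();
    std::istringstream header(line.substr(5));
    header >> out.id;

    // Collect rows until the next "Start" line or the end of the input.
    while (std::getline(in, line)) {
        if (line.compare(0, 5, "Start") == 0) {
            pending = line;  // remember it for the next call
            return true;
        }
        std::istringstream row(line);
        out.values.insert(out.values.end(),
                          std::istream_iterator<int>(row),
                          std::istream_iterator<int>());
    }
    return true;  // the last bunch ended at end of input
}
```

A caller would check `argc` first, then construct the stream from the filename (for example `std::ifstream file(argv[1]);`), and then loop with `while (read_block(file, pending, block)) { ... }`, processing `block.values` inside the loop. Rows that contain no integers, such as the `...` placeholder lines, simply add nothing to the vector.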

