How to extract formatted text in C++?

Question

This may have been asked before, but I couldn't work out how to extract formatted data. Below is my code, which is meant to extract all text between the strings "[87]" and "[90]" in a text file.

According to the output, the positions reported for [87] and [90] are identical.

void ExtractWebContent::filterContent(){
    string str, str1;
    string positionOfCurrency1 = "[87]";
    string positionOfCurrency2 = "[90]";
    size_t positionOfText1, positionOfText2;
    ifstream reading;
    reading.open("file_Currency.txt");
    while (!reading.eof()){ 
        getline (reading, str);

        positionOfText1 = str.find(positionOfCurrency1);
        positionOfText2 = str.find(positionOfCurrency2);
        cout << "positionOfCurrency1 " << positionOfText1 << endl;
        cout << "positionOfCurrency2 " << positionOfText2 << endl;

        //str1= str.substr (positionOfText);
        cout << "String" << str1 << endl;
    }

    reading.close();
}

Update: the content of the currency file:

[79]More »Brent slips to $102 on worries about euro zone economy

Market Data

 * Currencies

CAPTION: Currencies

      Name      Price    Change % Chg
   [80]USD/SGD
              1.2606     -0.00  -0.13%

                                       USD/SGD [81]USDSGD=X
   [82]EUR/SGD
              1.5242     0.00   +0.11%

                                       EUR/SGD [83]EURSGD=X

You might like [my older answer](http://stackoverflow.com/a/7584035/168175) which used Boost Format for the output — Flexo, Jul 24 '12 at 16:29
I wrote a very general answer that should send you in the right direction. If you can add the actual file format I can be more specific. — pmr, Jul 25 '12 at 00:01
I have updated the text file which content is to be extracted. It seems getline is a possible solution. — Bryan Wong, Jul 25 '12 at 08:15

score 2 · Answer 1 · edited May 23 '17 at 11:55

That really depends on what 'extracting data' means. In simple cases you can read the whole file into a string and then use the string member functions (especially find and substr) to extract the segment you are interested in. If your data is organized per line, getline is the way to go for line extraction; then apply find and substr as before to get the segment.

Sometimes a simple find won't get you far, and you will need a regular expression to get to the parts you are interested in easily.
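For illustration only, a regular-expression version might look like the sketch below. It assumes a compiler with C++11 std::regex and assumes that the interesting entries look like "[80]USD/SGD" (a bracketed number followed by two three-letter codes), which is inferred from the sample file. The name findPairs is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: finds every "[number]AAA/BBB" occurrence and
// returns (number, currency pair) entries in order of appearance.
std::vector<std::pair<std::string, std::string>>
findPairs(const std::string& text) {
    // Group 1: digits inside the brackets. Group 2: the AAA/BBB pair.
    static const std::regex pattern(R"(\[(\d+)\]([A-Z]{3}/[A-Z]{3}))");
    std::vector<std::pair<std::string, std::string>> result;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        result.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return result;
}
```

Entries such as "[81]USDSGD=X" do not match this particular pattern because they contain no slash, so the pattern would need adjusting if those are wanted as well.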

Simple parsers often evolve and soon outgrow even regular expressions. That usually signals it is time for the very large hammer of C++ parsing: Boost.Spirit.

score 1 · Answer 2 · answered Jul 25 '12 at 00:06

Boost.Tokenizer can be helpful for parsing out a string, but it gets a little trickier if the delimiters have to be bracketed numbers like yours. With the delimiters as described, a regex is probably adequate.
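One way to treat the bracketed numbers as delimiters with a regex is sketched below. It uses std::sregex_token_iterator from the C++11 standard library rather than Boost.Tokenizer, which is a substitution on my part, and the name splitOnMarkers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every bracketed number such as
// "[87]" and returns the pieces found between those delimiters.
std::vector<std::string> splitOnMarkers(const std::string& text) {
    static const std::regex marker(R"(\[\d+\])");
    std::vector<std::string> pieces;
    // Submatch index -1 selects the text between matches, not the matches.
    std::sregex_token_iterator it(text.begin(), text.end(), marker, -1), end;
    for (; it != end; ++it) {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

This splits at any bracketed number, so picking the piece between two specific markers such as "[87]" and "[90]" would still need extra bookkeeping.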

Mike C

score 0 · Answer 3 · answered Jul 25 '12 at 00:01

All that does is concatenate the output of reading and the strings "[1]" and "[2]". I'm guessing this code resulted from a rather literal extrapolation of similar code using scanf. scanf (as well as the rest of C) still works in C++, so if that works for you I would use it.

That said, there are various levels of sophistication at which you can do this. Using regexes is one of the most powerful/flexible ways, but it might be overkill. The quickest way in my opinion is just to do something like:

Find the index of the substring "[1]", i1.
Find the index of the substring "[2]", i2.
Get the substring between i1+3 and i2.

In code, supposing std::string line has the text:

size_t i1 = line.find("[1]");
size_t i2 = line.find("[2]");
// substr takes (start, length), so pass the distance between the markers
std::string out(line.substr(i1 + 3, i2 - (i1 + 3)));

Warning: no error checking.
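A version with error checking could look roughly like the sketch below; the helper name between is hypothetical. It relies on the fact that find returns std::string::npos when the text is not found. That is also one possible reason for seeing two identical positions: if neither marker occurs on a line, both calls return the same npos value.

```cpp
#include <string>

// Hypothetical helper: copies the text between `open` and the next `close`
// into `out`. Returns false (leaving `out` untouched) if a marker is missing.
bool between(const std::string& line, const std::string& open,
             const std::string& close, std::string& out) {
    const std::size_t i1 = line.find(open);
    if (i1 == std::string::npos) return false;   // opening marker missing
    const std::size_t from = i1 + open.size();   // first char after marker
    const std::size_t i2 = line.find(close, from);
    if (i2 == std::string::npos) return false;   // closing marker missing
    out = line.substr(from, i2 - from);          // substr takes a length
    return true;
}
```

Checking the return value before using out avoids calling substr with npos-based arguments.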

Right, I have done it as shown above. But the returning position of 2 size_t is the same! How can we resolve this? Thanks! — Bryan Wong, Jul 25 '12 at 08:53
