
I am trying to extract a string from an istream using strings as delimiters, but I haven't found any istream operations that behave like std::string's find() or substr().

Here is an example of the istream content: delim_oneFUUBARdelim_two. My goal is to get FUUBAR into a string with as few workarounds as possible.

My current solution copies all of the istream content into a string (using this solution) and then extracts the text with string operations. Is there a way to avoid this unnecessary copy and read only as much from the istream as needed, so that all content after the delimited string is preserved in case there are more to be found in a similar fashion?

nOvoid

2 Answers


You can easily create a type that will consume the expected separator or delimiter:

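#include <istream>
#include <string>

struct Text
{
    std::string t_;
};

// Consumes exactly the characters in t.t_ from the stream,
// setting failbit at the first mismatch.
std::istream& operator>>(std::istream& is, const Text& t)
{
    if (is.flags() & std::ios_base::skipws)
        is >> std::ws;   // skip leading whitespace, as formatted extractors do
    for (char c: t.t_)
    {
        if (is.peek() != std::istream::traits_type::to_int_type(c))
        {
            is.setstate(std::ios::failbit);
            break;
        }
        is.get(); // throw away known-matching char
    }
    return is;
}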

See it in action on ideone

This suffices when the previous stream extraction naturally stops without consuming the delimiter (e.g. an int extraction followed by a delimiter that doesn't start with a digit), which will typically be the case unless the previous extraction is of a std::string. Single-character delimiters can be passed to getline(), but say your delimiter is "</block>" and the stream contains "<black>metallic</black></block>42": you'd want something that extracts "<black>metallic</black>" into a string, throws away the "</block>" delimiter, and leaves "42" on the stream:

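#include <cctype>
#include <istream>
#include <string>
#include <utility>

struct Until_Delim {
    Until_Delim(std::string& s, std::string delim) : s_(s), delim_(std::move(delim)) { }
    std::string& s_;
    std::string delim_;
};

std::istream& operator>>(std::istream& is, const Until_Delim& ud)
{
    std::istream::sentry sentry(is);   // skips leading whitespace if skipws is set
    if (!sentry || ud.delim_.empty())
        return is;
    std::string pending;               // longest suffix read so far that is a prefix of delim_
    bool found = false;
    char c;
    while (!found && is.get(c))
    {
        pending += c;
        // On a mismatch, move leading chars to the output until pending is a
        // prefix of delim_ again, so overlapping matches (e.g. "aab" in "aaab") work.
        while (!pending.empty() && ud.delim_.compare(0, pending.size(), pending) != 0)
        {
            ud.s_ += pending.front();
            pending.erase(0, 1);
        }
        found = pending.size() == ud.delim_.size();
    }
    if (!found)
        ud.s_ += pending;   // hit EOF part-way into a possible delimiter: keep that text
    // may need to trim trailing whitespace...
    if (is.flags() & std::ios_base::skipws)
        while (!ud.s_.empty() && std::isspace(static_cast<unsigned char>(ud.s_.back())))
            ud.s_.pop_back();
    return is;
}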

This can then be used as in:

std::string a_string;
if (some_stream >> Until_Delim(a_string, "</block>") >> whatevers_after)
{
    // use a_string and whatevers_after...
}

This notation might seem a bit hackish, but there's precedent in the Standard Library's std::quoted() (C++14).

You can see the code running here.

Tony Delroy
  • That again only works with numbers or other input that is clearly distinguishable from the delimiter based on the corresponding type. – Florian May 06 '16 at 17:10
  • @Florian: good point. I've added some code that covers the most common case where that matters. Cheers. – Tony Delroy May 06 '16 at 19:46
  • +1 for your reactivity ;) I'm not really sure what the 1st line in your operator>> was doing, but http://en.cppreference.com/w/cpp/concept/FormattedInputFunction says that a sentry object should be created. Does that make sense? – Florian May 06 '16 at 22:10
  • @Florian: the first line was skipping whitespace if the stream was set to do so, because get() doesn't do that for you, but you're right that a sentry can be used for that and does a few other useful things too. Thanks for pointing that out! Cheers. – Tony Delroy May 07 '16 at 04:37

Standard streams are imbued with locales that can classify characters through the std::ctype<> facet. We can use this facet to ignore() characters in a stream until the next available character has a given classification (or the stream ends). Here's a working example:

#include <iostream>
#include <locale>
#include <sstream>

using mask = std::ctype_base::mask;

// Discards characters until the next one has classification m (or EOF).
template<mask m>
void scan_classification(std::istream& is)
{
    auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
    using traits = std::istream::traits_type;

    while (is.peek() != traits::eof() && !ctype.is(m, traits::to_char_type(is.peek())))
        is.ignore();
}

int main()
{
    std::istringstream iss("some_string_delimiter3.1415another_string");
    double d;
    scan_classification<std::ctype_base::digit>(iss);

    if (iss >> d)
        std::cout << d; // 3.1415
}
0x499602D2
  • You're describing how to find digits, but that is not the only thing I am trying to extract. The type between the delimiters could be anything; I just want the string in between in a short and clean way. – nOvoid Mar 27 '14 at 20:36
  • @nOvoid Maybe the regular expression library can help you out with that. -- http://en.cppreference.com/w/cpp/regex/basic_regex – 0x499602D2 Mar 27 '14 at 20:41
  • I was just searching for a cleverer way to do it, preferably without additional libraries, since I would expect such a common task to be doable without them. Maybe someone else knows? – nOvoid Mar 27 '14 at 23:47
  • @nOvoid Regular expressions are *completely standard* C++ (as of C++11). Otherwise, you will have an unpleasant time parsing through strings. – 0x499602D2 Mar 27 '14 at 23:57