I am new to Boost and need Boost.Spirit to write a simple parser that fills some data structures.

Here is roughly what they look like:

struct Task
{
    const string dataname;
    const Level level;
    const string aggregator;
    const set<string> groupby;
    void operator()();
};


struct Schedule
{
    map<Level, ComputeTask> tasks;
    // I left this in only to show that the struct
    // wrapping the map is not useless
    // (this is not the full code)
    void operator()(const InstancePtr &node); 
};

Regarding Task, I don't know how I could use BOOST_FUSION_ADAPT_STRUCT, as mentioned in the employee example, or a variant, to make it work with enum and STL container fields.

Similar question for Schedule, but this time I am also using a user type (already registered to fusion maybe, is it recursive?).

I am still designing the file format, and both the struct definitions and the file format may change, so I prefer using Boost over hand-crafted, hard-to-maintain code. I am also doing this to learn.

Here is what the file could look like:

level: level operation name on(data1, data2, data3)
level: level operation name on()
level: level operation name on(data1, data2)

Each line is one entry of the map in Schedule: the part before the : is the key, and the rest defines the Task. Each level is replaced by a keyword corresponding to the enum Level, and similarly for operation; name is one of the allowed names (from a set of keywords); on() is a keyword, and inside the parentheses are zero or more user-provided strings that fill the set<string> groupby field of a Task.

I want it to be readable, and I could even add English keywords that add nothing but readability; that is another reason to use a parsing library instead of hand-crafted code.

Feel free to ask for more details if my question is not clear enough.

Thank you.

Nick Skywalker
  • Why Boost, and even Spirit, for such a simple task? What is the structure of the input file? Can you please show it? Maybe it can be done with a one-liner . . . – Armin Montigny Jun 28 '19 at 11:37
  • Without more information it's unclear what problem you ran into. So, maybe include information and code. (One thing is sure though, it won't be a one-liner) – sehe Jun 28 '19 at 11:42
  • Sorry, I forgot to add an example. – Nick Skywalker Jun 28 '19 at 13:05
  • The structure of the input file can be described by a Chomsky Type-3 regular language. It can be handled with a DFA (Deterministic Finite Automaton). A shift/reduce parser or similar is not necessary. Although the on part can be represented by a CFG, tokenizing can be done with std::regex or simpler. But I appreciate the wish to learn about parsing. – Armin Montigny Jun 28 '19 at 17:44
  • Can you give a _real life_ example? Your code and input seem to be deliberately made void of meaning ("level operation name"? I cannot even sense whether those are placeholders or literals) (Maybe supply a matching AST - so we know what ends up in what member of what struct) – sehe Jun 28 '19 at 20:02

2 Answers

So, making some assumptions as your examples don't make the meaning very clear. But here goes:

Going with a random enum:

enum class Level { One, Two, Three, LEVEL };

Sidenote: the std::set<> might need to be a sequential container, because groupby operations are usually not commutative (the order matters). I don't know about your domain, of course.
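To illustrate the sidenote, here is a minimal standard-library sketch of how a std::set loses the order in which the group-by fields were written, while a std::vector keeps it. The helper names asSet and join and the field names are made up for the illustration; they are not part of the question's code.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: copy the fields into a std::set, as the groupby member would hold them.
std::set<std::string> asSet(const std::vector<std::string>& fields) {
    return std::set<std::string>(fields.begin(), fields.end());
}

// Hypothetical helper: join the elements in the container's own iteration order.
template <typename Container>
std::string join(const Container& c) {
    std::string out;
    for (const auto& s : c) {
        if (!out.empty()) out += ", ";
        out += s;
    }
    return out;
}
```

With the fields written as region, country, city, join on the vector gives them back in that order, while join on asSet(...) gives them in sorted order (city, country, region). A set also silently drops repeated names.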

Adapting:

BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)

Note that I subtly put the adapted fields in the grammar order. That helps a lot down the road.

The simplest grammar that comes to mind:

template <typename It>
struct Parser : qi::grammar<It, Schedule()> {
    Parser() : Parser::base_type(_start) {
        using namespace qi;

        _any_word    = lexeme [ +char_("a-zA-Z0-9-_./") ];
        _operation   = _any_word; // TODO
        _group_field = _any_word; // TODO
        _dataname    = _any_word; // TODO

        _level       = no_case [ _level_sym ];
        _groupby     = '(' >> -(_group_field % ',') >> ')';
        _task        = _level >> _operation >> _dataname >> "on" >> _groupby;
        _entry       = _level >> ':' >> _task;
        _schedule    = _entry % eol;
        _start       = skip(blank) [ _schedule ];

        BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
    }
  private:
    struct level_sym : qi::symbols<char, Level> {
        level_sym() { this->add
            ("one", Level::One)
            ("two", Level::Two)
            ("three", Level::Three)
            ("level", Level::LEVEL);
        }
    } _level_sym;

    // lexemes
    qi::rule<It, std::string()> _any_word;
    qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
    qi::rule<It, Level()> _level;

    using Skipper = qi::blank_type;
    using Table = decltype(Schedule::tasks);
    using Entry = std::pair<Level, ComputeTask>;

    qi::rule<It, std::set<std::string>(), Skipper> _groupby;
    qi::rule<It, ComputeTask(), Skipper> _task;
    qi::rule<It, Entry(), Skipper> _entry;
    qi::rule<It, Table(), Skipper> _schedule;
    qi::rule<It, Schedule()> _start;
};

I changed the input to have unique keys for Level in the schedule, otherwise only one entry would actually result.

int main() {
    Parser<std::string::const_iterator> const parser;

    for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
TWO: level operation name on()
THREE: level operation name on(data1, data2))" })
    {
        auto f = begin(input), l = end(input);
        Schedule s;
        if (parse(f, l, parser, s)) {
            std::cout << "Parsed\n";
            for (auto& [level, task] : s.tasks) {
                std::cout << level << ": " << task << "\n";
            }
        } else {
            std::cout << "Failed\n";
        }

        if (f != l) {
            std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}

Prints

Parsed
One: LEVEL operation name on (data1, data2, data3)
Two: LEVEL operation name on ()
Three: LEVEL operation name on (data1, data2)
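As for the remark about unique keys: a std::map holds at most one value per key, so three lines sharing the same Level would collapse into a single entry. The sketch below shows this with plain std::map::insert, which keeps the value that is already stored. I am assuming, not asserting, that the parser adds entries in a comparable way, and the helper name collect and the string payload are made up for the illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Level { One, Two, Three };

// Hypothetical helper: add (key, value) pairs one by one, the way a table gets filled line by line.
std::map<Level, std::string> collect(const std::vector<std::pair<Level, std::string>>& entries) {
    std::map<Level, std::string> table;
    for (const auto& e : entries) {
        table.insert(e);  // insert() leaves an already present key untouched
    }
    return table;
}
```

Feeding it three pairs that all use Level::One yields a map of size 1 holding the first value. If repeated keys are meaningful in your format, a std::multimap or a std::vector of pairs would keep every line.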

And, additionally, with BOOST_SPIRIT_DEBUG defined:

<_start>
  <try>ONE: level operation</try>
  <_schedule>
    <try>ONE: level operation</try>
    <_level>
      <try>ONE: level operation</try>
      <success>: level operation na</success>
      <attributes>[One]</attributes>
    </_level>
    <_task>
      <try> level operation nam</try>
      <_level>
        <try>level operation name</try>
        <success> operation name on(d</success>
        <attributes>[LEVEL]</attributes>
      </_level>
      <_operation>
        <try>operation name on(da</try>
        <success> name on(data1, data</success>
        <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
      </_operation>
      <_dataname>
        <try>name on(data1, data2</try>
        <success> on(data1, data2, da</success>
        <attributes>[[n, a, m, e]]</attributes>
      </_dataname>
      <_groupby>
        <try>(data1, data2, data3</try>
        <_group_field>
          <try>data1, data2, data3)</try>
          <success>, data2, data3)\nTWO:</success>
          <attributes>[[d, a, t, a, 1]]</attributes>
        </_group_field>
        <_group_field>
          <try>data2, data3)\nTWO: l</try>
          <success>, data3)\nTWO: level </success>
          <attributes>[[d, a, t, a, 2]]</attributes>
        </_group_field>
        <_group_field>
          <try>data3)\nTWO: level op</try>
          <success>)\nTWO: level operati</success>
          <attributes>[[d, a, t, a, 3]]</attributes>
        </_group_field>
        <success>\nTWO: level operatio</success>
        <attributes>[[[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]</attributes>
      </_groupby>
      <success>\nTWO: level operatio</success>
      <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]]</attributes>
    </_task>
    <_level>
      <try>TWO: level operation</try>
      <success>: level operation na</success>
      <attributes>[Two]</attributes>
    </_level>
    <_task>
      <try> level operation nam</try>
      <_level>
        <try>level operation name</try>
        <success> operation name on()</success>
        <attributes>[LEVEL]</attributes>
      </_level>
      <_operation>
        <try>operation name on()\n</try>
        <success> name on()\nTHREE: le</success>
        <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
      </_operation>
      <_dataname>
        <try>name on()\nTHREE: lev</try>
        <success> on()\nTHREE: level o</success>
        <attributes>[[n, a, m, e]]</attributes>
      </_dataname>
      <_groupby>
        <try>()\nTHREE: level oper</try>
        <_group_field>
          <try>)\nTHREE: level opera</try>
          <fail/>
        </_group_field>
        <success>\nTHREE: level operat</success>
        <attributes>[[]]</attributes>
      </_groupby>
      <success>\nTHREE: level operat</success>
      <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]]</attributes>
    </_task>
    <_level>
      <try>THREE: level operati</try>
      <success>: level operation na</success>
      <attributes>[Three]</attributes>
    </_level>
    <_task>
      <try> level operation nam</try>
      <_level>
        <try>level operation name</try>
        <success> operation name on(d</success>
        <attributes>[LEVEL]</attributes>
      </_level>
      <_operation>
        <try>operation name on(da</try>
        <success> name on(data1, data</success>
        <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
      </_operation>
      <_dataname>
        <try>name on(data1, data2</try>
        <success> on(data1, data2)</success>
        <attributes>[[n, a, m, e]]</attributes>
      </_dataname>
      <_groupby>
        <try>(data1, data2)</try>
        <_group_field>
          <try>data1, data2)</try>
          <success>, data2)</success>
          <attributes>[[d, a, t, a, 1]]</attributes>
        </_group_field>
        <_group_field>
          <try>data2)</try>
          <success>)</success>
          <attributes>[[d, a, t, a, 2]]</attributes>
        </_group_field>
        <success></success>
        <attributes>[[[d, a, t, a, 1], [d, a, t, a, 2]]]</attributes>
      </_groupby>
      <success></success>
      <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]</attributes>
    </_task>
    <success></success>
    <attributes>[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]</attributes>
  </_schedule>
  <success></success>
  <attributes>[[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]]</attributes>
</_start>

Full Listing

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <vector>
#include <map>
#include <set>
#include <iostream>
#include <iomanip>
#include <experimental/iterator>

enum class Level { One, Two, Three, LEVEL };

struct ComputeTask {
    std::string dataname;
    Level level;
    std::string aggregator;
    std::set<std::string> groupby;
};

struct Schedule {
    std::map<Level, ComputeTask> tasks;
};

//////////////////////
// FOR DEBUG DEMO ONLY
static inline std::ostream& operator<<(std::ostream& os, Level l) {
    switch(l) {
        case Level::One: return os << "One";
        case Level::Two: return os << "Two";
        case Level::Three: return os << "Three";
        case Level::LEVEL: return os << "LEVEL";
    }
    return os << "?";
}

static inline std::ostream& operator<<(std::ostream& os, ComputeTask const& task) {
    os << task.level << ' ' << task.aggregator << ' ' << task.dataname << " on (";
    copy(begin(task.groupby), end(task.groupby), std::experimental::make_ostream_joiner(os, ", "));
    return os << ')';
}

/////////////
// FOR PARSER
BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)

namespace qi = boost::spirit::qi;

template <typename It>
struct Parser : qi::grammar<It, Schedule()> {
    Parser() : Parser::base_type(_start) {
        using namespace qi;

        _any_word    = lexeme [ +char_("a-zA-Z0-9-_./") ];
        _operation   = _any_word; // TODO
        _group_field = _any_word; // TODO
        _dataname    = _any_word; // TODO

        _level       = no_case [ _level_sym ];
        _groupby     = '(' >> -(_group_field % ',') >> ')';
        _task        = _level >> _operation >> _dataname >> "on" >> _groupby;
        _entry       = _level >> ':' >> _task;
        _schedule    = _entry % eol;
        _start       = skip(blank) [ _schedule ];

        BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
    }
  private:
    struct level_sym : qi::symbols<char, Level> {
        level_sym() { this->add
            ("one", Level::One)
            ("two", Level::Two)
            ("three", Level::Three)
            ("level", Level::LEVEL);
        }
    } _level_sym;

    // lexemes
    qi::rule<It, std::string()> _any_word;
    qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
    qi::rule<It, Level()> _level;

    using Skipper = qi::blank_type;
    using Table = decltype(Schedule::tasks);
    using Entry = std::pair<Level, ComputeTask>;

    qi::rule<It, std::set<std::string>(), Skipper> _groupby;
    qi::rule<It, ComputeTask(), Skipper> _task;
    qi::rule<It, Entry(), Skipper> _entry;
    qi::rule<It, Table(), Skipper> _schedule;
    qi::rule<It, Schedule()> _start;
};

int main() {
    Parser<std::string::const_iterator> const parser;

    for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
TWO: level operation name on()
THREE: level operation name on(data1, data2))" })
    {
        auto f = begin(input), l = end(input);
        Schedule s;
        if (parse(f, l, parser, s)) {
            std::cout << "Parsed\n";
            for (auto& [level, task] : s.tasks) {
                std::cout << level << ": " << task << "\n";
            }
        } else {
            std::cout << "Failed\n";
        }

        if (f != l) {
            std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}
sehe

I would recommend the solution from user @sehe. It is very flexible.

But I would also like to share a pure C++ solution. As I already wrote in my comment above, your input language is rather simple. You could even read the first elements with the standard extractor operator. The rest can be read in a loop with std::istream_iterator.
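A minimal sketch of that idea, assuming the key is written directly followed by the colon (as in "level1: ...") and that the names inside on(...) contain no whitespace. The struct Line and the function parseLine are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one parsed line.
struct Line {
    std::string key, level, operation, name;
    std::vector<std::string> groupby;
};

// Returns false if the line does not have the expected shape.
bool parseLine(const std::string& text, Line& out) {
    std::istringstream is(text);
    // The first four whitespace-separated words come straight from operator>>.
    if (!(is >> out.key >> out.level >> out.operation >> out.name)) return false;
    if (out.key.empty() || out.key.back() != ':') return false;
    out.key.pop_back();  // drop the trailing ':'

    // Whatever is left should look like "on( ... )".
    std::string rest;
    std::getline(is, rest);
    const auto open = rest.find('('), close = rest.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open) return false;
    std::istringstream head(rest.substr(0, open));
    std::string keyword, extra;
    if (!(head >> keyword) || keyword != "on" || (head >> extra)) return false;

    // Turn the commas into blanks, then let istream_iterator collect the names.
    std::string args = rest.substr(open + 1, close - open - 1);
    std::replace(args.begin(), args.end(), ',', ' ');
    std::istringstream as(args);
    out.groupby.assign(std::istream_iterator<std::string>(as),
                       std::istream_iterator<std::string>());
    return true;
}
```

This does less validation than the regex version: for example, it does not notice a missing comma between two names or stray text after the closing parenthesis.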

You can also use C++ std::regex to validate the input. Because your language is a Chomsky Type-3 regular language, this is easily possible. And if the input string is valid, you can use std::regex elements and std::regex_token_iterator to get the data.

I created an example for you. The data is packed in a struct. For this struct I have overloaded the inserter and extractor operators, so easy input and output is possible using std::iostream functions.

In main I have a one-liner that reads the complete input file and puts the data in a vector: I define the variable with constructor arguments. That's it. All data will be available as desired. For debug purposes, I print the result on the screen.

And as an exercise, I put the data in a map.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <iterator>
#include <regex>
#include <sstream>


std::istringstream testData(
R"#(level1: levelA operation0 name0 on(data10, data12, data13)
level2: levelB operation1 name1 on(  data1  )
level3: levelC operation2 name2 on()
level4: levelD operation3 name3 on(data2, data3)
level5: levelE operation4 name4 on(data4, data5, data6, data7)
level6: levelF operation5 name5 on(data8, data9)
)#");


const std::regex InputFileRegEx(R"#((\w+)(?:[\:\s]+)(\w+)(?:\s+)(\w+)(?:\s+)(\w+)(?:\s+)(?:on\s*\()(.*)(?:\)))#");

struct Data
{   // Our Data
    std::string levelLeft{};            // Left Element for Map
    struct Right{                       // Right element for Map. Sub Struct
        std::string levelRight{};
        std::string operation{};
        std::string name{};
        std::vector<std::string> data;  // The data in the on( section
    } r;

    // Overload the extractor operator. With that, something like "Data d; std::cin >> d;" is easily possible
    friend std::istream& operator >> (std::istream& is, Data& d) {
        std::string line; getline(is, line);                // Read a complete line
        std::smatch sm{};                                   // Prepare match result values
        if (std::regex_match(line, sm, InputFileRegEx)) {   // Check if the input string is valid
            // Copy all data
            d.levelLeft = sm[1]; d.r.levelRight = sm[2]; d.r.operation = sm[3]; d.r.name = sm[4]; std::string str(sm[5]);
            // Strip all whitespace from the on(...) list, then split it at the commas
            str.erase(std::remove_if(str.begin(), str.end(), [](unsigned char c) { return std::isspace(c) != 0; }), str.end());
            std::regex comma(","); d.r.data.clear();
            if (str.size()) std::copy(std::sregex_token_iterator(str.begin(), str.end(), comma, -1), std::sregex_token_iterator(), std::back_inserter(d.r.data));
        }
        else is.setstate(std::ios::failbit);
        return is;
    }
    // Overload inserter operator. Only for debug purposes and for illustration
    friend std::ostream& operator << (std::ostream& os, const Data& d) {
        // Print normal data members (to the stream passed in, not to std::cout)
        os << d.levelLeft << " :: " << d.r.levelRight << ' ' << d.r.operation << ' ' << d.r.name << " --> ";
        // Print the members of the vector
        std::copy(d.r.data.begin(), d.r.data.end(), std::ostream_iterator<std::string>(os, " ")); os << '\n';
        return os;
    }
};

using MyMap = std::map<std::string, Data::Right>;

int main()
{
    // Read all test data in an array of test data. The one-Liner  :-)
    std::vector<Data> dataAll{std::istream_iterator<Data>(testData), std::istream_iterator<Data>() };

    // For debug purposes. Print to console
    std::copy(dataAll.begin(), dataAll.end(), std::ostream_iterator<Data>(std::cout, "\n"));

    MyMap myMap{};  // Put all Data in map
    for (const Data& d : dataAll) myMap[d.levelLeft] = d.r;

    return 0;
}

So the main function is small, and the rest is not much code either. Rather simple.

Hope this gives some insight.

Armin Montigny