I agree that the attribute synthesis can be ... surprising.
The root cause seems to be an insistence on using semantic actions on raw parser expressions. The usual way to approach this would be something like:
auto str
    = x3::rule<struct str_, std::string>{"str"}
    = x3::lexeme[+(esc | (x3::char_ - x3::eol))];
and then, if you must, attach a semantic action (SA) to str in the higher-level rule (which you basically have, in p_action).
Second Problem
Seems to be very much the same as the first.
Same problem: I expect the attribute of src to be of type string, since the two arguments paran and ch of the alternative parser (paran | ch)
If you expect the attribute of src to be a specific type, you should probably declare it as such:
auto eol = x3::eol | x3::eoi;
auto paran
    //= x3::rule<struct last_paren_, std::string> {"paran"} // aids in debug output
    = char_(')') >> !eol;
auto src
    = x3::rule<struct src_, std::string>{"src"}
    = lexeme['(' >> *(~char_("\r\n \t\b\f)") | paran) >> ')'];
Observations
I doubt that the grammar is correct. In particular, the way you currently specify how parentheses can be embedded inside hyperlink source specs doesn't match the Markdown engines I'm aware of, including StackOverflow's, as you can see: there was no need for the last parenthesis to be at EOL.
If you have a reference to the particular Markdown specification you're trying to implement, I'd be happy to evaluate more.
Also, I'm not convinced that the approach of doing all the heavy lifting in semantic actions (see Boost Spirit: "Semantic actions are evil"?) is a helpful one.
In general, I'd advise separating the concerns of parsing and output generation. This will make it much easier to:
- achieve/test correctness
- maintain the parser
- change the way output is generated from a parsed document
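To make that separation concrete, here is a minimal sketch in plain standard C++. It is only an illustration of the idea, not your grammar: the hand-written line splitter stands in for the X3 parser, and the names Block, Document, parse_document and render_html are hypothetical. The point is that the parsing stage only builds a data structure, and the output stage only reads it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AST; these names are illustrative and not part of the grammar above.
struct Block {
    enum class Kind { Heading, Paragraph };
    Kind kind;
    std::string text;
};
using Document = std::vector<Block>;

// Stage 1: parsing only. A stand-in for the X3 grammar; it knows nothing about output.
Document parse_document(std::string const& input) {
    Document doc;
    std::istringstream in(input);
    std::string line, para;

    // Emit the paragraph collected so far, if any.
    auto flush = [&] {
        if (!para.empty()) {
            doc.push_back({Block::Kind::Paragraph, para});
            para.clear();
        }
    };

    while (std::getline(in, line)) {
        if (line.rfind("# ", 0) == 0) {   // line starts with "# ": heading
            flush();
            doc.push_back({Block::Kind::Heading, line.substr(2)});
        } else if (line.empty()) {        // blank line ends the current paragraph
            flush();
        } else {                          // paragraph text, joined with single spaces
            if (!para.empty()) para += ' ';
            para += line;
        }
    }
    flush();
    return doc;
}

// Stage 2: output generation only. No HTML escaping is done in this sketch.
std::string render_html(Document const& doc) {
    std::string out;
    for (auto const& b : doc) {
        std::string const tag = b.kind == Block::Kind::Heading ? "h1" : "p";
        out += "<" + tag + ">" + b.text + "</" + tag + ">\n";
    }
    return out;
}
```

With this split, you can test parse_document by inspecting the returned Document, and swap render_html for another generator without touching the parser. In an X3 version, the rules would synthesize something like Document as their attribute instead of producing output from semantic actions.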
Full Demo
Here's a demo tying things together, while still keeping with your current approach, emphasizing semantic actions.
Hopefully the improvements and ideas shown help. Especially the conditionally enabled rule debugging could be a big productivity accelerator while you're learning or maintaining your grammar.
Live On Compiler Explorer
#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fstream>
#include <iostream>
#include <boost/core/demangle.hpp>
namespace x3 = boost::spirit::x3;
namespace Parser {
    using x3::char_;
    using x3::lit;
    using x3::lexeme;
    // simplified code structure
#if 0
    auto str = lexeme[+(char_ - x3::eol)];
#else
    auto esc = [] {
        x3::symbols<char> esc;
        esc.add
            ("\\(", '(')("\\)", ')')
            ("\\[", '[')("\\]", ']')
            /* ... */;
        return esc;
    }();
    auto eol = x3::eol | x3::eoi;
    auto paran
        //= x3::rule<struct last_paren_, std::string> {"paran"} // aids in debug output
        = char_(')') >> !eol;
    auto src
        = x3::rule<struct src_, std::string>{"src"}
        = lexeme['(' >> *(~char_("\r\n \t\b\f)") | paran) >> ')'];
    auto hyperlink
        = x3::rule<struct hyperlink_, std::string>{"hyperlink"}
        = '[' >> *(esc | ~char_("\r\n]")) >> ']' >> src;
    auto str
        = x3::rule<struct str_, std::string>{"str"}
        = lexeme[
            +( esc
             | &lit('[') >> hyperlink // the &lit suppresses verbose debug
             | (char_ - x3::eol)
             )];
#endif
    auto h1_action = [](auto &) { /* generate output */ };
    auto h1
        = x3::rule<struct h1_, std::string> {"h1"}
        = ("# " > str)[h1_action]
        ;
    auto p_action = [](auto &) { /* generate output */ };
    auto p
        = x3::rule<struct p_, std::string> {"p"}
        = (+(str > eol))[p_action];
    auto content
        = x3::rule<struct lines_, std::string> {"content"}
        = (h1 | p) % +x3::eol;
    auto markdown = x3::skip(x3::blank)[*x3::eol >> content];
} // namespace Parser
int main() {
#if 0
std::ifstream ifs("input.txt");
std::string const s(std::istreambuf_iterator<char>(ifs), {});
#else
std::string const s = R"(
# Frist
This [introduction](https://en.wikipedia.org/wiki/Wikipedia:Introduction_(historical))
serves no purpose. Other than to show some [[hyper\]links](path/to/img(2).png)
)";
#endif
parse(begin(s), end(s), Parser::markdown);
}
Prints debug output:
<content>
<try># Frist\n\n This [i</try>
<h1>
<try># Frist\n\n This [i</try>
<str>
<try>Frist\n\n This [int</try>
<success>\n\n This [introduc</success>
<attributes>[F, r, i, s, t]</attributes>
</str>
<success>\n\n This [introduc</success>
</h1>
<h1>
<try>This [introduction](</try>
<fail/>
</h1>
<p>
<try>This [introduction](</try>
<str>
<try>This [introduction](</try>
<hyperlink>
<try>[introduction](https</try>
<src>
<try>(https://en.wikipedi</try>
<success>\n serves no purpo</success>
<attributes>[h, t, t, p, s, :, /, /, e, n, ., w, i, k, i, p, e, d, i, a, ., o, r, g, /, w, i, k, i, /, W, i, k, i, p, e, d, i, a, :, I, n, t, r, o, d, u, c, t, i, o, n, _, (, h, i, s, t, o, r, i, c, a, l, )]</attributes>
</src>
<success>\n serves no purpo</success>
<attributes>[i, n, t, r, o, d, u, c, t, i, o, n, h, t, t, p, s, :, /, /, e, n, ., w, i, k, i, p, e, d, i, a, ., o, r, g, /, w, i, k, i, /, W, i, k, i, p, e, d, i, a, :, I, n, t, r, o, d, u, c, t, i, o, n, _, (, h, i, s, t, o, r, i, c, a, l, )]</attributes>
</hyperlink>
<success>\n serves no purpo</success>
<attributes>[T, h, i, s, , i, n, t, r, o, d, u, c, t, i, o, n, h, t, t, p, s, :, /, /, e, n, ., w, i, k, i, p, e, d, i, a, ., o, r, g, /, w, i, k, i, /, W, i, k, i, p, e, d, i, a, :, I, n, t, r, o, d, u, c, t, i, o, n, _, (, h, i, s, t, o, r, i, c, a, l, )]</attributes>
</str>
<str>
<try> serves no purpos</try>
<hyperlink>
<try>[[hyper\]links](path</try>
<src>
<try>(path/to/img(2).png)</try>
<success>\n </success>
<attributes>[p, a, t, h, /, t, o, /, i, m, g, (, 2, ), ., p, n, g]</attributes>
</src>
<success>\n </success>
<attributes>[[, h, y, p, e, r, ], l, i, n, k, s, p, a, t, h, /, t, o, /, i, m, g, (, 2, ), ., p, n, g]</attributes>
</hyperlink>
<success>\n </success>
<attributes>[s, e, r, v, e, s, , n, o, , p, u, r, p, o, s, e, ., , O, t, h, e, r, , t, h, a, n, , t, o, , s, h, o, w, , s, o, m, e, , [, h, y, p, e, r, ], l, i, n, k, s, p, a, t, h, /, t, o, /, i, m, g, (, 2, ), ., p, n, g]</attributes>
</str>
<str>
<try> </try>
<fail/>
</str>
<success> </success>
</p>
<success> </success>
</content>