Rule precedence issue with grako

Question

I'm redoing a minilanguage I originally built in Perl (see Chessa# on GitHub), but I'm running into a number of issues when I go to apply semantics.

Here is the grammar:

(* integers *)
DEC = /([1-9][0-9]*|0+)/;
int = /(0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+)/ | DEC;
(* floats *)
pointfloat = /([0-9]*\.[0-9]+|[0-9]+\.)/;
expfloat = /([0-9]+\.?|[0-9]*\.)[eE][+-]?[0-9]+/;
float = pointfloat | expfloat;
list = '[' @+:atom {',' @+:atom}* ']';
(* atoms *)
identifier = /[_a-zA-Z][_a-zA-Z0-9]*/;
symbol = int        |
         float      |
         identifier |
         list;
(* functions *)
arglist = @+:atom {',' @+:atom}*;
function = identifier '(' [arglist] ')';
atom = function | symbol;
prec8 = '(' atom ')' | atom;
prec7 = [('+' | '-' | '~')] prec8;
prec6 = prec7 ['!'];
prec5 = [prec6 '**'] prec6;
prec4 = [prec5 ('*' | '/' | '%' | 'd')] prec5;
prec3 = [prec4 ('+' | '-')] prec4;
(* <| and >| are rotate-left and rotate-right, respectively. They assume the nearest C size. *)
prec2 = [prec3 ('<<' | '>>' | '<|' | '>|')] prec3;
prec1 = [prec2 ('&' | '|' | '^')] prec2;
expr = prec1 $;

The issue I'm running into is that the d operator is being pulled into the identifier rule when there is no whitespace between the operator and a following alphanumeric string. The grammar itself is LL(2), so I don't understand where the problem comes from.

For instance, 4d6 stops the parser because it's being interpreted as 4 d6, where d6 is an identifier. What should occur is that it's interpreted as 4 d 6, with the d being an operator. In an LL parser, this would indeed be the case.
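The two readings can be reproduced outside Grako with the grammar's own DEC and identifier patterns. This is only a sketch using Python's `re` module (the variable names are made up for illustration); it shows that once DEC has consumed `4`, both the literal `d` and the identifier `d6` are candidates at the same position:

```python
import re

# Patterns copied from the grammar above.
DEC = re.compile(r"([1-9][0-9]*|0+)")
IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

text = "4d6"

# DEC consumes the leading number.
number = DEC.match(text)
rest = text[number.end():]

# Reading 1: the remainder is one identifier.
as_identifier = IDENTIFIER.match(rest).group()

# Reading 2: the remainder is the operator 'd' followed by another number.
as_operator = rest[0]
operand = DEC.match(rest, 1).group()

print(number.group(), as_identifier)          # identifier reading
print(number.group(), as_operator, operand)   # operator reading
```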

A possible solution would be to disallow d from beginning an identifier, but this would disallow functions such as drop from being named as such.

This question is too broad. You may want to post a more precise question, focusing on a specific coding problem. Otherwise your post risks being closed due to 'too broad' or 'unclear question' flags — joaquin, Sep 24 '14 at 12:21
2 and 3 "don't work"; that's pretty vague. I suggest you narrow the question to 1 until you can formulate the others properly. — dfeuer, Sep 24 '14 at 12:52
I think that the answer to your problem may be in [this previous question](http://stackoverflow.com/q/24600189/545637), but please post a grammar fragment and sample input to form a valid question. — Apalala, Sep 25 '14 at 12:24
I think the issue is related, but the answer presented there doesn't have anything to do with the problem here. In fact, the problem appears to be, to me at least, that grako emits an LR parser, whereas I specifically need an LL parser. — Aerdan, Sep 25 '14 at 12:48
@Aerdan, Grako emits a PEG parser, which is top-down, and thus similar to an LL parser, and very different from an LR one. Please provide a specific example of the problem you're experiencing (a few grammar rules and the problematic input) so the forum can help. — Apalala, Sep 25 '14 at 12:49
I did; see the grammar link (I refuse to provide contextless snippets, since I can't know what portions of the grammar to leave out as extraneous) now provided, and the example misparsed input has also been included. — Aerdan, Sep 25 '14 at 13:35

score 3 · Answer 1 · answered Sep 26 '14 at 16:44

In Perl, you can use Marpa, a general BNF parser, which supports generalized precedence with associativity (and much more) out of the box, e.g.:

:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
    Number bless => primary
    | '(' Expression ')' bless => paren assoc => group
   || Expression '**' Expression bless => exponentiate assoc => right
   || Expression '*' Expression bless => multiply
    | Expression '/' Expression bless => divide
   || Expression '+' Expression bless => add
    | Expression '-' Expression bless => subtract

A full working example is here. As for programming languages, there is a C parser based on Marpa.

Hope this helps.

Apalala · Accepted Answer · 2014-09-26T11:46:43.897


The problem with your example is that Grako has the nameguard feature enabled by default, and that won't allow parsing just the d when d6 is ahead.
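As a rough model of what that check does (a simplified sketch, not Grako's actual source; `match_token` is a hypothetical helper), a name-like token is rejected whenever the character right after it could continue a name:

```python
def match_token(text, pos, token, nameguard=True):
    """Return the position just past `token` if it matches at `pos`, else None.

    Simplified model: with nameguard on, an alphanumeric token that starts
    with a letter is refused when the next character is alphanumeric.
    """
    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    if nameguard:
        next_char = text[end:end + 1]  # empty string at end of input
        looks_like_name = token.isalnum() and token[0].isalpha()
        if looks_like_name and next_char.isalnum():
            return None  # 'd' followed by '6' is treated as part of a name
    return end


print(match_token("4d6", 1, "d"))                   # guarded
print(match_token("4d6", 1, "d", nameguard=False))  # unguarded
print(match_token("4 d 6", 2, "d"))                 # whitespace after 'd'
```

Under this model, symbolic operators such as `*` are never affected, which is consistent with only `d` misbehaving in the question.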

To disable the feature, instantiate your own Buffer and pass it to an instance of the generated parser:

from grako.buffering import Buffer
from myparser import MyParser

text = '4d6'  # the input to parse
parser = MyParser()
# 'expr' is the start rule of the grammar in the question
parser.parse(Buffer(text, nameguard=False), 'expr')

The tip version of Grako in the Bitbucket repository (https://bitbucket.org/apalala/grako) adds a --no-nameguard command-line option to generated parsers.

edited Sep 26 '14 at 11:46

answered Sep 25 '14 at 21:14

Apalala



That's interesting... `nameguard=False` fixes the `d` operator, but it also causes non-decimal integers (`0b1`, `0o5`, `0xF` for instance) to fail to parse. – Aerdan Sep 27 '14 at 15:19

@Aerdan It must be unusual that a language uses would-be identifiers as operators, or I would have seen it before. If you want to use an unbounded ``d`` operator, a PEG grammar will let you do so, but it will take work. – Apalala Sep 27 '14 at 21:20
Are you parsing Perl? Is the ``4d6`` expression parsed as you want in it? If so, no wonder... – Apalala Sep 28 '14 at 06:04
