Simple recursive descent in PyParsing

Question

I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

I've played around with a number of different modifications of this simple setup. Usually, trying something like:

print(expr.parseString('1+2'))

will return ['1'], while I get caught in deep recursion with something like:

print(expr.parseString('(1+2)'))

What am I missing with respect to simple recursion that I can't parse arbitrary arithmetic expressions, such as 1+(2 * 3-(4*(5+6)-(7))...?

PaulMcG · Answer 1 · 2018-10-08T12:16:56.797

Wow, I guess pyparsing is really on the map! Thanks Alex and John for stepping in on this question. You are both on the mark with your responses. But let me add a comment or two:

If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will return a structured result that is closer to an AST.

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group

def Syntax():
    op = oneOf('+ - * /')  # all four operators, so the second test below can parse
    lpar  = Literal( '(' ).suppress()
    rpar  = Literal( ')' ).suppress()
    num = Word(nums)
    expr = Forward()
    atom = num | Group(lpar + expr + rpar)
    expr << atom + ZeroOrMore(op + atom)
    return expr

if __name__ == "__main__":
    expr = Syntax()
    def test(s):
        results = expr.parseString(s)
        print(s, '->', results)

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

Giving:

(9 + 3) -> [['9', '+', '3']]
(9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]

Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.

Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples on the pyparsing repo at https://github.com/pyparsing/pyparsing/tree/master/examples of the manual way to define this (fourFn.py), or the more recent approach using the infixNotation helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.
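For a feel of the infixNotation approach before opening the examples, here is a minimal sketch. It is not the contents of simpleArith.py; the names num and arith and the two-level operator table are assumptions made for illustration.

```python
from pyparsing import Word, nums, oneOf, infixNotation, opAssoc

num = Word(nums)

# Operator levels are listed from highest to lowest precedence.
# Each entry is (operator expression, number of operands, associativity).
arith = infixNotation(num, [
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
])

result = arith.parseString('9 + 3 * 4', parseAll=True)
print(result.asList())  # [['9', '+', ['3', '*', '4']]]
```

The multiplication is grouped first because its level appears earlier in the table, so no hand-written atom/term layering is needed.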

To the OP, please check out those examples, I think they will help move you forward on your project.

-- Paul

score 9 · Answer 2 · answered Aug 28 '09 at 05:19

Is this more or less what you want...?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print(s, '->', results)

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

emitting

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

This "anchors" the recursion by separating an "atom" (number or parenthesized expression) from an "expression" (one or more "atoms" with operators in-between).
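A note on the ['1'] result from the question: pyparsing's | operator builds a MatchFirst, which tries alternatives left to right and stops at the first one that matches, so Word(nums) wins before the longer alternative is ever tried. The sketch below illustrates this with two toy grammars (first_wins and longer_first are names made up for this example, not anything from the question):

```python
from pyparsing import Word, nums, oneOf, ParseException

op = oneOf('+ - / *')
num = Word(nums)

# The short alternative comes first, so it matches '1' and parsing stops there.
first_wins = num | (num + op + num)
print(first_wins.parseString('1+2'))    # ['1']

# Putting the longer alternative first lets it consume the whole input.
longer_first = (num + op + num) | num
print(longer_first.parseString('1+2'))  # ['1', '+', '2']

# parseAll=True turns the silent partial match into an error.
try:
    first_wins.parseString('1+2', parseAll=True)
except ParseException as err:
    print('unparsed input remains:', err)
```

Passing parseAll=True while developing a grammar is a cheap way to notice that only a prefix of the input was matched.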

score 4 · Answer 3 · answered Aug 28 '09 at 05:19

A grammar like:

expr :: expr op expr

is hard to work with because the recursion just keeps diving into the left.

A normal arithmetic grammar would look something like:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' mulxp
atom :: Word(nums) | '(' expr ')'

Basically, you never get S :: S; any time a nonterminal appears on the left and right hand sides of a line in the grammar, there must be some literal in the middle for the parser to consume.
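As a rough sketch of how a layered grammar like this could be written in pyparsing (with the right-hand side of '*' recursing into mulxp so that multiplication binds tighter than addition). The Group calls and the use of <<= instead of << are choices made for this example, and only '+' and '*' are handled:

```python
from pyparsing import Forward, Group, Suppress, Word, nums

expr = Forward()
mulxp = Forward()

# atom :: Word(nums) | '(' expr ')'
atom = Word(nums) | (Suppress('(') + expr + Suppress(')'))

# The longer alternative is listed first, because '|' takes the first match.
# Each recursive reference comes only after an atom has been consumed,
# so the parser always makes progress before recursing.
mulxp <<= Group(atom + '*' + mulxp) | atom
expr <<= Group(mulxp + '+' + expr) | mulxp

print(expr.parseString('1+2*3', parseAll=True).asList())
# [['1', '+', ['2', '*', '3']]]
```

Because the recursion is on the right, chains such as 1+2+3 nest to the right. That is harmless for '+' and '*', but it would give the wrong grouping for '-' and '/', which is one reason the atom + ZeroOrMore(op + atom) form is often preferred.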

answered Aug 28 '09 at 05:19

John Fouhy

Could you please add some tips on how to convert `expr :: expr op expr` to some other form that Pyparsing can handle, for example in my case at http://stackoverflow.com/questions/15438015/stack-overflow-when-pyparsing-ada-2005-scoped-identifiers-using-reference-manual – Nordlöw Mar 20 '13 at 14:07

Ant6n · Answer 4 · 2018-09-07T06:09:16.463

Use operatorPrecedence (renamed infixNotation in later pyparsing releases) to build the expression. It produces correctly nested results and takes care of operator precedence while at it:

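from pyparsing import Word, nums, oneOf, operatorPrecedence, opAssoc

num = Word(nums)
plusop = oneOf('+ -')
multop = oneOf('/ *')
# Levels are listed from highest to lowest precedence.
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT), (plusop, 2, opAssoc.LEFT)])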

example:

>>> print(expr.parseString("1+(2 * 3-(4*(5+6)-(7)))"))
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]
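Once the result is nested like this, walking it is straightforward. As a rough sketch (evaluate and OPS are hypothetical helpers written for this example, not part of pyparsing), a recursive evaluator over the nested lists might look like:

```python
import operator

# Map operator tokens to the functions that implement them.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def evaluate(node):
    """Evaluate a nested [operand, op, operand, ...] list, left to right."""
    if isinstance(node, str):
        return int(node)
    value = evaluate(node[0])
    # Operators sit at odd indexes, their right operands at the next even index.
    for i in range(1, len(node), 2):
        value = OPS[node[i]](value, evaluate(node[i + 1]))
    return value

tree = [['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]
print(evaluate(tree))  # -30
```

The tree here is the parse result above copied in as a plain list; with a live parse, calling asList() on the result should give the same shape.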

edited Sep 07 '18 at 06:09

answered Feb 24 '14 at 18:38

Ant6n

Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing – PaulMcG Aug 27 '18 at 12:53
