17

I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

I've played around with a number of different modifications of this simple setup. Usually, trying something like:

print(expr.parseString('1+2'))

Will return ['1']. While I get caught in deep recursion with something like:

print(expr.parseString('(1+2)'))

What am I missing with respect to simple recursion that I can't parse arbitrarily arithmetic expressions, such as 1+(2 * 3-(4*(5+6)-(7))...?

Community
  • 1
  • 1

4 Answers4

27

Wow, I guess pyparsing is really on the map! Thanks Alex and John for stepping in on this question. You are both on the mark with your responses. But let me add a comment or two:

  1. If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will a structured result that is closer to an AST.

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    def Syntax():
        op = oneOf('+ -')
        lpar  = Literal( '(' ).suppress()
        rpar  = Literal( ')' ).suppress()
        num = Word(nums)
        expr = Forward()
        atom = num | Group(lpar + expr + rpar)
        expr << atom + ZeroOrMore(op + atom)
        return expr
    
    if __name__ == "__main__":
        expr = Syntax()
        def test(s):
            results = expr.parseString(s)
            print s,'->', results
    
        test( "(9 + 3)" )
        test( "(9 + 3) * (4 / 5)" )
    

    Giving:

    (9 + 3) -> [['9', '+', '3']]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
    

    Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.

  2. Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples on the pyparsing repo at https://github.com/pyparsing/pyparsing/tree/master/examples of the manual way to define this (fourFn.py), or the more recent approach using the infixNotation helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.

To the OP, please check out those examples, I think they will help move you forward on your project.

-- Paul

PaulMcG
  • 56,194
  • 14
  • 81
  • 122
9

Is this more or less what you want...?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

emitting

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

? This "anchors" the recursion by separating an "atom" (number or parenthesized expression) from an "expression" (one or more "atoms" with operators in-between).

Alex Martelli
  • 762,786
  • 156
  • 1,160
  • 1,345
4

A grammar like:

expr :: expr op expr

is hard to work with because the recursion just keeps diving into the left.

A normal arithmetic grammar would look something like:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

Basically, you never get S :: S; any time a nonterminal appears on the left and right hand sides of a line in the grammar, there must be some literal in the middle for the parser to consume.

John Fouhy
  • 37,154
  • 18
  • 59
  • 73
  • Could you please add some tips on how to convert `expr :: expr op expr` to some other form that Pyparsing can handle, for example in my case at http://stackoverflow.com/questions/15438015/stack-overflow-when-pyparsing-ada-2005-scoped-identifiers-using-reference-manual – Nordlöw Mar 20 '13 at 14:07
0

Use operatorPrecedence to build expressions. It'll build the correct expressions, and take care of operator precedence while at it:

num = Word(nums)
plusop = oneOf( '+ -')
multop = oneOf('/ *')
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT),(plusop, 2, opAssoc.LEFT)])

example:

>> print parsetime.expr.parseString("1+(2 * 3-(4*(5+6)-(7)))")
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]
Ant6n
  • 1,628
  • 1
  • 16
  • 24