Parsing logical sentence very slow with pyparsing

Question

I am trying to use pyparsing to parse logical expressions such as these:

x
FALSE
NOT x
(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)

(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)

((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND
 ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)

The code I wrote below seems to work, but it is very slow (e.g. the last example above takes a few seconds). Did I structure the grammar in some inefficient way? Maybe recursion should be used instead of operatorPrecedence? Is there a way to speed it up?

identifier = Group(Word(alphas, alphanums + "_")  +  Optional("'"))
num = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
operator = Regex(">=|<=|!=|>|<|=")
operand = identifier |  num  
aexpr = operatorPrecedence(operand,
                           [('*',2,opAssoc.LEFT,),
                            ('+',2,opAssoc.LEFT,),
                            (operator,2,opAssoc.LEFT,)
                            ])

op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
           (CaselessLiteral('and'),2,opAssoc.LEFT ,),
           (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
           ('=>', 2,opAssoc.LEFT ,),
           ]
sentence = operatorPrecedence(aexpr,op_prec)
return sentence

*'did I use the recursion wrong some how?'* - what recursion? — Gareth Latty, Nov 07 '12 at 19:58
typo -- I mean did I use operatorPrecedence wrong somehow. Or should I use recursion instead ? — Vu Nguyen, Nov 07 '12 at 20:00
Perhaps [this example](http://pyparsing.wikispaces.com/file/detail/SimpleCalc.py) will help? I suspect the slow part of your code is the regex for num (so try rewriting it or using pyparsing builtins). — Andy Hayden, Nov 07 '12 at 21:47
Paul, packratParsing makes it much faster - thank you. The pyparsing FAQ mentions that enabling this causes some doctests in pyparsing to fail. So is it relatively safe to use this option? — Vu Nguyen, Nov 08 '12 at 17:06
hayden, the regex for num is not the problem. RickyA, what's cProfile? — Vu Nguyen, Nov 08 '12 at 17:07
@user1419, see: http://docs.python.org/2/library/profile.html — Bart Kiers, Nov 14 '12 at 07:22
cProfile is a profiler for Python. It shows you the calls that are made from the script and how much time they consume. It is a necessary tool for debugging since it gives you pointers to which part is slow. See [here](http://docs.python.org/2/library/profile.html#module-cProfile) — RickyA, Nov 20 '12 at 16:21
Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing — PaulMcG, Aug 27 '18 at 12:52

Tessmore · Answer 1 · 2020-06-10T15:59:10.117

I had the same problem and found a solution (ParserElement.enablePackrat()) here: https://github.com/pyparsing/pyparsing

With packrat enabled, the following grammar is now parsed instantly (vs. 60 seconds before):

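from pyparsing import (ParserElement, Word, Literal, oneOf, nums, alphas,
                       alphanums, operatorPrecedence, opAssoc)

ParserElement.enablePackrat()

integer  = Word(nums).setParseAction(lambda t:int(t[0]))('int')
# `variable` is not defined in the original snippet; this definition is an assumption
variable = Word(alphas, alphanums + "_")
operand  = integer | variable('var')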

# Left precedence
eq    = Literal("==")('eq')
gt    = Literal(">")('gt')
gtEq  = Literal(">=")('gtEq')
lt    = Literal("<")('lt')
ltEq  = Literal("<=")('ltEq')
notEq = Literal("!=")('notEq')
mult  = oneOf('* /')('mult')
plus  = oneOf('+ -')('plus')

_and  = oneOf('&& and')('and')
_or   = oneOf('|| or')('or')

# Right precedence
sign     = oneOf('+ -')('sign')
negation = Literal('!')('negation')

# Operator groups per precedence
right_op = negation | sign 

# Highest precedence
left_op_1 = mult 
left_op_2 = plus 
left_op_3 = gtEq | ltEq | lt | gt
left_op_4 = eq   | notEq
left_op_5 = _and
left_op_6 = _or
# Lowest precedence

condition = operatorPrecedence( operand, [
     (right_op,   1, opAssoc.RIGHT),
     (left_op_1,  2, opAssoc.LEFT),
     (left_op_2,  2, opAssoc.LEFT),
     (left_op_3,  2, opAssoc.LEFT),
     (left_op_4,  2, opAssoc.LEFT),
     (left_op_5,  2, opAssoc.LEFT),
     (left_op_6,  2, opAssoc.LEFT)
    ]
)('computation')
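Why caching can make such a difference: roughly speaking, the grammar built by operatorPrecedence tries an "operand operator operand" alternative at each precedence level and then falls back to the bare operand, so the same sub-expression may be parsed again and again at the same position. The sketch below is a simplified pure-Python model of that behavior, not pyparsing's actual implementation; `OPS_BY_LEVEL`, `make_parser` and `count_calls` are hypothetical names used only for illustration. Memoizing on (level, position), which is the idea behind packrat parsing, removes the repeated work.

```python
from functools import lru_cache

# One operator per precedence level, highest precedence first (hypothetical set).
OPS_BY_LEVEL = ["*", "+", "<", "&"]


def make_parser(tokens, use_cache):
    """Return (parse, calls); parse(level, pos) gives the end position or None."""
    calls = [0]  # how many times the body of parse() actually runs
    top = len(OPS_BY_LEVEL)

    def parse(level, pos):
        calls[0] += 1
        if level == 0:
            return parse_atom(pos)
        op = OPS_BY_LEVEL[level - 1]
        # Alternative 1: operand (operator operand)+
        end = parse(level - 1, pos)
        if end is not None and end < len(tokens) and tokens[end] == op:
            while end < len(tokens) and tokens[end] == op:
                nxt = parse(level - 1, end + 1)
                if nxt is None:
                    break
                end = nxt
            return end
        # Alternative 2: the bare operand, parsed again from the same position.
        return parse(level - 1, pos)

    def parse_atom(pos):
        if pos >= len(tokens):
            return None
        if tokens[pos] == "(":
            # Parentheses restart at the lowest-precedence level.
            end = parse(top, pos + 1)
            if end is not None and end < len(tokens) and tokens[end] == ")":
                return end + 1
            return None
        return pos + 1 if tokens[pos].isalnum() else None

    if use_cache:
        # Rebinding the closure variable routes recursive calls through the cache.
        parse = lru_cache(maxsize=None)(parse)
    return parse, calls


def count_calls(text, use_cache):
    """Parse whitespace-separated tokens; return (fully_parsed, number_of_calls)."""
    tokens = text.split()
    parse, calls = make_parser(tokens, use_cache)
    end = parse(len(OPS_BY_LEVEL), 0)
    return end == len(tokens), calls[0]


nested = "( ( ( a + b ) ) )"
print("without cache:", count_calls(nested, use_cache=False))
print("with cache:   ", count_calls(nested, use_cache=True))
```

In this model every extra level of parentheses multiplies the uncached work, while the cached version does at most one unit of work per (level, position) pair. Real pyparsing grammars are more involved, so the numbers are only indicative.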

Pyparsing is no longer hosted on wikispaces.com. Go to https://github.com/pyparsing/pyparsing — PaulMcG, Aug 27 '18 at 12:50

score 4 · Answer 2 · answered Nov 28 '12 at 11:12

I put your code into a small program:

from sys import argv
from pyparsing import *

def parsit(aexpr):
    identifier = Group(Word(alphas, alphanums + "_")  +  Optional("'"))
    num = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
    operator = Regex(">=|<=|!=|>|<|=")
    operand = identifier |  num
    aexpr = operatorPrecedence(operand,
                               [('*',2,opAssoc.LEFT,),
                                ('+',2,opAssoc.LEFT,),
                                (operator,2,opAssoc.LEFT,)
                                ])

    op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
               (CaselessLiteral('and'),2,opAssoc.LEFT ,),
               (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
               ('=>', 2,opAssoc.LEFT ,),
               ]
    sentence = operatorPrecedence(aexpr,op_prec)
    return sentence

def demo02(arg):
    sent = parsit(arg)
    print(arg, ":", sent.parseString(arg))

def demo01():
    for arg in ["x", "FALSE", "NOT x",
                  "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)",
                  "(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)",
                  "((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)"
                  ]:
        demo02(arg)


if len(argv) <= 1:
    demo01()
else:
    for arg in argv[1:]:
        demo02(arg)

and ran it through cProfile:

$ python -m cProfile pyparsetest.py

You will find many parseImpl calls, but in the middle of the output there is

2906500/8   26.374    0.000   72.667    9.083 pyparsing.py:913(_parseNoCache)
212752/300    1.045    0.000   72.608    0.242 pyparsing.py:985(tryParse)

The 72.667 is the cumulative time, out of roughly 72 seconds in total.
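In cProfile output, a first column such as 2906500/8 means total calls / primitive (non-recursive) calls, so here about 2.9 million calls arise from only 8 top-level ones. To pull out such lines directly instead of scanning the full listing, the profile can be sorted by cumulative time. Below is a minimal sketch using only the standard library's cProfile and pstats; `slow_fib` and `profile_top` are hypothetical names, with `slow_fib` standing in for the expensive `parseString` call.

```python
import cProfile
import io
import pstats


def slow_fib(n):
    """Hypothetical stand-in for an expensive, heavily recursive parse."""
    return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)


def profile_top(func, *args, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_top(slow_fib, 18)
print(report)
```

The same sorting is available from the command line with `python -m cProfile -s cumulative pyparsetest.py`.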

Therefore I would venture the guess that "caching" would offer a good lever.

Just enabling packrat parsing as described in the FAQ (http://pyparsing-public.wikispaces.com/FAQs) did not help, though. I added the lines

import pyparsing
pyparsing.usePackrat = True

and the runtime was the same.

The number regex also looks fine to me; it is quite standard, I guess. For example, replacing it with

#num = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
num = Regex(r"8|1|10|100|5")

also did not help. There is no "empty match" in my simple variant, which I guessed might be an issue, but it seems not.
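The regex can also be checked in isolation with the standard `re` module. One detail worth noting: `(:?` in the posted pattern reads like a typo for the non-capturing group `(?:`; as written it allows an optional literal colon inside a number. The sketch below assumes that reading and compares both patterns (`POSTED` and `INTENDED` are names chosen here for illustration). Neither should be slow, which fits the observation that the regex is not the bottleneck.

```python
import re
import timeit

# Pattern exactly as posted: "(:?" means "an optional literal colon".
POSTED = re.compile(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
# Assumed intent: "(?:" non-capturing groups.
INTENDED = re.compile(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# Columns: input, matches posted pattern, matches intended pattern.
for text in ["8", "100", "-3.5", "6.02e23", "1:.5"]:
    print(text, bool(POSTED.fullmatch(text)), bool(INTENDED.fullmatch(text)))

# Rough timing of the regex on its own, independent of pyparsing.
elapsed = timeit.timeit(lambda: INTENDED.match("6.02e23 + z"), number=100_000)
print(f"100000 matches: {elapsed:.3f} s")
```

Only the last sample, "1:.5", is treated differently by the two patterns, so for the expressions in the question the typo is harmless.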

A last try was to look at the resulting parser with:

....
sentence = operatorPrecedence(aexpr,op_prec)
print(sentence)
return sentence
....

And... wow... it is long!

Leaving out your first operatorPrecedence is a lot faster, but then arithmetic no longer works.

So my guess is: yes, try to separate the two kinds of expressions (boolean and arithmetic) more. Maybe that will improve it. I will look into it too; it interests me as well.

Just getting to look at your answer to this, very nice analysis. One correction: to enable packratting in pyparsing, you would do `pyparsing.ParserElement.enablePackrat()` as this is a method on the `ParserElement` class, not on the `pyparsing` module itself. — PaulMcG, Jun 01 '16 at 01:58
