
I'm trying to parse strings of the form:

'foo(bar:baz;x:y)'

I'd like the results to be returned in the form of a nested dictionary, i.e. for the above string, the results should look like this:

{ 'foo' : { 'bar' : 'baz', 'x' : 'y' } }

Despite trying numerous combinations of Dict() and Group(), I can't get it to work. One version of my grammar looks like this:

import pyparsing as pp
field_name = pp.Word( pp.alphanums )
field_value = pp.Word( pp.alphanums )
colon = pp.Suppress( pp.Literal( ':' ) )

expr = pp.Dict( 
    pp.Group( 
        field_name + \
        pp.nestedExpr( 
            content = pp.delimitedList( 
                 pp.Group( field_name + colon + field_value ), 
                 delim = ';' 
            ) 
        ) 
    ) 
)

and now, the results are as follows:

In [62]: str = 'foo(bar:baz;x:y)'

In [63]: expr.parseString( str ).asList()
Out[63]: [['foo', [['bar', 'baz'], ['x', 'y']]]]

In [64]: expr.parseString( str ).asDict()
Out[64]: {'foo': ([(['bar', 'baz'], {}), (['x', 'y'], {})], {})}

In [65]: print( expr.parseString( str ).dump() )
Out[65]: [['foo', [['bar', 'baz'], ['x', 'y']]]]
         - foo: [['bar', 'baz'], ['x', 'y']]

So the asList() output looks right to me, and I think it should yield the dictionary I'm after. As I understand it (please correct me if I'm wrong), Dict() takes each group of tokens and uses the first element as a key and the remaining elements as that key's value. This works as long as the dictionary is not nested, for example:

expr = pp.Dict( 
    pp.delimitedList( 
        pp.Group( field_name + colon + field_value ), 
        delim = ';' 
    ) 
)

In [76]: expr.parseString( 'foo:bar;baz:x' ).asDict()
Out[76]: {'baz': 'x', 'foo': 'bar'}

So the question is: what is wrong with the first case (and with my understanding of the problem), or can Dict() simply not cope with such a case? I could use asList() and convert the result into a dictionary manually, but I'd rather have pyparsing do it :)

Any help or directions would be greatly appreciated.

Thank you.

kgr

1 Answer


Two problems:

  • You are missing a pp.Dict around pp.delimitedList, which is needed for asDict to work correctly on the inner result
  • You are only calling asDict on the outermost ParseResults instance, leaving the inner ParseResults "uninterpreted"

I tried the following:

from pyparsing import *
field_name = field_value = Word(alphanums)
colon = Suppress(Literal(':'))

expr = Dict(Group(
    field_name +
    nestedExpr(content =
        Dict(delimitedList( 
            Group(field_name + colon + field_value), 
            delim = ';' 
        ))
    )
))

Then used it like this:

>>> res = expr.parseString('foo(bar:baz;x:y)')
>>> type(res['foo'])
<class 'pyparsing.ParseResults'>
>>> { k:v.asDict() for k,v in res.asDict().items() }
{'foo': {'x': 'y', 'bar': 'baz'}}
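
On a newer pyparsing, asDict() may already return plain nested dicts, in which case calling v.asDict() on its values would fail. A version-independent conversion could be sketched as below. The helper name to_plain is hypothetical and not part of pyparsing, and the grammar is the same as above, just written with an explicit pp. prefix. It also assumes delimitedList, nestedExpr and parseString are still available under those names in the installed version.

```python
import pyparsing as pp

field_name = pp.Word(pp.alphanums)
field_value = pp.Word(pp.alphanums)
colon = pp.Suppress(":")

# Inner Dict() turns each "key:value" group into a named result.
pairs = pp.Dict(
    pp.delimitedList(pp.Group(field_name + colon + field_value), delim=";")
)
# Outer Dict() keys the parenthesised group by the leading name.
expr = pp.Dict(pp.Group(field_name + pp.nestedExpr(content=pairs)))


def to_plain(value):
    """Hypothetical helper: recursively convert ParseResults to dicts/lists."""
    if isinstance(value, pp.ParseResults):
        if value.haskeys():
            # Named results present: treat this level as a dictionary.
            return {key: to_plain(item) for key, item in value.items()}
        # No names at this level: treat it as a plain list of tokens.
        return [to_plain(item) for item in value]
    return value


res = expr.parseString("foo(bar:baz;x:y)")
nested = to_plain(res)
print(nested)
```

The helper only looks at ParseResults, haskeys() and items(), so it should not matter whether the installed asDict() recurses or not.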
Niklas B.
  • Nice catch on the missing `pp.Dict`. Also, try printing `res.dump()` to see the nested keys and values. (Since `res` is a ParseResults object, it will support the nested dict-style access without converting using asDict: `res['foo']['x']` gives 'y'; or you can use dotted attribute notation as long as the keys are nice Python identifiers: `res.foo.bar` gives 'baz'.) – PaulMcG Apr 03 '12 at 16:00
  • Hi @Paul, nice receiving a compliment from the author himself :) I find `res.dump()` not much more informative than just `str(res)`, but maybe I just don't know how to interpret it? I have never used pyparsing before, I should say. – Niklas B. Apr 03 '12 at 16:04
  • Thank you very much Niklas! I wasn't aware that inside the results there are also ParseResults instances, I thought they would be either lists or dicts already. Paul - thanks for the tip on using as dict without conversion, this might actually come in handy in what I'm working on! :) – kgr Apr 03 '12 at 16:04
  • Also took me a bit to figure out, as the string representation is the same as for Python data structures. @Paul: Is there a reason why the representation doesn't include a hint about the type? I guess that would be a helpful feature :) – Niklas B. Apr 03 '12 at 16:07
  • res.dump() should show the nested list of tokens, followed by an indented tree structure of any named results - should be fairly straightforward with this simple example. The str() representation of a ParseResults shows the list of tokens, then a dict-style repr of the named results, altogether as a tuple. I find the str() output kind of ugly actually, and much prefer dump(). As to actually showing some indication of type, it gets even uglier with nested results, since as you discovered, not only is the outer object a ParseResults, but so are all the inner nested ones. – PaulMcG Apr 03 '12 at 16:19
  • OBE - as of pyparsing 2.1.0 (Feb 2016), asDict() will dict-ify nested ParseResults also. – PaulMcG Jan 30 '19 at 13:48
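
The two points from the comments (nested access without conversion, and asDict() handling nested results from pyparsing 2.1.0 on) can be checked with a short sketch. It assumes pyparsing 2.1.0 or later is installed and reuses the answer's grammar. The variable names word, pairs, x_value, bar_value and as_dict are only illustrative.

```python
import pyparsing as pp

word = pp.Word(pp.alphanums)
pairs = pp.Dict(
    pp.delimitedList(pp.Group(word + pp.Suppress(":") + word), delim=";")
)
expr = pp.Dict(pp.Group(word + pp.nestedExpr(content=pairs)))

res = expr.parseString("foo(bar:baz;x:y)")

# Nested dict-style and attribute-style access, no conversion needed.
x_value = res["foo"]["x"]
bar_value = res.foo.bar

# Assumes pyparsing >= 2.1.0, where asDict() also converts nested results.
as_dict = res.asDict()

print(x_value, bar_value, as_dict)
# dump() prints the token list followed by an indented tree of named results.
print(res.dump())
```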