
I'm trying to use multiprocessing.Pool to speed up parsing a file with pyparsing, but I get a multiprocessing.pool.MaybeEncodingError exception whenever I try.

I've narrowed it down to returning a dictionary (ParseResults.asDict()). With asList() the error doesn't occur, but the input I'm actually parsing is pretty complex, so ideally I'd like to use asDict().

The actual data being parsed is an Erlang list of tagged tuples, which I want to map to a Python list. The real grammar is fairly complex, so here is a simplified test case instead (updated to include a nested dict):

#!/usr/bin/env python2.7
from pyparsing import *
import multiprocessing

dictionary = Forward()
key = Word(alphas)
sep   = Suppress(":")
value = ( key | dictionary )
key_val = Group( key + sep + value )
dictionary <<= Dict( Suppress('[') + delimitedList( key_val ) + Suppress(']') )

def parse_dict(s):
    p = dictionary.parseString(s).asDict()
    return p

def parse_list(s):
    return dictionary.parseString(s).asList()

# This works (list)
data = ['[ foo : [ bar : baz ] ]']
pool = multiprocessing.Pool()
pool.map(parse_list, data)

# This fails (dict)
pool.map(parse_dict, data)

The dict version fails with:

Traceback (most recent call last):
  File "lib/python/nutshell/multi_parse.py", line 19, in <module>
    pool.map(parse, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[{'foo': ([(['bar', 'baz'], {})], {'bar': [('baz', 0)]})}]'. Reason: 'TypeError("'str' object is not callable",)'
DaveR

1 Answer


Update: The question has changed significantly since my original answer. The original point about the result not being picklable still stands and is left below.

You say in your grammar you use a delimitedList, so let's add that to our test case:

data = ['[ foo : [ bar : baz ], cat:dog ]']

Nothing in your "dictionary" grammar object actually produces a Python dict; what it parses is a list. If that isn't what you meant, you'll have to change delimitedList to something else. I've updated the grammar so that the result pickles properly, using a parse action:

dictionary = Forward()
key   = Word(alphas)
LP, RP, sep = map(Suppress, "[]:")
value = key | dictionary
key_val = key("key") + sep + value("val")
dictionary <<= LP + delimitedList( key_val ) + RP

def parse_key_val(x): return {x.key:x.val}
key_val.setParseAction(parse_key_val)

def parse_dict(s):
    # Yes, it's a list, not a dict!
    return dictionary.parseString(s).asList()

def parse_list(s):
    return dictionary.parseString(s).asList()

Run in parallel through pool.map, this now returns a result:

[[{'foo': {'bar': 'baz'}}, {'cat': 'dog'}]]
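If a single dict per input is what is needed in the end, the list of one-pair dicts shown above can be folded together after the pool returns. The following is only a sketch on plain Python values: `merge_pairs` is a hypothetical helper, not part of pyparsing, and it assumes that keys are unique at each level (a later duplicate silently overwrites an earlier one).

```python
def merge_pairs(value):
    """Recursively fold lists of one-pair dicts into a single dict.

    Hypothetical helper for illustration; it works on plain dicts,
    lists and strings only, not on pyparsing objects.
    """
    if isinstance(value, dict):
        return dict((k, merge_pairs(v)) for k, v in value.items())
    if isinstance(value, list):
        items = [merge_pairs(v) for v in value]
        # A non-empty list made only of dicts is treated as key/value pairs.
        if items and all(isinstance(i, dict) for i in items):
            merged = {}
            for item in items:
                merged.update(item)  # later duplicates win
            return merged
        return items
    return value


# One entry per input string, in the shape printed above.
parsed = [[{'foo': {'bar': 'baz'}}, {'cat': 'dog'}]]
result = [merge_pairs(r) for r in parsed]
print(result)
```

The helper is applied to each result separately, so the dicts of different input strings are not merged into each other.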

Original answer: I think multiprocessing fails because it can't pickle the object you return. You think you have a dict, but if you look at:

def parse_dict(s):
    val = lang.parseString(s).asDict()
    print type(val["foo"])
    return val

You'll find that the inner value is a <class 'pyparsing.ParseResults'>, not a dict. I'm not sure how to apply pp.Dict recursively, but a really simple fix would be to change your grammar:

value = ( Word(alphas) )
sep   = Suppress(":")
key_val = Group( value + sep + value )
lang = Dict( Suppress('[') + delimitedList( key_val ) + Suppress(']') )

This allows pp.Dict to operate properly. For what it's worth, I've found that many of my multiprocessing woes come from an object that can't be properly serialized, so that is usually the first place I look.
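One way to do that check is to try pickling the worker's return value yourself, before handing the function to the pool. This is a minimal sketch using only the standard library: `find_unpicklable` is a hypothetical helper name, it only descends into dicts, lists and tuples, and it assumes that pickling with `pickle.HIGHEST_PROTOCOL` in the parent process fails in the same way as the pool's own serialization would.

```python
import pickle


def find_unpicklable(obj, path="result"):
    """Return the paths of values inside obj that pickle rejects.

    Hypothetical debugging helper; an empty list means obj pickled fine.
    """
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []
    except Exception:
        pass  # something in here cannot be pickled; look deeper
    bad = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            bad.extend(find_unpicklable(value, "%s[%r]" % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            bad.extend(find_unpicklable(value, "%s[%d]" % (path, index)))
    # If no child is to blame, the container or leaf itself is the problem.
    return bad or [path]


# A lambda stands in here for any value that cannot be pickled.
sample = {"ok": "text", "inner": {"bad": lambda: None}}
print(find_unpicklable(sample))
```

Calling it on the value returned by parse_dict, in the parent process, should point at the nested entry that is not a plain Python object.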

A useful and related question:

Can't get pyparsing Dict() to return nested dictionary

Hooked
  • Thanks - in the example I gave Suppress()ing the colon does fix the problem, however in that case I over-simplified my testcase as I get the problem in the real grammar. Let me try and get a more representative test case... – DaveR Feb 11 '14 at 15:10
  • I've updated the example - suppressing the separator isn't enough to fix it. I'll look into the source for asDict(). – DaveR Feb 11 '14 at 15:22
  • @DaveRigby I've updated my answer. I think you have some problems with your grammar def. or your example isn't complete enough. Either way, any new updates might be better served as a new question. – Hooked Feb 11 '14 at 15:56
  • Thanks again, I'm getting closer. The actual grammar I'm parsing is erlang data structures which essentially do map to a dictionary (it's an arbitrarily ordered list of key-value pairs; where the value can be nested). So in other words I **do** need to return a dict, as the test case has. – DaveR Feb 11 '14 at 17:34
  • @DaveRigby You might be better off (in a new question, and link here), posting your grammar and some sample code + expected output. Paul, the author of pyparsing is on here quite a bit and might help and we as a community can look over the complete picture. – Hooked Feb 11 '14 at 18:38
  • marking as answered, as in the example I gave your solution does work :) However I'm still not there with the Erlang, will raise another Q shortly... – DaveR Feb 12 '14 at 08:19