
Looking for an efficient way to parse a dict-like line whose unicode keys/values themselves contain one or more single quotes. For example, consider a file containing this single line of text:

    {u'price': 180.45, u'item': u'Black Jacket Men's Small'}

What is the most efficient way to read this line into a Python dict? I've worked around it with a regex replacement, but I want to make sure there isn't a simpler tool for this.
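The regex workaround isn't shown in the question. Here is a minimal sketch of one possible repair, not necessarily the one the asker used. It assumes that every apostrophe with a word character on both sides is part of the text, unless it directly follows a standalone `u` prefix. Those apostrophes are escaped, and the result is handed to `ast.literal_eval`. The pattern and the `parse_line` helper are hypothetical.

```python
import ast
import re

# Hypothetical heuristic: an apostrophe with a word character on both sides
# (e.g. the one in "Men's") is treated as text and escaped, unless the
# preceding character is a standalone "u" string prefix (e.g. "{u'price'").
_INNER_APOSTROPHE = re.compile(r"(?<=\w)(?<!\Wu)'(?=\w)")


def parse_line(line):
    """Escape apostrophes inside string values, then parse the literal."""
    repaired = _INNER_APOSTROPHE.sub(r"\\'", line)
    # ast.literal_eval accepts u'' literals on Python 3.3+.
    return ast.literal_eval(repaired)


line = "{u'price': 180.45, u'item': u'Black Jacket Men's Small'}"
print(parse_line(line))  # {'price': 180.45, 'item': "Black Jacket Men's Small"}
```

This is only a heuristic. A trailing apostrophe such as `u'Mens''` is not caught, and as pointed out in the comments, some inputs are genuinely ambiguous.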

Edit: This is quite different from the suggested duplicate about lists, because it deals with single-quoted unicode literals whose string values contain unescaped single quotes.

eperks
  • Possible duplicate of [Convert string representation of list to list in Python](http://stackoverflow.com/questions/1894269/convert-string-representation-of-list-to-list-in-python) – juanpa.arrivillaga Apr 09 '17 at 19:14
  • As mentioned, that's not a valid string representation of a Python `dict`: the apostrophe in "men's" should have been escaped. Is that an error in your description, or is the line really like that? If so, it's not going to be easy to parse, as it isn't Python. – tdelaney Apr 09 '17 at 19:36
  • Unfortunately it is really like that, so I guess that answers my question. Thanks @tdelaney – eperks Apr 09 '17 at 19:39
  • This gets hard if commas and colons can be in the text (e.g., `u'Black Jacket: Men's Small'`). If not you could split on comma then split on colon and then do a bit of cleanup. – tdelaney Apr 09 '17 at 19:55
  • This is not always unambiguously solvable. Consider what the dictionary would look like if the string `u"foo': 100, u'bar"` were printed as a key (using single quotes despite the internal apostrophes). – Blckknght Apr 09 '17 at 21:41
  • The answer to that question is to fix whatever system is generating that data, or ask to have it fixed. E.g., ask if it can be switched to use JSON. If that's not possible, you'll have to implement the parser yourself, but I would consider that a really desperate last resort. – roeland Apr 09 '17 at 23:45
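tdelaney's split-on-comma-then-colon idea can be sketched as follows. This is a minimal illustration that assumes commas never appear inside keys or values and colons never appear inside keys. `parse_naive` and `_clean` are hypothetical helper names.

```python
def _clean(token):
    """Strip whitespace, a u prefix and surrounding quotes; convert numbers."""
    token = token.strip()
    if token.startswith("u'"):
        token = token[1:]  # drop the u prefix, keep the opening quote
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]  # inner apostrophes are left untouched
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token  # fall back to the raw text


def parse_naive(line):
    """Split the dict body on commas, then each pair on its first colon."""
    body = line.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    result = {}
    for pair in body.split(","):
        if not pair.strip():
            continue  # tolerate empty dicts and trailing commas
        key, _, value = pair.partition(":")
        result[_clean(key)] = _clean(value)
    return result


line = "{u'price': 180.45, u'item': u'Black Jacket Men's Small'}"
print(parse_naive(line))  # {'price': 180.45, 'item': "Black Jacket Men's Small"}
```

`str.partition` splits only on the first colon, so a colon inside a value (as in `u'Black Jacket: Men's Small'`) survives. A comma anywhere in the text will still split a pair in the wrong place, and nested containers are not handled at all.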

1 Answer


You can use `ast.literal_eval` for this.

As the docs explain:

> Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

As long as your keys and values are all of those types, this is a good solution.
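A quick check of how `ast.literal_eval` handles the two variants. It parses the line once the apostrophe-containing value is wrapped in double quotes, which is how Python itself would print it. It raises on the line exactly as given in the question. The `try_parse` wrapper is hypothetical, and the sketch assumes Python 3.3 or later, where the `u` prefix is accepted.

```python
import ast


def try_parse(text):
    """Return the parsed literal, or None if text is not a valid literal."""
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError) as exc:
        print(f"cannot parse {text!r}: {type(exc).__name__}")
        return None


# Well-formed: the value containing an apostrophe uses double quotes.
valid = "{u'price': 180.45, u'item': u\"Black Jacket Men's Small\"}"
print(try_parse(valid))  # {'price': 180.45, 'item': "Black Jacket Men's Small"}

# As in the question: the apostrophe in "Men's" ends the string early.
invalid = "{u'price': 180.45, u'item': u'Black Jacket Men's Small'}"
print(try_parse(invalid))
```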

Daniel Roseman
  • `u'Black Jacket Men's Small'` isn't a valid string, though. – Aran-Fey Apr 09 '17 at 19:13
  • Hmm, good point. @eperks, what's generating that list? From the `u` prefixes and single quotes it does look as if it's a stringified Python dictionary, but Python would deal with the single quote by using double quotes around that item. – Daniel Roseman Apr 09 '17 at 19:15
  • @DanielRoseman I don't actually know how it was created so can't shed any light on that. – eperks Apr 09 '17 at 19:35
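Daniel Roseman's point about Python's own output can be checked with `repr`. This is a minimal sketch in Python 3, whose `repr` omits the `u` prefixes that Python 2 printed for unicode strings. A string containing an apostrophe but no double quote is printed in double quotes. A string containing both is printed in single quotes with the apostrophe escaped. Either way, genuine `repr` output round-trips through `ast.literal_eval`.

```python
import ast

record = {"price": 180.45, "item": "Black Jacket Men's Small"}

# repr() switches to double quotes because the value contains an apostrophe.
text = repr(record)
print(text)  # {'price': 180.45, 'item': "Black Jacket Men's Small"}

# When both quote characters are present, the apostrophe is escaped instead.
print(repr("a'b\"c"))  # 'a\'b"c'

# Real repr output parses back to an equal dict.
assert ast.literal_eval(text) == record
```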