
I am trying to compare two dictionaries for equality. The dictionaries have a nested dictionary structure. Reading the documentation, I understand that Python should be able to compare them; the only difference is that one has strings and the other has unicode strings (I'm on Python 2).

> state in states
False
> dict(state) == dict(states[0])
False
> json.dumps(state, sort_keys=True) == json.dumps(states[0], sort_keys=True)
True

The keys of the top-level and nested dictionaries are strings in one and unicode in the other, but a toy example suggests that Python doesn't care.

> state.keys() == states[0].keys()
True
> state.keys()
['slots', 'responses']
> states[0].keys()
[u'slots', u'responses']
> {'a':1, 'b': {'c':2}} == {u'a':1, u'b': {u'c':2}}
True
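
One thing that may be worth ruling out, since JSON comparison says "equal" while `==` says "not equal": serializing to JSON erases some differences that `==` still sees. The data below is hypothetical (it is not the real `state`), and it only illustrates two such cases, tuples versus lists and integer versus string keys. Whether either applies here is an assumption.

```python
import json

# Hypothetical data: a tuple on one side, a list on the other.
a = {'slots': {'range': (1, 2)}}
b = {u'slots': {u'range': [1, 2]}}

# JSON has no tuple type, so both values serialize to the same array.
same_json = json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
same_dict = a == b

print(same_json)  # True
print(same_dict)  # False

# JSON object keys are always strings, so 1 and '1' also collapse together.
keys_same_json = json.dumps({1: 'x'}) == json.dumps({'1': 'x'})
keys_same_dict = {1: 'x'} == {'1': 'x'}

print(keys_same_json)  # True
print(keys_same_dict)  # False
```

If the real dictionaries contain anything like this, the str/unicode keys would be a red herring.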

I already wrapped these dictionaries in a class, and I can compare them by going through JSON, but I'd like to understand what's happening.

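import json

class OverloadedDict(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        # Expose the keys as attributes (not needed for the comparison).
        self.__dict__ = self

    def __hash__(self):
        # Hash the canonical JSON text so that equal instances hash alike.
        return hash(json.dumps(dict(self), sort_keys=True))

    def __eq__(self, other):
        # Compare canonical JSON text instead of the dictionaries themselves.
        if isinstance(other, self.__class__):
            return json.dumps(dict(self), sort_keys=True) == json.dumps(dict(other), sort_keys=True)
        else:
            return False

    def __ne__(self, other):
        return not (self == other)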

I'd like to avoid having to convert them to JSON. Is there an easy and efficient way to compare them for equality and membership?

Update

@juanpa.arrivillaga suggested in the comments switching to unicode literals with from __future__ import unicode_literals. That did not work because the dictionaries are initialized with keyword arguments. After importing unicode_literals, dict({'a':1}) will be {u'a': 1} but dict(a=1) will still be {'a': 1}.
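
Since `unicode_literals` cannot reach keyword arguments, another option is to convert one side to the other type before comparing. Below is a minimal sketch of that idea. `to_text` is a hypothetical helper name, and UTF-8 is an assumed encoding. In Python 2, ASCII-only `str` and `unicode` values already compare and hash as equal, so this would only change the result if some byte strings contain non-ASCII data.

```python
def to_text(obj, encoding='utf-8'):
    """Return a copy of obj with every byte string decoded to text."""
    if isinstance(obj, bytes):  # `bytes` is `str` on Python 2
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with converted items.
        return type(obj)(to_text(item, encoding) for item in obj)
    return obj


# Hypothetical data standing in for `state` and the JSON-loaded `states`.
state = dict(slots=dict(city=b'Paris'), responses=[b'hi'])
states = [{u'slots': {u'city': u'Paris'}, u'responses': [u'hi']}]

print(to_text(state) == states[0])
print(to_text(state) in states)
```

This keeps plain `dict` equality and membership, with no JSON round trip, at the cost of one recursive copy per comparison.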

Josep Valls
  • Don't mix unicode strings and byte strings, would be the right way to do things. – juanpa.arrivillaga Jan 08 '19 at 19:14
  • btw, why: `self.__dict__ = self` ? – juanpa.arrivillaga Jan 08 '19 at 19:15
  • Agreed. The issue is that `states` is loaded from a file and `json.load` will load all the strings as unicode. – Josep Valls Jan 08 '19 at 19:16
  • About the `self.__dict__ = self`, no particular reason, just testing – Josep Valls Jan 08 '19 at 19:17
  • So? Why are you even using byte strings? Just `from __future__ import unicode_literals` and your string literals should now create unicode objects. The best way to avoid all these problems is to simply use Python 3, or to stick to only unicode objects in Python 2. – juanpa.arrivillaga Jan 08 '19 at 19:17
  • I just tried importing `unicode_literals` but it is not working. This module creates dictionaries using keyword arguments and those are not being converted to unicode literals. e.g. `dict({'a':1})` will be `{u'a': 1}` but `dict(a=1)` will be `{'a': 1}`. – Josep Valls Jan 08 '19 at 19:42
  • I am a bit confused. Looking at your first two entries: is `state` a key (`str`/`unicode`, presumably)? Or is it a `dict`, as the following lines would suggest (expected to be equal to the `dict` `states[0]`)? – Ondrej K. Jan 08 '19 at 19:46
  • @OndrejK. `state` is a dictionary with nested dictionaries. `states` is a list of dictionaries with nested dictionaries that I expect to contain `state`; it should be the first element, that's why the `json.dumps(state, sort_keys=True) == json.dumps(states[0], sort_keys=True)` comparison works. – Josep Valls Jan 08 '19 at 19:48
  • Then you need to convert one of the types to the other. This should be pretty straightforward. I suspect it will be faster than serializing to JSON and then doing a string comparison, but it's hard to say. You should *really* be writing your code to handle this sort of thing from the beginning, assuming you are being forced to use Python 2. – juanpa.arrivillaga Jan 08 '19 at 20:00
  • Another alternative: alter your JSON decoder using one of the hooks to encode the unicode strings at the source: https://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-from-json. But note, this is essentially equivalent to converting your byte-str dictionary to unicode. Not sure there will be much of a performance gain. – juanpa.arrivillaga Jan 08 '19 at 20:03
  • There must be something else going on in your code (and/or content). Equivalent `str` and `unicode` should test equal: With `dct1 = {'a': 1, 'b': {'c': 3}}` and `dct_list = [{u'a': 1, u'b': {u'c': 3}}, {u'ax': 1, u'bx': {u'cx': 3}}]` -> `dct1 in dct_list` evaluates `True`, so do `dct1 == dct_list[0]` and `dict(dct1) == dict(dct_list[0])`. If you could share an actual sample, that might help. If not, perhaps try overloading `__eq__` to make it a bit chatty and get more step by step insight? – Ondrej K. Jan 08 '19 at 23:58
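
As a rough sketch of the "chatty" comparison suggested in the last comment, a small recursive walk can report where two nested structures stop being equal. `diff_paths` is a hypothetical helper, not part of any library, and the sample data is made up. It only descends into dicts, lists, and tuples, and assumes nothing else needs unpacking.

```python
def diff_paths(a, b, path='root'):
    """Yield a description of every place where a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in set(a) | set(b):
            where = '%s[%r]' % (path, key)
            if key not in a:
                yield '%s: missing on the left' % where
            elif key not in b:
                yield '%s: missing on the right' % where
            else:
                for line in diff_paths(a[key], b[key], where):
                    yield line
    elif (isinstance(a, (list, tuple)) and type(a) is type(b)
          and len(a) == len(b)):
        # Same sequence type and length: compare item by item.
        for i, (x, y) in enumerate(zip(a, b)):
            for line in diff_paths(x, y, '%s[%d]' % (path, i)):
                yield line
    elif a != b:
        # Leaf mismatch: show both values together with their types.
        yield '%s: %r (%s) != %r (%s)' % (
            path, a, type(a).__name__, b, type(b).__name__)


# Made-up sample: the only difference is a tuple versus a list.
left = {'slots': {'range': (1, 2)}, 'responses': ['hi']}
right = {u'slots': {u'range': [1, 2]}, u'responses': [u'hi']}

lines = list(diff_paths(left, right))
for line in lines:
    print(line)
```

Running it on the real `state` and `states[0]` should point at the exact key, and the types involved, where `==` fails.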
