How to implement autovivification for nested dictionary ONLY when assigning values?

Question

TL;DR
How can I get superkeys to be autovivified in a Python dict when assigning values to subkeys, without also getting them autovivified when checking for subkeys?

Background: Normally in Python, setting values in a nested dictionary requires manually ensuring that higher-level keys exist before assigning to their sub-keys. That is,

my_dict[1][2] = 3

will raise a KeyError if my_dict has no key 1 yet, unless you first do something like

if 1 not in my_dict:
    my_dict[1] = {}
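With a plain dict, the failure and the manual guard look like this. This is a minimal sketch that assumes my_dict starts out empty:

```python
my_dict = {}

# Assigning through a missing intermediate key fails: my_dict[1] is looked up
# (not created) before the inner assignment can happen.
try:
    my_dict[1][2] = 3
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 1

# Guarding the intermediate key by hand makes the assignment work.
if 1 not in my_dict:
    my_dict[1] = {}
my_dict[1][2] = 3
print(my_dict)  # {1: {2: 3}}
```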

Now, it is possible to set up a kind of autovivification by making my_dict an instance of a class that overrides __missing__, as shown e.g. in https://stackoverflow.com/a/19829714/6670909.
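The class in that answer is a dict subclass whose __missing__ creates a new instance of itself and stores it. Below is a minimal sketch in that spirit, not a copy of the linked code, which may differ in details. It also reproduces the behavior described next: merely looking inside vd[1] creates key 1.

```python
class Vividict(dict):
    """dict subclass that autovivifies: a missing key gets a new, empty Vividict."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when `key` is absent.
        # Store the new child before returning it, so assignments into it stick.
        value = self[key] = type(self)()
        return value


vd = Vividict()
vd[1][2] = 3          # vd[1] is created on the fly
print(vd)             # {1: {2: 3}}

probe = Vividict()
print(2 in probe[1])  # False, but probe[1] has now been created
print(1 in probe)     # True
```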

Question: However, that solution silently autovivifies higher-level keys if you check for the existence of a sub-key in such a nested dict. That leads to the following unfortunate behavior:

>>> vd = Vividict()
>>> 1 in vd
False
>>> 2 in vd[1]
False
>>> 1 in vd
True

How can I avoid that misleading result? In Perl, by the way, I can get the desired behavior by doing

no autovivification qw/exists/;

And basically I'd like to replicate that behavior in Python if possible.

You can't. As far as `vd` is concerned, there's no difference between accessing `vd[1]` because you're assigning to it and accessing it because you're seeing what it contains. Also it's not a misleading result - after you've looked in `vd[1]`, `1` **is** `in vd`. — jonrsharpe, Feb 08 '17 at 20:06
Right - my hope is that there might be some way to do this, e.g., by constructing a class for nested dictionaries that **is** sensitive to the difference between (a) getting an item simply in order to check for the existence of a sub-item, and (b) getting an item in the context of trying to set a value of a sub-item. I think the distinction would have to be made, effectively, before the implicit call to `__getitem__`. — J. Lerman, Feb 08 '17 at 20:18
There isn't. `__getitem__` doesn't know what's being done with the result when it gets called. There's no earlier hook. You would have to provide your own method, rather than using `x in y` - `y.contains(x)`, for example. — jonrsharpe, Feb 08 '17 at 20:19
Hmm ... you might get away with overriding `.__contains__()` and `.keys()` on the outer dictionary (and returning instances of the same thing for inner dicts), so that only keys with non-empty values show up on retrieval. It gets tricky with recursive access, and getting it right without violating the dictionary protocol is probably not worth the trouble. But yes, you can probably do it in Python ... — dhke, Feb 08 '17 at 20:21
This might be clunky, but you could always check by using `.get` like this: `1 in d; 2 in d.get(1, {}); 1 in d` and always set with `setdefault` — juanpa.arrivillaga, Feb 08 '17 at 20:21
@dhke `vd.__contains__` isn't being called in `thing in vd[whatever]`, just `vd.__getitem__`. — jonrsharpe, Feb 08 '17 at 20:22
@jonrsharpe But it's called for `whatever in vd`, so it can return `False` when `vd[whatever]` is empty. It's not nice, since the dict is really there, but I'd still think it's possible. — dhke, Feb 08 '17 at 20:23
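Here is juanpa.arrivillaga's suggestion as a small sketch with a plain dict. `.get()` with an empty-dict default reads without inserting anything. `setdefault()` creates the intermediate dict only when you actually assign. The variable name d is just for illustration.

```python
d = {}

print(1 in d)               # False
print(2 in d.get(1, {}))    # False; the {} default is never stored in d
print(1 in d)               # still False

d.setdefault(1, {})[2] = 3  # creates d[1] only because we are assigning
print(d)                    # {1: {2: 3}}
```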

kindall · Answer 1 · 2017-02-09T01:13:34.993

This is not an easy problem to solve, because in your example:

my_dict[1][2] = 3

my_dict[1] results in a __getitem__ call on the dictionary. There is no way at that point to know that an assignment is being made. Only the last [] in the sequence is a __setitem__ call, and it can't succeed unless my_dict[1] exists, because otherwise, what object are you assigning into?
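To see that the outer dictionary cannot tell the two cases apart, here is a sketch with a hypothetical TracingDict that records which hooks run. TracingDict and the module-level calls list are illustrative names, not part of any library. An assignment and a membership test both reach the outer dict only as __getitem__(1).

```python
calls = []  # records (method name, key) pairs, for illustration only


class TracingDict(dict):
    """Hypothetical dict subclass that logs __getitem__ and __setitem__ calls."""

    def __getitem__(self, key):
        calls.append(("__getitem__", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        calls.append(("__setitem__", key))
        super().__setitem__(key, value)


outer = TracingDict({1: TracingDict()})  # dict.__init__ does not call __setitem__

outer[1][2] = 3
print(calls)  # [('__getitem__', 1), ('__setitem__', 2)]

calls.clear()
print(2 in outer[1])  # True
print(calls)          # [('__getitem__', 1)]
```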

So don't use autovivification. You can use setdefault() instead, with a regular dict.

my_dict.setdefault(1, {})[2] = 3

Now that's not exactly pretty, especially when you are nesting more deeply, so you might write a helper method:

class MyDict(dict):
    def nest(self, keys, value):
        d = self
        for key in keys[:-1]:
            d = d.setdefault(key, {})  # create intermediate dicts as needed
        d[keys[-1]] = value

my_dict = MyDict()
my_dict.nest((1, 2), 3)       # my_dict[1][2] = 3

But even better is to wrap this in a new __setitem__ that takes all the keys at once, instead of requiring the intermediate __getitem__ calls that would trigger autovivification. This way, we know from the beginning that we're doing an assignment and can proceed without relying on autovivification.

class MyDict(dict):
    def __setitem__(self, keys, value):
        if not isinstance(keys, tuple):
            return dict.__setitem__(self, keys, value)
        d = self
        for key in keys[:-1]:
            d = d.setdefault(key, {})  # create intermediate dicts only on assignment
        dict.__setitem__(d, keys[-1], value)

my_dict = MyDict()
my_dict[1, 2] = 3

For consistency, you could also provide __getitem__ that accepts keys in a tuple as follows:

def __getitem__(self, keys):
    if not isinstance(keys, tuple):
        return dict.__getitem__(self, keys)
    d = self
    for key in keys:
        d = dict.__getitem__(d, key)  # plain lookup: raises KeyError, creates nothing
    return d

The only downside I can think of is that we can't use tuples as dictionary keys as easily: we have to write that as, e.g., my_dict[(1, 2),].
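Here are both methods combined into one class, as a sketch. The name MyDict comes from the answer. The combined form and the checks below are illustrative. A read with a tuple never creates anything, so 1 in my_dict stays False until you actually assign.

```python
class MyDict(dict):
    """dict that accepts a tuple of keys as a path into nested plain dicts."""

    def __setitem__(self, keys, value):
        if not isinstance(keys, tuple):
            return dict.__setitem__(self, keys, value)
        d = self
        for key in keys[:-1]:
            # Create intermediate plain dicts only on assignment.
            d = d.setdefault(key, {})
        dict.__setitem__(d, keys[-1], value)

    def __getitem__(self, keys):
        if not isinstance(keys, tuple):
            return dict.__getitem__(self, keys)
        d = self
        for key in keys:
            # Plain lookups: a missing key raises KeyError, nothing is created.
            d = dict.__getitem__(d, key)
        return d


my_dict = MyDict()
try:
    my_dict[1, 2]
except KeyError:
    pass
print(1 in my_dict)     # False: the failed read created nothing

my_dict[1, 2] = 3
print(my_dict[1, 2])    # 3
print(my_dict)          # {1: {2: 3}}

my_dict[(1, 2),] = "tuple key"   # a tuple used as a single key
print(my_dict[(1, 2),])          # tuple key
```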

How does that solve OP's problem of autocreation of keys on access? i.e. `my_dict[2]` shouldn't add key `2` ... — dhke, Feb 08 '17 at 20:11
I fail to understand your objection. `my_dict[2]` doesn't add key `2`. — kindall, Feb 08 '17 at 20:22
It does, e.g. for `defaultdict(dict)` or if you override `__missing__()` as defaultdict ultimately does. Hence `1 in my_dict[2]` causes `mydict[2] == {}` to appear. — dhke, Feb 08 '17 at 20:31
... so don't use `defaultdict` or override `__missing__()`. Where did I instruct you to do either of those things in my answer? Use a regular `dict`! — kindall, Feb 08 '17 at 20:41
Hence the original question: how does your answer solve the OP's problem, which arises exactly in that scenario? In your case, you don't provide the desired autovivification, which is what this was all about. — dhke, Feb 08 '17 at 20:51
My answer was "don't use autovivification; use `setdefault` instead so you can control when to vivify." — kindall, Feb 08 '17 at 20:58

dhke · Answer 2 · 2017-02-23T22:53:49.657

The proper answer is: don't do this in Python, since explicit is better than implicit.

But if you really want autovivification that does not keep empty sub-dictionaries, one can emulate the behavior in Python.

try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping      # Python 2


class AutoDict(MutableMapping, object):
    def __init__(self, *args, **kwargs):
        super(AutoDict, self).__init__()
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        if key in self.data:
            return self.data.__getitem__(key)
        else:
            return ChildAutoDict(parent=self, parent_key=key)

    def __setitem__(self, key, value):
        return self.data.__setitem__(key, value)

    def __delitem__(self, key):
        return self.data.__delitem__(key)

    def __iter__(self):
        return self.data.__iter__()

    def __len__(self):
        return self.data.__len__()

    def keys(self):
        return self.data.keys()

    def __contains__(self, key):
        return self.data.__contains__(key)

    def __str__(self):
        return str(self.data)

    def __unicode__(self):
        return unicode(self.data)

    def __repr__(self):
        return repr(self.data)

class ChildAutoDict(AutoDict):
    def __init__(self, parent, parent_key):
        super(ChildAutoDict, self).__init__()
        self.parent = parent
        self.parent_key = parent_key

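    def __setitem__(self, key, value):
        if self.parent is not None:
            if self.parent_key not in self.parent.data:
                # attach ourselves; going through the parent's __setitem__
                # also attaches the parent to *its* parent, and so on up
                self.parent[self.parent_key] = self
            elif self.parent.data[self.parent_key] is not self:
                # if parent got a different value for our key in the
                # meantime, don't add ourselves
                self.parent = None
        return self.data.__setitem__(key, value)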

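    def __delitem__(self, key):
        ret = self.data.__delitem__(key)
        # only remove ourselves from the parent if we are now empty and
        # still occupying our slot; deleting through the parent's
        # __delitem__ lets an emptied parent remove itself as well
        if (not self.data and self.parent is not None
                and self.parent.data.get(self.parent_key) is self):
            del self.parent[self.parent_key]
        return ret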

For a missing key, __getitem__() returns a dictionary facade. The facade adds itself to the parent dictionary only when something is assigned into it, and it removes itself again once it becomes empty.

All of this, of course, stops working once you assign a "normal" dictionary somewhere in the middle. After d[2] = {}, the value at d[2] is a plain dict, so autovivification below it no longer works (e.g. d[2][3][4] = 1 raises a KeyError), and so on.

I have not really tested this thoroughly, so beware of more pitfalls.

d = AutoDict()

print(1 in d)     # False
print(d)          # {}

print(d[2][3])    # {}
print(d[2])       # {}
print(d)          # {}

d[2][3] = 1
print(d)          # {2: {3: 1}}

del d[2][3]
print(d)          # {}

Hmm. This solution seems to cause `1 in d` to always evaluate to `True`. We get apparent autovivification while assigning to sub-keys, AND when checking for their presence in the dictionary. The desire though is to get autovivification of super-keys upon assignment to sub-keys, and to not get it upon existence-checking of sub-keys. For an empty dict, `2 in my_dict[1]` should return False (no KeyError) and subsequent `1 in my_dict` should still return False. — J. Lerman, Feb 08 '17 at 22:03
@J.Lerman Hmm, you're right, this needs at least an additional `__contains__()`. — dhke, Feb 09 '17 at 07:04
`__contains__()` fixed. Also needs to derive from `object` in Python 2 so that we have a new style class (and thus `__contains__()` actually works). — dhke, Feb 09 '17 at 18:30
