
Let's say I have a dict that looks like this:

d['a']['1'] = 'foo'
d['a']['2'] = 'bar'
d['b']['1'] = 'baz'
d['b']['2'] = 'boo'

If I want to get every item where the first key is 'a', I can just do d['a'] and I will get all of them. However, what if I want to get all items where the second key is '1'? The only way I can think of is to make a second dictionary with a reverse order of the keys, which requires duplicating the contents. Is there a way to do this within a single structure?

Edit: forgot to mention: I want to do this without iterating over everything. I'm going to be dealing with dicts with hundreds of thousands of keys, so I need something scalable.

Dariush
  • I would suggest two dictionaries: one organized by letters, the other by numbers. There will be no content duplication, as each dictionary will only hold references to the objects. A NumPy 2D array is another option. – DYZ Apr 27 '20 at 00:27
  • What are you optimizing for: speed or storage? Are you doing these lookups by second key all the time, some of the time, or very rarely? If frequent, then yes, a secondary data structure may be useful. If infrequent, you could iterate through d.values() and append dvalue['1'] to a list whenever it is found. Otherwise you are paying both for the storage (keeping in mind that the contents are not duplicated) and for building that secondary data structure for everything, whether you need it or not. It also depends on whether you add/delete items in your `d`, because then you would have to keep the two structures in sync. – JL Peyret Apr 27 '20 at 00:54
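
The two-dictionary idea from the comments can be sketched roughly as follows. This is only an illustration: the names `by_first`, `by_second` and `put` are made up here, and it assumes every write goes through `put` so the two dictionaries stay in sync. Both dictionaries store references to the same value objects, so the values themselves are not copied.

```python
from collections import defaultdict

# Hypothetical names, not from the question: one nested dict per key order.
by_first = defaultdict(dict)   # first key  -> {second key: value}
by_second = defaultdict(dict)  # second key -> {first key: value}

def put(first, second, value):
    """Store one value under both key orders (same object, two references)."""
    by_first[first][second] = value
    by_second[second][first] = value

put('a', '1', 'foo')
put('a', '2', 'bar')
put('b', '1', 'baz')
put('b', '2', 'boo')

print(by_first['a'])   # -> {'1': 'foo', '2': 'bar'}
print(by_second['1'])  # -> {'a': 'foo', 'b': 'baz'}

# Both lookups reach the very same object, so the value is not duplicated.
print(by_first['a']['1'] is by_second['1']['a'])  # -> True
```

Lookups by either key are then a single dictionary access, at the cost of one extra set of keys and references, plus the bookkeeping needed on every insert or delete.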

3 Answers


You're dealing with three dictionaries in this example: One with the values "foo" and "bar", one with the values "baz" and "boo", and an outer dictionary that maps the keys "a" and "b" to those first two inner dictionaries. You can iterate over the keys of both the outer and inner dictionaries with a nested for loop:

items = []
for outer_key in d:
    for inner_key in d[outer_key]:
        if inner_key == "1":
            items.append(d[outer_key][inner_key])
            break  # No need to keep checking keys once you've found a match

If you don't care about the keys of the outer dictionary, you can also use d.values() to ignore the keys and just see the inner dictionaries, then do a direct membership check on those:

items = []
for inner_dict in d.values():
    if "1" in inner_dict:
        items.append(inner_dict["1"])

This can also be written as a list comprehension:

items = [inner_dict["1"] for inner_dict in d.values() if "1" in inner_dict]
water_ghosts

What you want sounds very similar to a tree structure, which can be implemented as a dictionary of dictionaries. Here's a simple implementation taken from one of the answers to the question "What is the best way to implement nested dictionaries?":

class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def get_second_level(self, second_key):
        found = []
        for level2 in self.values():
            if second_key in level2:
                found.append(level2[second_key])
        return found

d = Tree()
d['a']['1'] = 'foo'
d['a']['2'] = 'bar'
d['b']['1'] = 'baz'
d['b']['2'] = 'boo'
d['c']['2'] = 'mox'
d['c']['3'] = 'nix'

print(d)            # -> {'a': {'1': 'foo', '2': 'bar'}, 'b': {'1': 'baz', '2': 'boo'},
                    #     'c': {'2': 'mox', '3': 'nix'}}
print(d['a'])       # -> {'1': 'foo', '2': 'bar'}
print(d['a']['1'])  # -> foo
print(d['a']['2'])  # -> bar

print()
second_key = '1'
found = d.get_second_level(second_key)
print(f'Those with a second key of {second_key!r}')  # -> Those with a second key of '1'
print(f'  {found}')                                  # ->   ['foo', 'baz']
martineau
  • but he's not skipping `1` or `2`, the second key, he's skipping `a` or `b`, the first. he wants somedict.get('1') to return `['foo','baz']`. it's an interesting question, I wonder what's going to come up from it. – JL Peyret Apr 27 '20 at 00:48
  • @JLPeyret: Good point (I missed that). Fortunately it's fairly easy to do — see updated answer. – martineau Apr 27 '20 at 01:27

So after sleeping on it, the solution I came up with was to make three dicts: the main one, where the data is actually stored and identified by a tuple (d['a', '1'] = 'foo'), and two indexes that store all possible values of key B under key A where (A, B) is a valid combination (so a['a'] = ['1', '2'] and b['1'] = ['a', 'b']). I don't entirely like this, since it still requires a hefty storage overhead and doesn't scale efficiently to higher numbers of keys, but it gets the job done without iterating over everything and without duplicating the data. If anyone has a better idea, I'll be happy to hear it.
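
A minimal sketch of that layout, under a few assumptions: the class name `TwoKeyStore` and its methods `with_first` / `with_second` are hypothetical, and the indexes use sets rather than the lists mentioned above so that removing a key pair stays cheap.

```python
from collections import defaultdict

class TwoKeyStore:
    """Hypothetical helper: tuple-keyed data plus one index per key position."""

    def __init__(self):
        self.data = {}                     # (first, second) -> value
        self.by_first = defaultdict(set)   # first  -> {second, ...}
        self.by_second = defaultdict(set)  # second -> {first, ...}

    def __setitem__(self, key, value):
        first, second = key
        self.data[key] = value
        self.by_first[first].add(second)
        self.by_second[second].add(first)

    def __getitem__(self, key):
        return self.data[key]

    def __delitem__(self, key):
        first, second = key
        del self.data[key]  # raises KeyError before the indexes are touched
        self.by_first[first].discard(second)
        self.by_second[second].discard(first)
        # Drop empty index entries so the indexes do not grow without bound.
        if not self.by_first[first]:
            del self.by_first[first]
        if not self.by_second[second]:
            del self.by_second[second]

    def with_first(self, first):
        """Return {second: value} for every stored pair (first, second)."""
        return {s: self.data[first, s] for s in self.by_first.get(first, ())}

    def with_second(self, second):
        """Return {first: value} for every stored pair (first, second)."""
        return {f: self.data[f, second] for f in self.by_second.get(second, ())}

store = TwoKeyStore()
store['a', '1'] = 'foo'
store['a', '2'] = 'bar'
store['b', '1'] = 'baz'
store['b', '2'] = 'boo'

print(sorted(store.with_second('1').items()))  # -> [('a', 'foo'), ('b', 'baz')]

del store['a', '1']
print(store.with_second('1'))  # -> {'b': 'baz'}
```

Each lookup only touches the entries that match the requested key, so its cost depends on the number of matches rather than on the total size of the store. The overhead is one set entry per key position for every stored pair.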

Dariush