
I'm reading in a long list of words that will need to be searched often and efficiently, so I want to store them in a nested dictionary set up as follows:

The first key is the length of the word, and its value is a dict keyed by the first letter of the word. Each of those values is a dict keyed by the second letter, whose values are dicts keyed by the third letter, and so on.

So if I read in "car", "can" and "joe"

I get

{3: {c: {a: {r: car, n: can}}},j: {o: {e: joe}}}

I need to do this for about 100,000 words though and they vary in length from 2 to 27 letters.

I've looked through "What is the best way to implement nested dictionaries?" and "Dynamic nested dictionaries", but haven't had any luck figuring this out.

I can certainly get my words out of my text file using

for word in text_file.read().split():

and I can break into each character using

for char in word:

or

for i in range(len(word)):
    word[i]

I just can't figure out how to get this structure down. Any help would be greatly appreciated.

J. Smith
  • Shouldn't the length be 3 in your example? And what's the key for the joe dict then? – Iluvatar Dec 07 '16 at 01:01
  • @Iluvatar Yes! edited to show 3, thank you. The key for the Joe dict is also 3 because it is a 3-letter word. – J. Smith Dec 07 '16 at 01:05
  • You can't have two keys that are the same thing. Also, what is your goal with this? Something like word suggestion? – Iluvatar Dec 07 '16 at 01:07
  • @Iluvatar I don't have 2 keys that are the same thing, but there are two dictionaries that are the values for the key 3. For example: `{3:a,b}` where a and b are also dictionaries. – J. Smith Dec 07 '16 at 01:08
  • If you want two dicts to be the value for the key, you'll need to stick them in a list or something. – Iluvatar Dec 07 '16 at 01:09
  • @Iluvatar oh ok, so I should put them in a list? I can edit it to show that, I didn't realize. Thank you – J. Smith Dec 07 '16 at 01:10
  • It would still be helpful to know what exactly you want to do with this. – Iluvatar Dec 07 '16 at 01:10
  • Strangely coincidental https://stackoverflow.com/questions/41007669/how-would-i-recursively-create-dictionaries-inside-of-dictionaries Is this a code challenge? – Josh J Dec 07 '16 at 01:11
  • @JoshJ Class project... – J. Smith Dec 07 '16 at 01:12
  • @Iluvatar I am going to use this structure to determine if I can make a word from certain letters after reading in a screenshot from the "wordbrain" game on the App store (iTunes and Google Play). – J. Smith Dec 07 '16 at 01:14
  • That sounds like something to implement with character counts honestly. – Iluvatar Dec 07 '16 at 01:15
  • btw your structure should probably be something more like `{3: {c: {a: {r:car, n:can}}, j: {o: {e:joe}}}}` if you end up doing it that way. – Iluvatar Dec 07 '16 at 01:17
  • @Iluvatar you said above that the dicts would need to be in a list to be used as values – J. Smith Dec 07 '16 at 01:18
  • You can't have multiple separate dicts tied to one key. But instead of having a separate dict for each letter, just put all the next letters in a single dict. Which is the way you'd want it anyway for any kind of lookup. (Count the brackets, 3 is only tied to a single dict.) – Iluvatar Dec 07 '16 at 01:19
  • @Iluvatar Yes! That's what I'd like to do. Any suggestions? – J. Smith Dec 07 '16 at 01:23
  • Why not use [Trie](https://en.wikipedia.org/wiki/Trie) and limit the search depth based on word length? – niemmi Dec 07 '16 at 01:23
  • @niemmi That is the structure I am trying to create – J. Smith Dec 07 '16 at 01:26
  • In that case why group the words based on their length and why store the whole word as a value on lowest level? You basically need a tree node to store all the children and mark if any word terminates there or not. – niemmi Dec 07 '16 at 01:28
  • *cough* http://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python *cough* – Iluvatar Dec 07 '16 at 01:29
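
The character-count idea raised in the comments above can be sketched roughly as follows. This is only an illustration under the assumption that "can I make this word from these letters" means each letter may be used at most as many times as it appears; the names `can_make` and `makeable_words` are hypothetical and not from the thread.

```python
from collections import Counter

def can_make(word, letters):
    """Return True if `word` can be spelled using the available `letters`."""
    # Counter subtraction drops zero and negative counts, so an empty
    # result means every letter of `word` is covered by `letters`.
    return not (Counter(word) - Counter(letters))

def makeable_words(words, letters):
    """Filter `words` down to those that can be built from `letters`."""
    return [w for w in words if can_make(w, letters)]
```

This ignores the adjacency rules of the game board and only checks letter availability, so it would serve as a cheap pre-filter at best.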

3 Answers

3

Here's a short example of how to implement a trie with autovivification built on defaultdict. Every node that terminates a word stores an extra key, 'term', to indicate it.

from collections import defaultdict

trie = lambda: defaultdict(trie)

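def add_word(root, s):
    # Walk down one level per character; defaultdict creates missing nodes.
    node = root
    for c in s:
        node = node[c]
    # Mark this node as the end of a complete word.
    node['term'] = True

def list_words(root, length, prefix=''):
    # Yield every stored word of exactly `length` characters.
    if not length:
        if 'term' in root:
            yield prefix
        return

    for k, v in root.items():
        # Skip the terminator flag; every other key is a child letter.
        if k != 'term':
            yield from list_words(v, length - 1, prefix + k)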

WORDS = ['cars', 'car', 'can', 'joe']
root = trie()
for word in WORDS:
    add_word(root, word)

print('Length {}'.format(3))
print('\n'.join(list_words(root, 3)))
print('Length {}'.format(4))
print('\n'.join(list_words(root, 4)))

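Output (the order within each length depends on dict iteration order; on Python 3.7+, where dicts preserve insertion order, the length-3 words print as car, can, joe):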

Length 3
joe
can
car
Length 4
cars
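
For the frequent lookups mentioned in the question, a membership check on the same trie might look like the sketch below. The `contains` helper is a hypothetical addition, not part of the original answer; `trie` and `add_word` are repeated only so the block runs on its own.

```python
from collections import defaultdict

trie = lambda: defaultdict(trie)

def add_word(root, s):
    node = root
    for c in s:
        node = node[c]
    node['term'] = True

def contains(root, s):
    """Return True if `s` was added to the trie as a complete word."""
    node = root
    for c in s:
        # Test membership first: indexing a missing key on a defaultdict
        # would silently create a new node and grow the trie.
        if c not in node:
            return False
        node = node[c]
    return 'term' in node
```

The lookup cost is proportional to the length of the word, regardless of how many words are stored.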
niemmi
2

I'm not sure what the purpose of this structure is, but here's a recursive solution that generates the structure you describe:

from collections import defaultdict
d = defaultdict(list)
words = ['hello', 'world', 'hi']


def nest(d, word):
    if word == "":
        return d
    d = {word[-1:]: word if d is None else d}
    return nest(d, word[:-1])


for word in words:
    l = len(word)
    d[l].append(nest(None, word))

print(d)
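
To make the resulting shape easier to see, here is the same approach as a self-contained sketch, with the structure it builds spelled out in the comments. The name `by_length` is a hypothetical rename of `d` for readability.

```python
from collections import defaultdict

def nest(d, word):
    # Build the chain from the last letter inward; the innermost
    # value is the complete word.
    if word == "":
        return d
    d = {word[-1:]: word if d is None else d}
    return nest(d, word[:-1])

by_length = defaultdict(list)
for word in ['hello', 'world', 'hi']:
    by_length[len(word)].append(nest(None, word))

# Each length maps to a list holding one nested dict per word:
# by_length[2] == [{'h': {'i': 'hi'}}]
# by_length[5][0] == {'h': {'e': {'l': {'l': {'o': 'hello'}}}}}
```

Note that words of the same length are kept in separate dicts inside a list, so shared prefixes are not merged.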
antonagestam
  • This builds a dictionary of lists of dictionaries, not just nested dictionaries—which is what the OP wants. – martineau Dec 07 '16 at 02:43
  • @martineau The question was edited, there were lists in the question at one point. – antonagestam Dec 07 '16 at 02:46
  • Perhaps then you should [edit] your answer and indicate you're answering an earlier version of the question (and show the dictionary/output produced). – martineau Dec 07 '16 at 09:31
1

Here's a way to do it without using collections.defaultdict or creating your own custom subclass of dict, so the resulting dictionary is just an ordinary dict object:

import pprint

def _build_dict(wholeword, chars, dic):
    # Base case: the last character maps to the complete word.
    if len(chars) == 1:
        dic[chars[0]] = wholeword
        return
    # Reuse the sub-dict for this character if it exists, else create it.
    new_dict = dic.setdefault(chars[0], {})
    _build_dict(wholeword, chars[1:], new_dict)

def build_dict(words):
    dic = {}
    for word in words:
        # The top level is keyed by word length.
        root = dic.setdefault(len(word), {})
        _build_dict(word, word, root)
    return dic

words = ['a', 'ox', 'car', 'can', 'joe']
data_dict = build_dict(words)
pprint.pprint(data_dict)

Output:

{1: {'a': 'a'},
 2: {'o': {'x': 'ox'}},
 3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}}}

It's based on a recursive algorithm illustrated in a message in a python.org Python-list Archives thread titled Building and Transvering multi-level dictionaries.
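
Looking a word up in a dictionary of this shape could be done along these lines. This is a minimal sketch: `find_word` is a hypothetical helper, and `data_dict` is hardcoded to match the output shown above so that the block runs on its own.

```python
data_dict = {
    1: {'a': 'a'},
    2: {'o': {'x': 'ox'}},
    3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}},
}

def find_word(data_dict, word):
    """Return True if `word` is stored in the length-keyed nested dict."""
    # Start in the sub-dict for this word length, if there is one.
    node = data_dict.get(len(word), {})
    for ch in word:
        # Stop as soon as the path runs out or reaches a leaf too early.
        if not isinstance(node, dict) or ch not in node:
            return False
        node = node[ch]
    # After the last letter, the leaf should be the word itself.
    return node == word
```

Because words are grouped by length, a stored word can never be a proper prefix of another word in the same sub-dict, which is why a leaf string and a child dict never compete for the same key.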

martineau