I am trying to build a custom nested dict from a file in Python without using the collections module. The target data structure is shown below.

d = {'employee': 
     {'developer1': 
      {'id1':
       {'language': ('c', 'java'),
        'worked_area':('delhi', 'kolkata')
       },
       'id2':
        {'language':('python' , 'c++'),
         'worked_area':('kolkata',)
        }
       },
      'devloper2': 
      {'id1':
       {'language': ('c', 'java'),
        'worked_area':('delhi', 'kolkata')
       }
      }
     }
    }

I read the data structure with the code below:

for k1, v1 in d.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, v5 in v3.items():
                print(k1, k2, k3, k4, v5)

The file : text1.txt

employee    developer1  id1 language    c
employee    developer1  id1 language    java
employee    developer1  id1 worked_area delhi
employee    developer1  id1 worked_area kolkata
employee    developer1  id2 language    python
employee    developer1  id2 language    c++
employee    developer1  id2 worked_area kolkata
employee    devloper2   id1 language    c
employee    devloper2   id1 language    java
employee    devloper2   id1 worked_area delhi
employee    devloper2   id1 worked_area kolkata

Now I am trying to build the dictionary above from this text file and print its contents with the loop shown earlier.

import re
d = {}
fh = open('text1.txt', 'r')
for i, line in enumerate(fh):
    line = line.strip()
    tmp = re.split(r'\t+', line)
    d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])

But running the code raises the following error:

KeyError: 'employee'

So I need help writing code that builds this data structure.

Tom de Geus
Arijit Panda
  • You need to initialize an empty `dict` before you can add entries to it. This holds at each level. If you know the keys you're going to encounter, you can do this at the beginning of your code; otherwise I think you need a bunch of `if` statements. – Tom de Geus May 03 '17 at 09:04
  • @TomdeGeus Can you please show me an example of that? – Arijit Panda May 03 '17 at 09:07
  • Or perhaps you could use `defaultdict` – agamagarwal May 03 '17 at 09:10
  • [help on nested dict](http://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries) and [help2](http://stackoverflow.com/questions/651794/whats-the-best-way-to-initialize-a-dict-of-dicts-in-python) – luoluo May 03 '17 at 09:10

2 Answers

Your problem is that d starts out as an empty dict. It has no employee key yet, so looking that key up raises the KeyError:

>>> d = {}
>>> d['employee']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'employee'

The next problem is that the value for the employee key must itself be a dict, and the same holds at every level below it. To solve this, you could use nested defaultdicts.

Since the nested depth is constant and known, you just need to initialize a tree. It is a defaultdict of a defaultdict of a defaultdict of a defaultdict of a list :)

Once this tree is initialized, it's very easy to append information to the leaves. Note that you should use a list instead of a tuple: the number of values for a key such as language isn't known until the whole file has been read, and you cannot append values to a tuple.

data = """employee    developer1  id1 language    c
employee    developer1  id1 language    java
employee    developer1  id1 worked_area delhi
employee    developer1  id1 worked_area kolkata
employee    developer1  id2 language    python
employee    developer1  id2 language    c++
employee    developer1  id2 worked_area kolkata
employee    devloper2   id1 language    c
employee    devloper2   id1 language    java
employee    devloper2   id1 worked_area delhi
employee    devloper2   id1 worked_area kolkata"""

from collections import defaultdict

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))

for line in data.splitlines():
    k1, k2, k3, k4, v = line.split()
    tree[k1][k2][k3][k4].append(v)

print(tree)
# defaultdict(<function <lambda> at 0x7f2e771cd7d0>, {'employee': defaultdict(<function <lambda> at 0x7f2e771cdf50>, {'developer1': defaultdict(<function <lambda> at 0x7f2e771cf050>, {'id2': defaultdict(<type 'list'>, {'worked_area': ['kolkata'], 'language': ['python', 'c++']}), 'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})}), 'devloper2': defaultdict(<function <lambda> at 0x7f2e771cf0c8>, {'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})})})})

print(tree['employee']['developer1']['id2']['language'])
# ['python', 'c++']

print(tree['employee']['developerX']['idX']['language'])
# []  (note: this lookup also inserts the missing keys into the tree as a side effect)

To see the tree's structure, you can use json.dumps:

import json
print(json.dumps(tree, indent=4))

It outputs:

{
    "employee": {
        "developer1": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            },
            "id2": {
                "language": [
                    "python",
                    "c++"
                ],
                "worked_area": [
                    "kolkata"
                ]
            }
        },
        "devloper2": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            }
        }
    }
}

Since a defaultdict is also a dict, you can iterate over the values just like you proposed.
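If you would rather end up with plain dicts and tuples, as in the structure from the question, a small conversion pass at the end is one option. The sketch below assumes Python 3; `freeze` is a hypothetical helper name, and the three sample rows stand in for the full file so the snippet runs on its own.

```python
from collections import defaultdict

# Same tree shape as above, rebuilt here so the snippet is self-contained.
tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
rows = [
    ("employee", "developer1", "id1", "language", "c"),
    ("employee", "developer1", "id1", "language", "java"),
    ("employee", "developer1", "id2", "worked_area", "kolkata"),
]
for k1, k2, k3, k4, v in rows:
    tree[k1][k2][k3][k4].append(v)


def freeze(node):
    """Hypothetical helper: recursively turn (default)dicts into plain
    dicts and the leaf lists into tuples."""
    if isinstance(node, dict):
        return {key: freeze(value) for key, value in node.items()}
    return tuple(node)


d = freeze(tree)

# The nested loops from the question work unchanged on the result.
for k1, v1 in d.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, v5 in v3.items():
                print(k1, k2, k3, k4, v5)
```

Because the result contains only plain dicts, looking up a missing key raises a KeyError again instead of silently creating it.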

Eric Duminil

On request:

Using only the built-in dict, you could do:

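import re

d = {}
# the with block closes the file once the loop is done
with open('text1.txt', 'r') as fh:
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tmp = re.split(r'\t+', line)  # columns are separated by one or more tabs
        # create each missing level before descending into it
        if tmp[0] not in d:
            d[tmp[0]] = {}
        if tmp[1] not in d[tmp[0]]:
            d[tmp[0]][tmp[1]] = {}
        if tmp[2] not in d[tmp[0]][tmp[1]]:
            d[tmp[0]][tmp[1]][tmp[2]] = {}
        if tmp[3] not in d[tmp[0]][tmp[1]][tmp[2]]:
            d[tmp[0]][tmp[1]][tmp[2]][tmp[3]] = []
        # the innermost value is a list, so values can be appended
        d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])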

With some more thought, a more elegant solution is probably possible. People must have thought about this before, for example people working with JSON files.
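One way to shorten the chain of `if` checks, still without the collections module, is the built-in `dict.setdefault`, which returns the existing value for a key or stores and returns a default. The following is only a sketch, assuming Python 3 and whitespace-separated columns; `add_row` is a hypothetical helper name, and the inline sample stands in for text1.txt.

```python
def add_row(tree, keys, value):
    """Hypothetical helper: walk (and create as needed) one nested dict
    per key in keys[:-1], then append value to the list at keys[-1]."""
    node = tree
    for key in keys[:-1]:
        # setdefault returns the existing child dict, or stores a new one
        node = node.setdefault(key, {})
    node.setdefault(keys[-1], []).append(value)


# A few sample rows in the same layout as text1.txt.
sample = """\
employee developer1 id1 language c
employee developer1 id1 language java
employee developer1 id1 worked_area delhi
employee devloper2 id1 language c
"""

d = {}
for line in sample.splitlines():
    # the last column is the value, everything before it is the key path
    *keys, value = line.split()
    add_row(d, keys, value)

print(d["employee"]["developer1"]["id1"]["language"])
```

Since the helper loops over the key path, it does not depend on the nesting depth being exactly four levels.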

Tom de Geus