1

I'm trying to read in data from a file and create a nested dictionary of dictionaries. There is a similar question here, but I can't figure out how to adapt its solution to my particular problem. I would be very grateful if someone could explain a solution to me.

Basically, I have a file that looks like this:

A    'abc'    12    0.001
B    'tex'    34    0.002  
B    'tex'    78    0.005
E    'yet'    88    0.090
A    'abc'    22    0.120

I need to create a complex dictionary that looks like this:

complete_dict = {'A': {'abc': [[12, 0.001], [22, 0.120]]},
                 'B': {'tex': [[34, 0.002], [78, 0.005]]},
                 'E': {'yet': [[88, 0.090]]}}

I can create the inner dictionary, but I can't figure out how to create the outer dictionary. Here is my code for the inner dictionary:

import csv

with open('data.txt', mode="r") as data_file:
    fieldnames = ('character', 'string', 'value1', 'value2')
    reader = csv.DictReader(data_file, fieldnames=fieldnames, delimiter="\t")
    inner_dict = {}
    for row in reader:
        values = [int(row['value1']), float(row['value2'])]
        string = row['string']
        if string in inner_dict:
            inner_dict[string].append(values)
        else:
            # Start a list of pairs so later rows can be appended to it
            inner_dict[string] = [values]

Could someone explain how to create the outer dictionary? The only idea I have is to read the file and create the inner dictionary, then reread the file to create the outer dictionary. Surely there must be an easier way? Thanks in advance for the help!

drbunsen

6 Answers

6

Is this what you're looking to accomplish?

import csv
import pprint

with open('data.txt', mode="r") as data_file:
    fieldnames = ('character', 'string', 'value1', 'value2')
    reader = csv.DictReader(data_file, fieldnames=fieldnames, delimiter="\t")

    complete_dict = {}
    for row in reader:
        char_dict = complete_dict.setdefault(row['character'], {})
        values_list = char_dict.setdefault(row['string'], [])
        values = [int(row['value1']), float(row['value2'])] 
        values_list.append(values)

pprint.pprint(complete_dict)

Note that in your example you have 'value2' where you want 'value1'. Also, this appears to include the single quotes around the strings as part of the string, so you may need to clean that up.
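One way to handle the quote cleanup mentioned above is to strip the single quotes from the string field before using it as a key. The following is a minimal sketch, not a definitive implementation: the `build_nested` helper and the in-memory `SAMPLE` text are assumptions made for illustration, standing in for reading data.txt.

```python
import csv
import io

# Sample rows from the question, tab-separated; stands in for data.txt.
SAMPLE = (
    "A\t'abc'\t12\t0.001\n"
    "B\t'tex'\t34\t0.002\n"
    "B\t'tex'\t78\t0.005\n"
    "E\t'yet'\t88\t0.090\n"
    "A\t'abc'\t22\t0.120\n"
)


def build_nested(lines):
    """Group rows as {character: {string: [[value1, value2], ...]}}."""
    complete_dict = {}
    for character, string, value1, value2 in csv.reader(lines, delimiter="\t"):
        key = string.strip("'")  # drop the literal single quotes around the string
        inner = complete_dict.setdefault(character, {})
        inner.setdefault(key, []).append([int(value1), float(value2)])
    return complete_dict


result = build_nested(io.StringIO(SAMPLE))
print(result)
```

With a real file, the same function could be called on the open file object instead of the `io.StringIO` wrapper.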

retracile
2

Given:

$ cat data.txt
A   'abc'   12  0.001
B   'tex'   34  0.002
B   'tex'   78  0.005
E   'yet'   88  0.090
A   'abc'   22  0.120

This:

import csv

d={}
with open('data.txt', mode="r") as data_file:
    fieldnames = ('character', 'string', 'value1', 'value2')
    reader = csv.DictReader(data_file, fieldnames=fieldnames, delimiter="\t")
    for row in reader:
        c=row['character']
        values = [int(row['value1']), float(row['value2'])] 
        s = row['string']
        if c not in d: d[c]={}
        if s not in d[c]: d[c][s] = []
        d[c][s].append(values)

print(d)

Produces:

{'A': {"'abc'": [[12, 0.001], [22, 0.12]]}, 
 'B': {"'tex'": [[34, 0.002], [78, 0.005]]}, 
 'E': {"'yet'": [[88, 0.09]]}}
the wolf
2

Use a defaultdict.

import csv
from collections import defaultdict

# A missing outer key creates an inner defaultdict; a missing inner key creates a list.
complete_dict = defaultdict(lambda: defaultdict(list))

with open('data.txt', mode="r", newline="") as data_file:
    reader = csv.reader(data_file, delimiter="\t")
    for c, s, v1, v2 in reader:
        complete_dict[c][s].append([int(v1), float(v2)])
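To show what the nested defaultdict does on its own, here is a small sketch without any file reading. The hand-typed rows and the `plain` name are assumptions for illustration. It also shows one possible way to convert the result back to ordinary dicts, which may be useful because merely looking up a missing key in a defaultdict creates it.

```python
from collections import defaultdict

nested = defaultdict(lambda: defaultdict(list))

# No existence checks needed: missing keys are created on first access.
nested["A"]["'abc'"].append([12, 0.001])
nested["A"]["'abc'"].append([22, 0.120])
nested["B"]["'tex'"].append([34, 0.002])

# Membership tests do not create keys, but indexing does.
z_present_before = "Z" in nested
_ = nested["Z"]
z_present_after = "Z" in nested
del nested["Z"]

# Convert to plain dicts so later lookups of missing keys raise KeyError again.
plain = {outer: dict(inner) for outer, inner in nested.items()}
print(plain)
```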
Steven Rumbalski
0

Using setdefault:

import csv

with open('data.txt', mode="r") as data_file:
    fieldnames = ('character', 'string', 'value1', 'value2')
    reader = csv.DictReader(data_file, fieldnames=fieldnames, delimiter="\t")

    result = {}
    for row in reader:
        result.setdefault(row['character'], {}).setdefault(row['string'], []).append([int(row['value1']), float(row['value2'])])

print(result)
0

If you read the file into a variable called s for brevity, the following might work:

d = {}
for l in s.splitlines():
    if not l.strip():
        continue  # skip blank lines, e.g. after a trailing newline
    character, string, val1, val2 = l.split('\t')
    if character not in d:  # dict.has_key() was removed in Python 3
        d[character] = {string: []}
    d[character][string].append([int(val1), float(val2)])

This assumes that each character is always paired with the same string, which your question doesn't explicitly state.
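If that assumption does not hold, the `d[character][string]` lookup above would raise a KeyError the first time a character shows up with a second string. A possible variant that tolerates this is sketched below. The `group_rows` name and the two-row input are hypothetical, chosen only to exercise that case.

```python
def group_rows(text):
    """Build {character: {string: [[val1, val2], ...]}} from tab-separated text."""
    d = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        character, string, val1, val2 = line.split("\t")
        inner = d.setdefault(character, {})  # create the inner dict on first sight
        inner.setdefault(string, []).append([int(val1), float(val2)])
    return d


# Hypothetical input: 'A' appears with two different strings.
text = "A\t'abc'\t12\t0.001\nA\t'xyz'\t5\t0.5\n"
grouped = group_rows(text)
print(grouped)
```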

jro
0

Here's how I would do it. It isn't much shorter than yours, but it keeps only one copy of the data in memory and reads the file one line at a time.

import ast

f = open('data.txt', 'r')
# Lazy generator: splits one line at a time (replaces Python 2's itertools.imap)
rows = (line.rstrip('\n').split('\t') for line in f)
result = {}
for key1, key2, val1, val2 in rows:
  key2 = ast.literal_eval(key2)  # removes the quotes; unlike eval, accepts only literals
  if key1 not in result:
    result[key1] = {}
  if key2 not in result[key1]:
    result[key1][key2] = []
  result[key1][key2].append([int(val1), float(val2)])
f.close()  # prevent lingering open file
wberry