
I'm new to using dictionaries. I've been searching for how to do this, but I cannot find the answer to this specific problem.

I have a 4-level nested dictionary and inside the last level I have the values (arrays) of interest. It looks like this:

import numpy as np

A = np.array([1,2,3])
B = np.array([4,5,6])
C = np.array([7,8,9])
D = np.array([10,11,12])
E = np.array([13,14,15])

d={('domestic','dog','collie','old'):A,
   ('domestic','dog','golden','old'):B,
   ('domestic','dog','golden','young'):C,
   ('domestic','cat','siamese','young'):D,
   ('stray','dog','golden','old'):E}

What I need to do is operate over all arrays that satisfy certain condition(s) on a specific level.

For instance, I need the average of all the arrays that have the word 'dog' in their second level, regardless of whether it's domestic or stray, old or young, etc.

And what if I needed to satisfy several conditions at once? For instance, average over all dogs that are young.

Any help is appreciated!

Edit: The reason I was not using Pandas is that my arrays have 2 dimensions, and I'm looking for a way to operate over every (x,y) for each set of "key conditions". I realize now, from some of the answers/comments, that my title question is not clear and that the example I provided does not show what I really intended to do. I'm sorry for that; I should learn not to post after a long day of work.

In Pandas I've always computed averages over all values, but since what I need here is an array of averages selected according to some conditions, I thought this couldn't be done with Pandas. So, after some research, I decided the best idea was to start using dictionaries to store the data.
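Because the real arrays are 2-D, it may help to see that averaging a list of arrays with `axis=0` works cell by cell, not only for 1-D arrays. A minimal sketch, assuming hypothetical 2×2 arrays in place of the real data (`d2` and its values are made up for illustration):

```python
import numpy as np

# Hypothetical 2-D arrays standing in for the real data (values are made up)
d2 = {
    ('domestic', 'dog', 'golden', 'old'): np.array([[1, 2], [3, 4]]),
    ('domestic', 'dog', 'golden', 'young'): np.array([[5, 6], [7, 8]]),
    ('domestic', 'cat', 'siamese', 'young'): np.array([[100, 100], [100, 100]]),
}

# Keep only the arrays whose second key level is 'dog'
selected = [arr for key, arr in d2.items() if key[1] == 'dog']

# np.mean treats the list as shape (n_arrays, rows, cols);
# axis=0 averages over the arrays, leaving one mean per (x, y) cell
per_cell_mean = np.mean(selected, axis=0)
print(per_cell_mean)  # [[3. 4.]
                      #  [5. 6.]]
```

The result has the same shape as each input array, so no pandas is needed for this particular operation.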

In my example, what I would need to obtain is an array (x0,y0,z0) of averages. For instance, if I want the average over all "dogs" & "golden", the result should be

[ (B[0]+C[0])/2, (B[1]+C[1])/2, (B[2]+C[2])/2 ]
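The formula above is an element-wise mean, which NumPy can compute directly with `np.mean(..., axis=0)`. A small check using only `B` and `C` from the example:

```python
import numpy as np

B = np.array([4, 5, 6])
C = np.array([7, 8, 9])

# Spelled out index by index, exactly as in the formula above
manual = np.array([(B[i] + C[i]) / 2 for i in range(len(B))])

# Same thing vectorised: average the stacked arrays along axis 0
vectorised = np.mean([B, C], axis=0)

print(manual)      # [5.5 6.5 7.5]
print(vectorised)  # [5.5 6.5 7.5]
```

Note that in the example dictionary `E`, keyed `('stray','dog','golden','old')`, is also a golden dog, so filtering on 'dog' and 'golden' alone would include it; averaging exactly `B` and `C` corresponds to the domestic golden dogs.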

Is this possible to achieve using Pandas?

lanadaquenada
  • Welcome to SO. Unfortunately this isn't a discussion forum or tutorial. Please take the time to read [ask] and the other links found on that page. Invest some time with [the Tutorial](https://docs.python.org/3/tutorial/index.html) practicing the examples. It will give you an idea of the tools Python offers to help you solve your problem. [“Can someone help me?” not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question). – wwii Apr 19 '18 at 21:21
  • This sounds like a job for pandas. – Alex Hall Apr 19 '18 at 21:34
  • @wwii I have read the "how to ask" before and I have used tutorials before. The question was not "help me" but rather "How can I solve this specific problem using dictionaries?". I'm sorry that my English is not good enough for me to ask the question without using an example (which, by the way, is not MY actual data but a very simplified version of it). I also DID research how to do this before posting, but honestly, if I had to list all the things I've tried that didn't work for one reason or another, it would be a never-ending, most likely confusing, post. – lanadaquenada Apr 20 '18 at 15:58
  • The question is too broad (imho). As @AlexHall suggested, this could be accomplished with [Pandas](http://pandas.pydata.org/), and your example data would fit neatly into many of the examples in the documentation. Then again, there are a number of ways to filter items in a dictionary using loops and conditional statements and then operate on the resulting item values. *cont'd* – wwii Apr 20 '18 at 16:10
  • *cont'd...* The [in operator](https://docs.python.org/3/reference/expressions.html#comparisons) and possibly [operator.itemgetter](https://docs.python.org/3/library/operator.html#operator.itemgetter) come to mind. Techniques for looping and comparing values are found in the Tutorial. – wwii Apr 20 '18 at 16:11
  • @wwii thank you for your input, I added an edit to my question to clarify my very poor original question. – lanadaquenada Apr 20 '18 at 16:39

3 Answers


What you have there is not a nested dictionary but simply a dictionary whose keys are tuples of 4 values. A nested dictionary would look more like d={'a':{'b':{'c':{...}}}}. So you can get the keys of the dictionary simply by iterating over it or by using d.keys(). For example, if you want to average over all arrays that have the word "dog" in the second position of the tuple:

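selected = []  # avoid naming this `list`, which would shadow the built-in
for key in d:
    if key[1] == 'dog':  # second position of the key tuple
        selected.append(d[key])
average = np.mean(selected)  # single mean over every element of every array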

It can be done much more succinctly with a list comprehension:

average = np.mean([d[key] for key in d if key[1]=='dog'])

For this question, I've assumed you want the full average over all elements of all arrays and that the arrays are all the same shape.
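The code above returns a single scalar. If, as in the question's edit, you want one average per array position, passing `axis=0` to `np.mean` should give that, and several conditions can be combined with `and`. A sketch on the question's example data:

```python
import numpy as np

d = {('domestic', 'dog', 'collie', 'old'): np.array([1, 2, 3]),
     ('domestic', 'dog', 'golden', 'old'): np.array([4, 5, 6]),
     ('domestic', 'dog', 'golden', 'young'): np.array([7, 8, 9]),
     ('domestic', 'cat', 'siamese', 'young'): np.array([10, 11, 12]),
     ('stray', 'dog', 'golden', 'old'): np.array([13, 14, 15])}

dog_arrays = [d[key] for key in d if key[1] == 'dog']
overall = np.mean(dog_arrays)               # one scalar over every element: 7.25
per_position = np.mean(dog_arrays, axis=0)  # one value per position: [6.25 7.25 8.25]

# Several conditions at once: dogs (position 1) that are young (position 3)
young_dogs = np.mean([d[key] for key in d
                      if key[1] == 'dog' and key[3] == 'young'], axis=0)
print(overall, per_position, young_dogs)
```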

enumaris
  • Although this works perfectly with my example, for some reason it does not work with my data. I think it might be related to the fact that I constructed the dictionary using the class `Vividict` given by AaronHall in [this answer](https://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionariesl) – lanadaquenada Apr 20 '18 at 16:44
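If the data lives in a genuinely nested dictionary, indexed like `d['domestic']['dog']['golden']['old']` (as a `Vividict`-style structure would be), the tuple-key code above finds nothing, because each top-level key is a single string. One option is to flatten it into tuple keys first. A minimal sketch, assuming every nested level is a `dict` (or a subclass of it) and every non-dict value is a leaf array; `flatten` is a hypothetical helper name:

```python
import numpy as np

def flatten(nested, prefix=()):
    """Yield (tuple_key, leaf) pairs from an arbitrarily nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):  # dict subclasses also pass this check
            yield from flatten(value, path)
        else:
            yield path, value

# The question's data, written as a nested dictionary
nested = {
    'domestic': {
        'dog': {
            'collie': {'old': np.array([1, 2, 3])},
            'golden': {'old': np.array([4, 5, 6]), 'young': np.array([7, 8, 9])},
        },
        'cat': {'siamese': {'young': np.array([10, 11, 12])}},
    },
    'stray': {'dog': {'golden': {'old': np.array([13, 14, 15])}}},
}

flat = dict(flatten(nested))  # keys like ('domestic', 'dog', 'golden', 'old')
dog_mean = np.mean([v for k, v in flat.items() if k[1] == 'dog'], axis=0)
print(dog_mean)  # [6.25 7.25 8.25]
```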

One way without pandas is to create a function which does this for you.

For large datasets, this is advisable only for isolated calls. For groups of calculations, pandas is a better option.

import numpy as np

A = np.array([1,2,3])
B = np.array([4,5,6])
C = np.array([7,8,9])
D = np.array([10,11,12])
E = np.array([13,14,15])

d = {('domestic','dog','collie','old'):A,
     ('domestic','dog','golden','old'):B,
     ('domestic','dog','golden','young'):C,
     ('domestic','cat','siamese','young'):D,
     ('stray','dog','golden','old'):E}

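def averager(criteria, d):
    """Element-wise mean of the arrays whose keys satisfy every {position: value} criterion."""

    def apply_criteria(k, criteria):
        # True only if the key tuple has the required value at every given position
        return all(k[i] == j for i, j in criteria.items())

    return np.mean([v for k, v in d.items() if apply_criteria(k, criteria)], axis=0)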

res = averager({0: 'domestic', 1: 'dog'}, d)

# array([ 4.,  5.,  6.])

Explanation

  • The criteria are supplied to the averager function via a dictionary of {index: value} items.
  • We use a list comprehension to extract the relevant numpy arrays.
  • We use numpy.mean with axis=0 to calculate the element-wise mean across those arrays.
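Applied to the queries from the question, `averager` might be used like this. The sketch repeats the function (with `all()` in place of the loop) so that it runs on its own:

```python
import numpy as np

d = {('domestic', 'dog', 'collie', 'old'): np.array([1, 2, 3]),
     ('domestic', 'dog', 'golden', 'old'): np.array([4, 5, 6]),
     ('domestic', 'dog', 'golden', 'young'): np.array([7, 8, 9]),
     ('domestic', 'cat', 'siamese', 'young'): np.array([10, 11, 12]),
     ('stray', 'dog', 'golden', 'old'): np.array([13, 14, 15])}

def averager(criteria, d):
    # criteria maps key position -> required value, e.g. {1: 'dog'}
    def apply_criteria(k, criteria):
        return all(k[i] == j for i, j in criteria.items())
    return np.mean([v for k, v in d.items() if apply_criteria(k, criteria)], axis=0)

young_dogs = averager({1: 'dog', 3: 'young'}, d)                   # [7. 8. 9.]
golden_dogs = averager({1: 'dog', 2: 'golden'}, d)                 # [ 8.  9. 10.], from B, C and E
domestic_golden = averager({0: 'domestic', 1: 'dog', 2: 'golden'}, d)  # [5.5 6.5 7.5], from B and C
```

If no key matches, NumPy warns about taking the mean of an empty slice and returns `nan`, so checking for an empty selection first may be worthwhile.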
jpp
  • This works perfectly, and it is very intuitive. Following all your suggestions, I tried using Pandas. However, when I transform my dictionary using `df = pd.DataFrame(d)` I get the following error: `ValueError: If using all scalar values, you must pass an index.` After a quick search, I couldn't find a specific solution for this. So since your code is working, I'll stick with it. Thanks! – lanadaquenada Apr 24 '18 at 15:22
  • Try `pd.DataFrame.from_dict(d, orient='index')`. Otherwise this may be another question :). – jpp Apr 24 '18 at 15:28
  • Sadly, that gives `ValueError: Must pass 2-d input`. But thank you anyway! I'll consider asking another question if I keep having the problem later on. Thanks again! :) – lanadaquenada Apr 24 '18 at 15:34
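For the 1-D arrays in the question's example, one possible route into pandas is to stack the arrays as rows and turn the tuple keys into a `MultiIndex`. This is a sketch under that assumption; the level names `status`, `species`, `breed` and `age` are hypothetical. With 2-D arrays, as in the real data, each value would first need flattening (for example with `.ravel()`), which may be related to the `Must pass 2-d input` error above.

```python
import numpy as np
import pandas as pd

d = {('domestic', 'dog', 'collie', 'old'): np.array([1, 2, 3]),
     ('domestic', 'dog', 'golden', 'old'): np.array([4, 5, 6]),
     ('domestic', 'dog', 'golden', 'young'): np.array([7, 8, 9]),
     ('domestic', 'cat', 'siamese', 'young'): np.array([10, 11, 12]),
     ('stray', 'dog', 'golden', 'old'): np.array([13, 14, 15])}

# One row per dictionary entry; the tuple keys become a 4-level MultiIndex
index = pd.MultiIndex.from_tuples(list(d.keys()),
                                  names=['status', 'species', 'breed', 'age'])
df = pd.DataFrame(np.vstack(list(d.values())), index=index)

species = df.index.get_level_values('species')
age = df.index.get_level_values('age')

# DataFrame.mean() averages each column, i.e. each array position
dog_mean = df[species == 'dog'].mean()
young_dog_mean = df[(species == 'dog') & (age == 'young')].mean()
print(dog_mean.to_numpy())        # [6.25 7.25 8.25]
print(young_dog_mean.to_numpy())  # [7. 8. 9.]
```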

Without using Pandas

>>> from pprint import pprint
>>> import numpy as np
>>> pprint(d)
{('domestic', 'cat', 'siamese', 'young'): array([10, 11, 12]),
 ('domestic', 'dog', 'collie', 'old'): array([1, 2, 3]),
 ('domestic', 'dog', 'golden', 'old'): array([4, 5, 6]),
 ('domestic', 'dog', 'golden', 'young'): array([7, 8, 9]),
 ('stray', 'dog', 'golden', 'old'): array([13, 14, 15])}

Filter the dictionary:

>>> keys = ('old','dog')
>>> q = [v for k,v in d.items() if all(thing in k for thing in keys)]
>>> q
[array([1, 2, 3]), array([4, 5, 6]), array([13, 14, 15])]
>>>
>>> #or with keys as a set
>>> keys = set(('old','dog'))
>>> q = [v for k,v in d.items() if len(keys.intersection(k)) == len(keys)]

Create a 2-d array from the results and get the mean of the columns:

>>> np.vstack(q)
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [13, 14, 15]])
>>> np.vstack(q).mean(1)
array([  2.,   5.,  14.])
>>> np.vstack(q).mean(0)
array([ 6.,  7.,  8.])
>>>

Because it relies on the in operator, this solution does not test for values in specific positions of the dictionary keys; it only checks whether each word appears somewhere in the key.
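If position does matter, one alternative is to compare each key against a pattern tuple, position by position. A minimal sketch; the `matches` helper and the convention of `None` as a wildcard are hypothetical, not part of the answer above:

```python
import numpy as np

d = {('domestic', 'dog', 'collie', 'old'): np.array([1, 2, 3]),
     ('domestic', 'dog', 'golden', 'old'): np.array([4, 5, 6]),
     ('domestic', 'dog', 'golden', 'young'): np.array([7, 8, 9]),
     ('domestic', 'cat', 'siamese', 'young'): np.array([10, 11, 12]),
     ('stray', 'dog', 'golden', 'old'): np.array([13, 14, 15])}

def matches(key, pattern):
    # Every non-None pattern entry must equal the key entry at the same position
    return len(key) == len(pattern) and all(
        p is None or p == k for k, p in zip(key, pattern))

old_dogs = np.vstack([v for k, v in d.items()
                      if matches(k, (None, 'dog', None, 'old'))]).mean(0)
young_dogs = np.vstack([v for k, v in d.items()
                        if matches(k, (None, 'dog', None, 'young'))]).mean(0)
print(old_dogs)    # [6. 7. 8.]
print(young_dogs)  # [7. 8. 9.]
```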

wwii