Background:

Not sure if I have all my terminology right, so I apologize if this happens to be a duplicate question (similar question 1, similar question 2). I've been reading this tutorial How to Iterate Through a Dictionary in Python and I guess what I want to do is something along the lines of "Doing Some Calculations: Revisited", but in the form of "tuple unpacking" (words used in the 3rd link).

Problem/Goal:

What I was hoping to do is create a new dictionary that keeps each original key but maps it to the mean of that key's list of values, and then plot the result.

My Attempt:

Below is my full attempt as a for loop, along with my attempt at making a "one liner". The closest I got was building two separate variables, one holding the dictionary's keys and another holding the mean values, and plotting them as (x, y).

import numpy as np
import matplotlib.pyplot as plt

k_to_accuracies = {1: [0.274, 0.274, 0.274, 0.274, 0.274],
                   2: [0.224, 0.224, 0.224, 0.224, 0.224], 
                   3: [0.272, 0.272, 0.272, 0.272, 0.272], 
                   5: [0.278, 0.278, 0.278, 0.278, 0.278], 
                   7: [0.274, 0.274, 0.274, 0.274, 0.274], 
                   10: [0.282, 0.282, 0.282, 0.282, 0.282], 
                   15: [0.272, 0.272, 0.272, 0.272, 0.272], 
                   20: [0.272, 0.272, 0.272, 0.272, 0.272], 
                   25: [0.274, 0.274, 0.274, 0.274, 0.274], 
                   30: [0.254, 0.254, 0.254, 0.254, 0.254]}
# Loop version: build a dict mapping each k to its mean accuracy
k_ave = {}
for key, value in k_to_accuracies.items():
    k_ave[key] = np.mean(value)
print(k_ave)

# "One-liner" attempt: a list of means, in the dict's key order
k_ave = [np.mean(value) for value in k_to_accuracies.values()]
print("\n", k_ave)

k_keys = [key for key in k_to_accuracies.keys()]
print("\n",k_keys)

plt.plot(k_keys, k_ave, '.')
plt.show()

Questions:

  1. If possible, how would I write this as one line, or what is the most efficient/fastest way to do this?

  2. Also, would it be correct to call this a vectorized/broadcast calculation? If it is possible, can someone explain how I would vectorize/broadcast these lines of code? (I'm not sure this is the correct terminology, or even applicable in this scenario.) I have yet to find a solid tutorial on these concepts besides the standard SciPy tutorial and the one on tutorialspoint.

Andras Deak
CLDuser2.-

1 Answer

This is possible to write in one line, but I wouldn't recommend it:

>>> plt.plot(*zip(*{k: np.mean(v) for k, v in k_to_accuracies.items()}.items()), '.')

As you can see, this is rather opaque, and while it produces the correct output, the versions in your question are far easier to read and understand. In terms of run time, there is virtually no difference between this approach and the approach in your question:

k_ave = [np.mean(value) for value in k_to_accuracies.values()]
k_keys = [key for key in k_to_accuracies.keys()]
plt.plot(k_keys, k_ave, '.')
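If the goal is the new dictionary itself (same keys, mean values), a dict comprehension is a readable middle ground between the loop and the one-liner. This is only a sketch: it uses a shortened copy of the question's data, and the name k_means is introduced here for illustration.

```python
import numpy as np

# Shortened copy of the question's data (first three entries only)
k_to_accuracies = {
    1: [0.274, 0.274, 0.274, 0.274, 0.274],
    2: [0.224, 0.224, 0.224, 0.224, 0.224],
    3: [0.272, 0.272, 0.272, 0.272, 0.272],
}

# Dict comprehension: same keys, each value replaced by the mean of its list
k_ave = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}

# Dicts preserve insertion order (Python 3.7+), so keys and values line up
k_keys = list(k_ave)
k_means = list(k_ave.values())
```

k_keys and k_means can then be passed to plt.plot(k_keys, k_means, '.') exactly as in the question.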

Neither of these is a vectorised or broadcast calculation. These terms refer to batch operations performed on whole arrays of data without an explicit Python for loop; the looping happens in compiled code instead. They are common in C extensions to Python, such as operations on NumPy arrays and pandas DataFrames. As our data structure here is a dictionary, we can't apply vectorisation without first converting to one of these structures.
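For illustration, here is roughly what a vectorised version could look like after converting the dictionary to a NumPy array. It is a sketch that assumes every list has the same length (true for the data in the question); with ragged lists the conversion would not give a clean 2-D array. The data is a shortened copy of the question's dictionary.

```python
import numpy as np

# Shortened copy of the question's data (first three entries only)
k_to_accuracies = {
    1: [0.274, 0.274, 0.274, 0.274, 0.274],
    2: [0.224, 0.224, 0.224, 0.224, 0.224],
    3: [0.272, 0.272, 0.272, 0.272, 0.272],
}

# Stack the lists into a 2-D array of shape (number of keys, values per key).
# Assumption: all lists have equal length.
keys = np.array(list(k_to_accuracies))
accuracies = np.array(list(k_to_accuracies.values()))

# One vectorised call: the mean of each row, computed inside NumPy
means = accuracies.mean(axis=1)

# Back to a plain dict if that is the form you need
k_ave = dict(zip(keys.tolist(), means.tolist()))
```

For ten keys with five values each, the conversion cost will likely outweigh any gain, so this is mainly worth doing when the data is large or already lives in an array.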

The approach we have taken here uses comprehensions (a dict comprehension in the one-liner, list comprehensions in your version), which are a compact way of writing a for loop that builds a collection. You can read more about these here. Note that the difference between these and broadcast operations/vectorisation is that a comprehension iterates through a structure in Python and applies the operation to one piece of data at a time.
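A small side-by-side may make the distinction concrete. The example below converts a few accuracies to percentages, once with a list comprehension and once with a broadcast NumPy operation; the values and the variable names are chosen purely for illustration.

```python
import numpy as np

values = [0.274, 0.224, 0.272]

# List comprehension: Python visits each element and multiplies it in turn
percent_loop = [v * 100 for v in values]

# Broadcasting: the scalar 100 is applied to the whole array in one operation
percent_vec = np.array(values) * 100
```

Both give the same numbers; the difference is where the loop runs (in the Python interpreter versus inside NumPy's compiled code).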

CDJB