0

I have an array of numbers:

q1a = [1,2,2,2,4,3,1,3,3,4,0,0]

I want to store these in an array as (number, proportion of that number) pairs, using Python.

For example: [[0 0.1667], [1 0.1667], [2 0.25], [3 0.25], [4 0.1667]].

This is essential to calculate the distribution of the numbers. How can I do this?

I wrote the code below to save the numbers as (number, number of times it occurred in the list), but I can't figure out how to find the proportion of each number. Thanks.

sorted_sample_values_of_x = unique, counts = np.unique(q1a, return_counts=True)
np.asarray((unique, counts)).T
np.put(q1a, [0], [0])

sorted_x = np.matrix(sorted_sample_values_of_x)
sorted_x = np.transpose(sorted_x)
print('\n' 'Values of x (sorted):' '\n')
print(sorted_x)
jhon_wick
  • possible duplicate of [item frequency count in python](http://stackoverflow.com/questions/893417/item-frequency-count-in-python) – maxymoo Jul 21 '15 at 04:03

6 Answers

1
>>> q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
>>> from collections import Counter
>>> sorted([[x, float(y)/len(q1a)] for (x, y) in Counter(q1a).items()],
...        key=lambda x: x[0])
[[0, 0.16666666666666666],
 [1, 0.16666666666666666],
 [2, 0.25],
 [3, 0.25],
 [4, 0.16666666666666666]]
Chris Martin
1

You will need to do two things.

  1. Convert the sorted_x array to a float array.

  2. Then divide its counts column by the sum of the counts array.

Example -

In [34]: sorted_x = np.matrix(sorted_sample_values_of_x)

In [35]: sorted_x = np.transpose(sorted_x).astype(float)

In [36]: sorted_x
Out[36]:
matrix([[ 0.,  2.],
        [ 1.,  2.],
        [ 2.,  3.],
        [ 3.,  3.],
        [ 4.,  2.]])

In [37]: sorted_x[:,1] = sorted_x[:,1]/counts.sum()

In [38]: sorted_x
Out[38]:
matrix([[ 0.        ,  0.16666667],
        [ 1.        ,  0.16666667],
        [ 2.        ,  0.25      ],
        [ 3.        ,  0.25      ],
        [ 4.        ,  0.16666667]])

To store the numbers with the proportions in a new array, do -

In [41]: sorted_x = np.matrix(sorted_sample_values_of_x)

In [42]: sorted_x = np.transpose(sorted_x).astype(float)

In [43]: ns = sorted_x/np.array([1,counts.sum()])

In [44]: ns
Out[44]:
matrix([[ 0.        ,  0.16666667],
        [ 1.        ,  0.16666667],
        [ 2.        ,  0.25      ],
        [ 3.        ,  0.25      ],
        [ 4.        ,  0.16666667]])
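
The same result can be had without np.matrix, which the NumPy documentation no longer recommends for new code. This is only a minimal sketch, assuming Python 3 (so that / is true division); the name proportions is illustrative and not from the question:

```python
import numpy as np

q1a = [1, 2, 2, 2, 4, 3, 1, 3, 3, 4, 0, 0]

# Sorted unique values and the number of times each one occurs.
unique, counts = np.unique(q1a, return_counts=True)

# Pair each value with its share of the total count.
# column_stack upcasts the integer values to float to match the proportions.
proportions = np.column_stack((unique, counts / counts.sum()))
print(proportions)
```

The result is a plain 2-D ndarray with one (value, proportion) row per distinct number, so ordinary indexing such as proportions[:, 1] returns the proportions column.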
Anand S Kumar
0
In [12]: from collections import Counter

In [13]: a = [1,2,2,2,4,3,1,3,3,4,0,0]

In [14]: counter = Counter(a)

In [15]: sorted( [ [key, float(counter[key])/len(a)]  for key in counter ] )
Out[15]:
[[0, 0.16666666666666666],
 [1, 0.16666666666666666],
 [2, 0.25],
 [3, 0.25],
 [4, 0.16666666666666666]]
Sait
0
#!/usr/bin/env python
import numpy as np
q1a = [1,2,2,2,4,3,1,3,3,4,0,0]

unique, counts = np.unique(q1a, return_counts=True)
counts = counts.astype(float) # convert to float
counts /= counts.sum()        # counts -> proportion
print(np.c_[unique, counts])

Output

[[ 0.          0.16666667]
 [ 1.          0.16666667]
 [ 2.          0.25      ]
 [ 3.          0.25      ]
 [ 4.          0.16666667]]
jfs
0

As an alternative to collections.Counter, try collections.defaultdict. This lets you accumulate each number's proportion directly in a single pass over the input, with no separate division step afterwards, and it's more readable (IMO).

from collections import defaultdict

q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
n = float(len(q1a))
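frequencies = defaultdict(float)  # missing keys start at 0.0
for i in q1a:
    frequencies[i] += 1/n  # each occurrence adds its share of the total

# Sort so the pairs come out in order of the number (Python 3 print)
print(sorted(frequencies.items()))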
[(0, 0.16666666666666666), (1, 0.16666666666666666), (2, 0.25), (3, 0.25), (4, 0.16666666666666666)]
mhawke
0

A fun alternative using NumPy

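import numpy as np
q1a = np.array([1,2,2,2,4,3,1,3,3,4,0,0])  # must be an array so == compares elementwise
# int()/float() convert NumPy scalars to plain Python numbers for printing
print([(int(val), float(1.*np.sum(q1a==val)/len(q1a))) for val in np.unique(q1a)])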
#[(0, 0.16666666666666666),
#(1, 0.16666666666666666),
#(2, 0.25),
#(3, 0.25),
#(4, 0.16666666666666666)]

The 1. forces float division under Python 2; in Python 3, / already returns a float.
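
A minimal sketch of the division behavior in question, assuming Python 3; the names count and total are illustrative only:

```python
count, total = 2, 12

true_div = count / total          # true division: always a float in Python 3
floor_div = count // total        # floor division: what "/" did for two ints in Python 2
forced = 1. * count / total       # the trick above: promote to float before dividing

print(true_div, floor_div, forced)
```

Under Python 3 the first and third expressions give the same value, so the 1. prefix is harmless but no longer required.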

dermen