
I have data as a list of floats and I want to plot it as a histogram. The hist() function does the job perfectly for plotting the absolute histogram. However, I cannot figure out how to show relative frequencies: I would like the y-axis to show a fraction or, ideally, a percentage.

Here is the code:

fig = plt.figure()
ax = fig.add_subplot(111)
n, bins, patches = ax.hist(mydata, bins=100, normed=1, cumulative=0)
ax.set_xlabel('Bins', size=20)
ax.set_ylabel('Frequency', size=20)
ax.legend

plt.show()

I thought the normed=1 argument would do it, but it gives values that are too high and sometimes greater than 1. They also seem to depend on the bin size, as if they were not normalized by the bin size or something. Nevertheless, when I set cumulative=1, it nicely sums up to 1. So, where is the catch? By the way, when I feed the same data into Origin and plot it, it gives me perfectly correct fractions. Thank you!

easwee
user1278140
  • Why do you say _"list"_ in quotes, is there something special about your data and how you are storing it? You have also called your data ``Data``, which is odd, as Python naming conventions state that ``CamelCase`` be reserved for class names - see PEP 8 http://www.python.org/dev/peps/pep-0008/. – Gareth Latty Mar 19 '12 at 08:59
  • Sorry for the confusion. I was just not sure about the convention for stating datatypes, arguments and so on, so I've edited the original post to remove all quotation marks. This is just a piece of the whole code, and I renamed the variables to simplify it for posting here. In the original code they have longer names that are meaningful to me but irrelevant to the question, as the rest of the code works just fine. I have now renamed data to mydata. – user1278140 Mar 19 '12 at 10:07
  • No worries, just letting you know. Cheers for improving the question, makes it better for everyone. – Gareth Latty Mar 19 '12 at 11:11
  • `normed` is deprecated. You can use `density` instead. It makes the integral (NOT the sum) equal 1. – root May 23 '18 at 18:33

3 Answers


Because the normed option of hist returns the density of points, i.e. dN/dx.

What you need is something like this:

# assuming that mydata is a NumPy array
ax.hist(mydata, weights=np.zeros_like(mydata) + 1. / mydata.size)
# this will give you fractions
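
To make the difference between a density and a per-bin fraction concrete, here is a minimal sketch that checks both normalizations numerically. It uses np.histogram, which accepts the same bins, weights and density arguments as ax.hist; the sample data, the seed and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Illustrative data (an assumption): 1000 draws from a normal distribution.
rng = np.random.default_rng(0)
mydata = rng.normal(5, 2.0, 1000)

# density=True: each height is count / (N * bin_width),
# so it is the AREA under the bars that equals 1, not the sum of heights.
density, edges = np.histogram(mydata, bins=100, density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

# weights of 1/N per sample: each height is count / N,
# so the HEIGHTS themselves sum to 1.
weights = np.full(mydata.shape, 1.0 / mydata.size)
fractions, _ = np.histogram(mydata, bins=100, weights=weights)
total = float(np.sum(fractions))

print(area, total)
```

Because a density is divided by the bin width, narrow bins can push individual heights above 1, which would explain the behaviour described in the question.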
sega_sai

Or you can use set_major_formatter to adjust the scale of the y-axis, as follows:

from matplotlib import ticker as tick

def adjust_y_axis(x, pos):
    return x / (len(mydata) * 1.0)

ax.yaxis.set_major_formatter(tick.FuncFormatter(adjust_y_axis))

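Just register the formatter with set_major_formatter as above before calling plt.show(). Note that this only relabels the ticks; the bar heights themselves remain counts.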
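
A variant of the same idea, sketched under the assumption that percentage labels are wanted: the tick function returns a formatted string instead of a float. The names make_percent_formatter and format_tick are hypothetical helpers, not part of matplotlib.

```python
def make_percent_formatter(n_samples):
    """Build a tick-label function that turns a raw count into a percentage string."""
    def format_tick(count, pos=None):
        # `pos` is the tick position that FuncFormatter passes in; it is unused here.
        return f"{100.0 * count / n_samples:.1f}%"
    return format_tick


format_tick = make_percent_formatter(1000)
label = format_tick(25)  # a bin holding 25 of 1000 samples

# Usage, assuming an existing Axes `ax` that holds a plain count histogram:
# from matplotlib.ticker import FuncFormatter
# ax.yaxis.set_major_formatter(FuncFormatter(format_tick))
```

Passing the sample size in through a closure avoids relying on a global mydata, so the same helper can be reused for several datasets.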

pault
fraxel

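Setting the option density=True normalizes the histogram so that its total area is 1. The bar heights are then probability densities rather than per-bin fractions. The figure below shows such a histogram for 1000 samples taken from a normal distribution with mean 5 and standard deviation 2.0.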

Histogram generated with matplotlib

The code is

import numpy as np
import matplotlib.pyplot as plt

# Generate data from a normal distribution
mu, sigma = 5, 2.0  # mean and standard deviation
mydata = np.random.normal(mu, sigma, 1000)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(mydata, bins=100, density=True)
plt.show()

If you want % on the y-axis, you can use PercentFormatter as shown below:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Generate data from a normal distribution
mu, sigma = 5, 2.0  # mean and standard deviation
mydata = np.random.normal(mu, sigma, 1000)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(mydata, bins=100, density=False)
# xmax is the count that corresponds to 100%, i.e. the number of samples
ax.yaxis.set_major_formatter(PercentFormatter(xmax=len(mydata)))
plt.show()
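
As an alternative sketch, the per-sample weights from the first answer can be combined with PercentFormatter so that the bar heights themselves are fractions and the axis shows them as percentages. The helper names fraction_weights and plot_percent_histogram are assumptions introduced here for illustration.

```python
import numpy as np


def fraction_weights(data):
    """Per-sample weights of 1/N, so that bar heights become per-bin fractions."""
    data = np.asarray(data, dtype=float)
    return np.full(data.shape, 1.0 / data.size)


def plot_percent_histogram(data, bins=100):
    """Draw a histogram whose y-axis shows the percentage of samples per bin."""
    # Imported here so fraction_weights stays usable without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    fig, ax = plt.subplots()
    ax.hist(data, bins=bins, weights=fraction_weights(data))
    # xmax=1 tells the formatter that a bar height of 1.0 corresponds to 100%.
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
    ax.set_xlabel("Value")
    ax.set_ylabel("Percentage of samples")
    return fig, ax


# Usage (uncomment to display the figure):
# import matplotlib.pyplot as plt
# plot_percent_histogram(np.random.normal(5, 2.0, 1000))
# plt.show()
```

With this approach the plotted heights sum to 1 regardless of the number of bins, and only the tick labels are converted to percentages.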


John
  • The answer is not right. Setting `density` to True means only that the integral over the histogram is one, not that the bin heights add up to 100%. – Marek Apr 27 '21 at 10:16