14

Is there a way to overlay something analogous to a density curve when the vertical axis is frequency or relative frequency? (It would not be an actual density function, since the area under it need not equal 1.) A similar question, "ggplot2: histogram with normal curve", has a self-answer that suggests scaling ..count.. inside geom_density(). However, this seems unusual.

The following code produces an overinflated "density" line.

library(ggplot2)

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
hist.1a        <- ggplot(df1, aes(v)) + 
                    stat_bin(aes(y = ..count..), color = "black", fill = "blue",
                             breaks = b1) + 
                    geom_density(aes(y = ..count..))
hist.1a

Pat W.

3 Answers

29

@joran's response/comment got me thinking about what the appropriate scaling factor would be. For posterity's sake, here's the result.

When Vertical Axis is Frequency (aka Count)

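density = count / (N × bin width), so count = density × N × bin width

Thus, the scaling factor for a vertical axis measured in bin counts is

N × bin width

where N is the number of observations.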

In this case, with N = 164 and the bin width as 0.1, the aesthetic for y in the smoothed line should be:

y = ..density..*(164 * 0.1)

Thus the following code produces a "density" line scaled for a histogram measured in frequency (aka count).

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
hist.1a        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(164*0.1)))
hist.1a
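The scaling factor can be checked numerically without drawing anything. The following is a minimal sketch in base R, not part of the original answer: the seed, the names `n_obs` and `bin_width`, and the choice of breaks (picked so that every observation falls in a bin) are illustrative assumptions.

```r
# Sketch: check that bin counts equal density * N * bin width.
set.seed(1)                                  # arbitrary seed, for reproducibility only
v         <- rnorm(164, mean = 9, sd = 1.5)
n_obs     <- length(v)
bin_width <- 0.1

# Breaks that cover every observation, so no value is dropped.
breaks <- seq(floor(min(v)), ceiling(max(v)), by = bin_width)
h      <- hist(v, breaks = breaks, plot = FALSE)

# hist() reports a count and a density for each bin; rescale the density.
counts_from_density <- h$density * n_obs * bin_width
print(all.equal(h$counts, counts_from_density))

# A kernel density estimate integrates to roughly 1, so after rescaling
# the curve encloses roughly the same area as the bars.
kde        <- density(v)
kde_area   <- sum(diff(kde$x) * (head(kde$y, -1) + tail(kde$y, -1)) / 2)
curve_area <- kde_area * n_obs * bin_width
bar_area   <- sum(h$counts) * bin_width
print(c(curve_area = curve_area, bar_area = bar_area))
```

The first check is exact up to floating-point tolerance because a histogram's density is defined from its counts. The second is only approximate, since the smoothed curve is an estimate.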


When Vertical Axis is Relative Frequency

relative frequency = count / N = density × bin width

Using the above, we could write

hist.1b        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count../164), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(0.1)))
hist.1b
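The snippets above hard-code 164 and 0.1. A possible variant, sketched here under a few assumptions, computes both from the data and uses `after_stat()`, which ggplot2 3.3.0 and later provide as the replacement for the `..density..` notation. The names `n_obs`, `bin_width`, `hist_count`, and `hist_relfreq` are illustrative, and `binwidth` is used instead of explicit breaks so that no observation is dropped.

```r
library(ggplot2)

set.seed(1)                                  # arbitrary seed, for reproducibility only
df1       <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
n_obs     <- nrow(df1)                       # N, taken from the data
bin_width <- 0.1

# Frequency (count) axis: density * N * bin width.
hist_count <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = bin_width,
                 fill = "blue", color = "black") +
  geom_density(aes(y = after_stat(density) * n_obs * bin_width))

# Relative-frequency axis: density * bin width.
hist_relfreq <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = bin_width,
                 fill = "blue", color = "black") +
  geom_density(aes(y = after_stat(density) * bin_width))

hist_count
hist_relfreq
```

In `geom_density()`, the computed variable `count` is the density multiplied by the number of points, so `after_stat(count) * bin_width` gives the same curve as the first plot. This also explains the overinflated line in the question: it was missing only the bin-width factor, and so sat 10 times too high.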


When Vertical Axis is Density

hist.1c        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..density..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..))
hist.1c

Pat W.
  • Is it possible to extract the value of this `..density..`? – amrrs Oct 16 '17 at 14:15
  • @amrrs, see here for how to extract the histogram values. Similar hackery will get you the density (but there may be an easier way). https://stackoverflow.com/questions/7740503/getting-frequency-values-from-histogram-in-r/47137411#47137411 – PatrickT Jan 30 '18 at 17:29
  • @Pat W. This is a great answer. A very minor comment: to get the density curve without the vertical bits on the edges and the horizontal line along the bottom, here's a way: `geom_line(aes(y = ..density..), stat = "density", lwd=1)`, where `lwd` can be tweaked to thicken the line, if so desired. – PatrickT Jan 30 '18 at 17:31
  • Such a helpful answer, thank you! – bob Mar 25 '21 at 15:39
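On the question of extracting the computed values: one option, sketched here as an assumption rather than as the approach in the linked answer, is ggplot2's `layer_data()`, which returns the data frame a layer computes. The plot `p` and the names `hist_vals` and `dens_vals` are illustrative.

```r
library(ggplot2)

set.seed(1)                                  # arbitrary seed, for reproducibility only
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))

p <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1) +
  geom_density()

# Layer 1 is the histogram, layer 2 the density curve.
hist_vals <- layer_data(p, 1)[, c("xmin", "xmax", "count", "density")]
dens_vals <- layer_data(p, 2)[, c("x", "density")]

head(hist_vals)
head(dens_vals)
```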
5

Try this instead:

ggplot(df1,aes(x = v)) + 
   geom_histogram(aes(y = ..ncount..)) + 
   geom_density(aes(y = ..scaled..))
joran
  • Would you know how to do it if we didn't want to scale the count to 1? – Pat W. Dec 22 '14 at 22:56
  • @PatW. A smooth density estimate and binned counts are not on the same scale (as you observed in your first attempt). To align them, you'll have to place them on the same scale. You can adjust that scale to be whatever you like, but some adjustment will be required. – joran Dec 22 '14 at 22:58
1
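library(ggplot2)

# Histogram of column `y` (passed as a string) with a density curve
# rescaled to the total area of the histogram bars.
smoothedHistogram <- function(dat, y, bins=30, xlabel = y, ...){
  gg <- ggplot(dat, aes_string(y)) + 
    geom_histogram(bins=bins, center = 0.5, stat="bin", 
                   fill = I("midnightblue"), color = "#E07102", alpha=0.8) 
  # Build the plot to read the computed bars, then sum height * width.
  gg_build <- ggplot_build(gg)
  area <- sum(with(gg_build[["data"]][[1]], y*(xmax - xmin)))
  # A density integrates to 1, so multiplying by `area` puts it on the count scale.
  gg <- gg + 
    stat_density(aes(y=..density..*area), 
                 color="#BCBD22", size=2, geom="line", ...)
  # Reverse the layer order so the density line is drawn first, beneath the bars.
  gg$layers <- gg$layers[2:1]
  gg + xlab(xlabel) +  
    theme_bw() + theme(axis.title = element_text(size = 16),
                       axis.text = element_text(size = 12))
}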

dat <- data.frame(x = rnorm(10000))
smoothedHistogram(dat, "x")

Stéphane Laurent