2

I'm approximating a distribution with gaussian mixtures and was wondering whether there was an easy way to automatically plot the estimated kernel density of the whole (uni-dimensional) dataset as the sum of the component densities in a nice fashion like this using ggplot2:

Full data density plotted with single component densities

Given the following example data, my approach in ggplot2 would be to manually plot the subset densities into the scaled overall density like this:

#example data
a<-rnorm(1000,0,1) #component 1
b<-rnorm(1000,5,2) #component 2
d<-c(a,b) #overall data 
df<-data.frame(d,id=rep(c(1,2),each=1000)) #add group id

##ggplot2
require(ggplot2)

ggplot(df) +
  geom_density(aes(x=d,y=..scaled..)) +
  geom_density(data=subset(df,id==1), aes(x=d), lty=2) +
  geom_density(data=subset(df,id==2), aes(x=d), lty=4)

ggplot2 Plot

Note that this does not work out regarding the scales. It also does not work when you scale all 3 densities or no density at all. So I was not able to replicate above plot.

In addition, I am not able to automatically generate this plot without having to subset manually. I tried using position = "stacked" as parameter in geom_density.

I usually have around 5-6 Components per dataset, so manually subsetting would be possible. However, I would like to have different colors or line-types per component density which are displayed in the legend of ggplot, so doing all subsets manually would increase the workload quite a bit.

Any ideas? Thanks!

Mr. Z
  • 1,321
  • 9
  • 24

1 Answers1

2

Here is a possible solution by specifying each density in the aes call with position = "identity" in one layer and in the second layer using stacked density without the legend.

ggplot(df) +
  stat_density(aes(x = d,  linetype = as.factor(id)), position = "stack", geom = "line", show.legend = F, color = "red") +
  stat_density(aes(x = d,  linetype = as.factor(id)), position = "identity", geom = "line")

enter image description here

Do note that when using more then two groups:

  a <- rnorm(1000, 0, 1) 
  b <- rnorm(1000, 5, 2) 
  c <- rnorm(1000, 3, 2)
  d <- rnorm(1000, -2, 1)
  d <- c(a, b, c, d)
  df <- data.frame(d, id = as.factor(rep(c(1, 2, 3, 4), each = 1000))) 

curves for each stack appear (this is a problem with the two group example but linetype in first layer disguised it - use group instead to check) :

 gplot(df) +
    stat_density(aes(x = d, group = id), position = "stack", geom = "line", show.legend = F, color = "red") +
    stat_density(aes(x = d, linetype = id), position = "identity", geom = "line")

enter image description here

A relatively easy fix to this is to add alpha mapping and manually set it to 0 for unwanted curves:

  ggplot(df) +
    stat_density(aes(x=d, alpha = id), position = "stack", geom = "line", show.legend = F, color = "red") +
    stat_density(aes(x=d,  linetype = id), position = "identity", geom = "line")+
    scale_alpha_manual(values = c(1,0,0,0))

enter image description here

missuse
  • 16,776
  • 3
  • 15
  • 33