I want to draw the CDF plots of multiple variables in the same graph. The variables have different lengths. To keep things simple, I use the following example code:

library("ggplot2")

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

As you can see, a3 has 800 elements, unlike a1 and a2, which have 1000 each. When I run the code, I get:

> df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
Error in data.frame(x = c(a1, a2, a3), ggg = gl(3, 1000)) : 
arguments imply differing number of rows: 2800, 3000
> ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) +    scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Error: ggplot2 doesn't know how to deal with data of class function

So, how can I draw the CDF plots of several variables that do not have the same length in the same graph using ggplot2? Thanks in advance for any help!

Julián Urbano
Excalibur

2 Answers

You're right that ggplot sure does seem to want equal counts in each group. So rather than using stat_ecdf, perhaps you could just do the calculation yourself:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = factor(rep(1:3, c(1000,1000,800))))

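# Sort by x so that, within each group, the row position gives the rank
df <- df[order(df$x), ]
# Per-group ECDF: rank within the group divided by the group size
df$ecdf <- ave(df$x, df$ggg, FUN=function(x) seq_along(x)/length(x))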

ggplot(df, aes(x, ecdf, colour = ggg)) + geom_line() + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
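
As a sanity check, the manual calculation above can be compared against base R's ecdf(). This is only a minimal sketch: set.seed(), the loop, and the names ref and same are additions for illustration and are not part of the original answer. It assumes the data contain no ties, which is practically certain for rnorm() draws.

```r
# Reproducible version of the example data (set.seed is an added assumption)
set.seed(42)
a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),
                 ggg = factor(rep(1:3, c(1000, 1000, 800))))

# Manual per-group ECDF, as in the answer
df <- df[order(df$x), ]
df$ecdf <- ave(df$x, df$ggg, FUN = function(x) seq_along(x) / length(x))

# Reference values from stats::ecdf(), evaluated group by group
ref <- numeric(nrow(df))
for (g in levels(df$ggg)) {
  idx <- df$ggg == g
  ref[idx] <- ecdf(df$x[idx])(df$x[idx])
}

# TRUE when the manual values agree with ecdf()
same <- isTRUE(all.equal(df$ecdf, ref))
print(same)
```

If the two disagree, the usual cause is that the data frame was not sorted by x before calling ave().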

Note that you were using gl() incorrectly; your code assumed all three groups had 1000 entries. Here I've changed it to rep() to get the right number of labels per group.
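
The difference between gl() and rep() can be checked directly in base R. The sketch below is illustrative only; the names labels_gl, labels_rep, samples, and labels_auto are hypothetical, and the last variant (deriving the counts with lengths()) is an added suggestion rather than something from the original code.

```r
# gl(n, k) creates n levels, each repeated exactly k times
labels_gl <- gl(3, 1000)
length(labels_gl)   # 3000: does not match the 2800 values in c(a1, a2, a3)

# rep() with a vector of counts repeats each label a different number of times
labels_rep <- factor(rep(1:3, c(1000, 1000, 800)))
length(labels_rep)  # 2800
table(labels_rep)   # counts per group: 1000, 1000, 800

# Hypothetical variant: derive the counts from the data instead of typing them
samples <- list(a1 = rnorm(1000, 0, 3),
                a2 = rnorm(1000, 1, 4),
                a3 = rnorm(800, 2, 3))
labels_auto <- factor(rep(names(samples), lengths(samples)))
table(labels_auto)
```

Deriving the counts from the data avoids the mismatch if a sample size changes later.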

[Figure: ECDF plot drawn with ggplot]

MrFlick
  • Excellent! Thanks very much, MrFlick! – Excalibur May 17 '14 at 20:21
  • And how can I set different line types for a1, a2 and a3? Such as a1 is solid, a2 is dashed, a3 is dot? – Excalibur May 17 '14 at 22:53
  • @bangliu If you have a different question, it's best to start a new question rather than asking it in the comments of an existing question. Or you could search this site for other questions about changing the linetype with ggplot. – MrFlick May 17 '14 at 23:03

ggplot has no trouble at all dealing with different counts in each group. The problem is with your creation of the factor ggg. Use this:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3), ggg=factor(rep(1:3, c(1000,1000,800))))
ggplot(df, aes(x, colour = ggg)) + 
  stat_ecdf()+
  scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

Also, with coord_cartesian(xlim = c(0, 3)) as in your code, only the part of the CDF on [0, 3] is drawn, which, as you can see in the plot above, is more or less a straight line.
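
To get a rough idea of how little of each curve falls inside that window, one can evaluate the theoretical normal CDFs at its edges with pnorm(). This is a sketch under the assumption that the samples follow the distributions passed to rnorm() in the question; the params data frame and its column names are hypothetical.

```r
# Means and standard deviations used for a1, a2, a3 in the example
params <- data.frame(group = c("a1", "a2", "a3"),
                     mean = c(0, 1, 2),
                     sd = c(3, 4, 3))

# Theoretical CDF values at the edges of the window xlim = c(0, 3)
params$cdf_at_0 <- pnorm(0, params$mean, params$sd)
params$cdf_at_3 <- pnorm(3, params$mean, params$sd)

# Share of each distribution that lies inside the window
params$inside <- params$cdf_at_3 - params$cdf_at_0
print(round(params[, c("cdf_at_0", "cdf_at_3", "inside")], 3))
```

Each curve rises by only about 30 to 40 percent inside the window, in the central region where a normal CDF is close to linear, so dropping the xlim restriction shows the full S-shaped curves.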

jlhoward