I recently had to graph some data based on an interaction of factors and I found it more difficult than I felt something this common should be in R. I suspect I'm missing something. Let's say I have a vector of 30 numbers along with a pair of factors.

n <- runif(30, min=0, max=10)
a <- gl(2, 1, 30)
b <- gl(6, 2, 30)

And I want the mean for each combination of factors.

y <- tapply(n, a:b, mean)

Now I want to use a lattice xyplot to plot these means where I have a panel for each of the two values of a. The means are the y values and the b factors are the x values. The stock xyplot formula would be something like

xyplot( y ~ b | a, data=mydf)

where mydf is a data frame with columns y, b, and a derived from the tapply result above. My problem is how to disentangle the interacting factors. This is what I did.

factorSplit <- strsplit(names(y), ":")
a1 <- sapply(factorSplit, function(x) {x[1]})
b1 <- sapply(factorSplit, function(x) {x[2]})
mydf <- data.frame(y, b1, a1)

Now mydf has

> mydf
           y b1 a1
1:1 3.856797  1  1
1:2 3.487181  2  1
1:3 8.411425  3  1
1:4 3.757709  4  1
1:5 4.982970  5  1
1:6 6.480346  6  1
2:1 2.778864  1  2
2:2 4.390511  2  2
2:3 7.119926  3  2
2:4 4.707945  4  2
2:5 5.546894  5  2
2:6 8.984631  6  2

and I can plot with

xyplot(y ~ b1 | a1, mydf, layout=c(1,2))

But I feel this business with strsplit of names(y) and then sapply is overkill. It seems there should be a more direct method to recover a factor interaction created with tapply.

pglezen
  • Since you want to preserve your `a` and `b` columns, it's better to use `aggregate` here than `tapply`: `y – MrFlick Mar 20 '16 at 22:18
  • `aggregate` is pretty nice. When base grouping gets too confusing, though, I usually turn to `dplyr`, which makes it pretty straightforward. In this case, `data_frame(n, a, b) %>% group_by(a, b) %>% summarise(y = mean(n))` – alistaire Mar 21 '16 at 00:21
  • Rather than updating the question with a solution, you should post it as an answer below (it's fine to answer your own questions) because that way the question no longer appears unanswered. – MrFlick Mar 21 '16 at 02:47

1 Answer

The aggregate function is just what my understanding was lacking. As pointed out in the comments, one call to aggregate does everything I was slogging through earlier.

> x <- aggregate(n ~ a+b, NULL, mean)
> head(x)
  a b        n
1 1 1 2.967073
2 2 1 3.001279
3 1 2 3.867564
4 2 2 1.076378
5 1 3 2.805827
6 2 3 6.275858
> dim(x)
[1] 12  3
>
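To connect this back to the plot in the question, here is a minimal sketch of how the aggregated data frame might be passed to xyplot. The seed, the helper data frame `d`, and the names `p`, `tab`, and `long` are assumptions added for illustration. The last part shows a tapply-based alternative that avoids parsing names with strsplit; it is one possible route, not necessarily the best one.

```r
library(lattice)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
n <- runif(30, min = 0, max = 10)
a <- gl(2, 1, 30)
b <- gl(6, 2, 30)

# Collect the inputs in a data frame (hypothetical name `d`) and
# compute the mean of n for every combination of a and b.
d <- data.frame(n, a, b)
x <- aggregate(n ~ a + b, data = d, FUN = mean)

# The grouping factors come back as ordinary columns, so the
# result can go straight into the lattice formula.
p <- xyplot(n ~ b | a, data = x, layout = c(1, 2))
print(p)

# Alternative that stays with tapply: pass the factors as a named list
# to get a 2 x 6 array, then reshape it to long format via as.table.
tab  <- tapply(n, list(a = a, b = b), mean)
long <- as.data.frame(as.table(tab), responseName = "n")
```

Both `x` and `long` should hold one row per a/b combination with the factors in separate columns, so either can serve as the `data` argument to xyplot.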
pglezen