Say I have a data frame, data, that contains multiple sites, indicated by integer site codes. Within those sites are samples from multiple horizons (A, B and C), each with an observation recorded in the column value:

site <- c(12, 12, 12, 12, 45, 45, 45, 45)
horizon <- c('A', 'A', 'B', 'C', 'A', 'A', 'B', 'C')
value <- c(19, 14, 3, 2, 18, 19, 4, 5)
comment <- c('pizza', 'pizza', 'pizza', 'pizza', 'taco', 'taco', 'taco', 'taco')
data <- data.frame(site, horizon, value, comment)

The data frame looks like this:

  site horizon value comment
1   12       A    19   pizza
2   12       A    14   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A    18    taco
6   45       A    19    taco
7   45       B     4    taco
8   45       C     5    taco

In this case both sites have multiple A observations. I would like to average the values of duplicate horizons within a site, while retaining the comment column in the data frame. All observations within a site have the same entry in the comment column. I would like the output to look like this:

  site horizon value comment
1   12       A  16.5   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A  18.5    taco
7   45       B     4    taco
8   45       C     5    taco
colin
  • With dplyr, this works: `data %>% group_by(site,horizon,comment) %>% summarise_each(funs(mean))`, though you should have 16.5 not 18.5 in the first row, eh? – Frank Nov 05 '15 at 20:34
  • @Frank thanks! Is there any way to have dplyr do this without specifying the comment vector? My real data set has many comment vectors. – colin Nov 05 '15 at 20:40
  • Hm, you could do `data %>% group_by_(.dots=setdiff(names(.),"value")) %>% summarise_each(funs(mean))`. Personally, I would just keep the `comment` data in a separate table if it's determined by `site`. – Frank Nov 05 '15 at 20:42
  • @frank good point. I think I'm going to go that route actually. – colin Nov 05 '15 at 20:51
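
Frank's suggestion of keeping the comment data in a separate table could be sketched in base R roughly as follows. This is only a minimal sketch, assuming (as the question states) that comment is fully determined by site; the names site_info, averaged and result are hypothetical and do not come from the thread.

```r
# Rebuild the example data from the question.
data <- data.frame(
  site    = c(12, 12, 12, 12, 45, 45, 45, 45),
  horizon = c("A", "A", "B", "C", "A", "A", "B", "C"),
  value   = c(19, 14, 3, 2, 18, 19, 4, 5),
  comment = c("pizza", "pizza", "pizza", "pizza", "taco", "taco", "taco", "taco")
)

# Lookup table with one row per site (hypothetical name).
# Assumption: every row of a given site carries the same comment.
site_info <- unique(data[, c("site", "comment")])

# Average value over duplicate horizons within each site.
averaged <- aggregate(value ~ site + horizon, data = data, FUN = mean)

# Join the comment back on by site only when it is actually needed.
result <- merge(averaged, site_info, by = "site")
result <- result[order(result$site, result$horizon), ]
print(result)
```

With many comment-like columns, the same idea should carry over by selecting all of them (plus site) when building the lookup table, so the aggregation step never has to mention them.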

1 Answer

d <- read.table(header=TRUE, text=
'  site horizon value comment
1   12       A    19   pizza
2   12       A    14   pizza
3   12       B     3   pizza
4   12       C     2   pizza
5   45       A    18    taco
6   45       A    19    taco
7   45       B     4    taco
8   45       C     5    taco')
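# Average value within each site/horizon pair, then merge the per-site
# comment back in; d[, -3] drops the value column (column 3).
merge(aggregate(value ~ site + horizon, FUN = mean, data = d), unique(d[, -3]))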
jogo