I am trying to create a multiple bar chart of my data, depicting the mean of avgct for each region with error bars using ggplot2.

Here is a sample of my data:

gregion lregion avgct
1       e       1.146
1       e       0.947
2       e       0.908
3       e       1.167
1       t       1.225
2       t       1.058
2       t       2.436
3       t       0.679

So far I have managed to create this graph, but it seems to plot the maximum value of avgct for each group rather than the mean, so I cannot add error bars.

I think I need to calculate the mean of avgct by gregion and lregion so that I have an average value of avgct for each region, like this:

gregion lregion mean(avgct)
1       e       1.047
2       e       0.908
3       e       1.167
1       t       1.225
2       t       1.747
3       t       0.679

If anyone can help me with this, so that I can plot a bar chart of averages with error bars for my data, it would be very much appreciated!

rcs
opalfruits
  • This does kind of seem to be a duplicate of http://stackoverflow.com/questions/25198442/how-to-calculate-mean-median-per-group-in-a-dataframe-in-r which itself is already labeled a duplicate. – Mark Miller Apr 26 '15 at 14:28
  • @MarkMiller this is a duplicate of so many duplicates. Not to mention it shows up in many other sites. Even Cross Validated. There are also many versions with `sum` instead of `mean` and etc. – David Arenburg Apr 26 '15 at 14:30

1 Answer

This is a basic aggregation question, so the typical starting point is `aggregate`:

> aggregate(avgct ~ gregion + lregion, mydf, mean)
  gregion lregion  avgct
1       1       e 1.0465
2       2       e 0.9080
3       3       e 1.1670
4       1       t 1.2250
5       2       t 1.7470
6       3       t 0.6790
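
Since the question also asks for error bars, the same `aggregate` call can return more than one statistic per group. The following is only a sketch: it rebuilds the sample data from the question under the name `mydf`, and it assumes the standard error of the mean is the error measure wanted (the question does not say). The helper `se` and the resulting column names are illustrative, not part of the original answer.

```r
# Sample data copied from the question; the name `mydf` follows the answer.
mydf <- data.frame(
  gregion = c(1, 1, 2, 3, 1, 2, 2, 3),
  lregion = c("e", "e", "e", "e", "t", "t", "t", "t"),
  avgct   = c(1.146, 0.947, 0.908, 1.167, 1.225, 1.058, 2.436, 0.679)
)

# Hypothetical helper: standard error of the mean.
# sd() returns NA for a single observation, so such groups get an NA error.
se <- function(x) sd(x) / sqrt(length(x))

# Returning a named vector makes aggregate() store a matrix column;
# do.call(data.frame, ...) flattens it into avgct.mean, avgct.se and avgct.n.
stats <- aggregate(
  avgct ~ gregion + lregion, data = mydf,
  FUN = function(x) c(mean = mean(x), se = se(x), n = length(x))
)
stats <- do.call(data.frame, stats)
```

With the sample data, several groups contain a single row, so their standard error is `NA`; the full dataset presumably has more observations per group.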

There are, however, several other alternatives, including "dplyr" and "data.table", that may be more appealing in the long run for convenience of syntax and overall efficiency.

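library(data.table)
# Name the result column; an unnamed expression in j would be called V1
as.data.table(mydf)[, .(avgct = mean(avgct)), by = .(gregion, lregion)]

library(dplyr)
# One row per gregion/lregion combination, holding the group mean
mydf %>% group_by(gregion, lregion) %>% summarise(avgct = mean(avgct))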
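
To get from the summary to the plot the question asks about, one possible sketch with "dplyr" and "ggplot2" follows. It assumes reasonably recent versions of both packages (`geom_col` and the `.groups` argument are not available in old releases), assumes the standard error is the desired error measure, and uses illustrative column names `mean_ct` and `se_ct` that do not appear in the question.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question.
mydf <- data.frame(
  gregion = c(1, 1, 2, 3, 1, 2, 2, 3),
  lregion = c("e", "e", "e", "e", "t", "t", "t", "t"),
  avgct   = c(1.146, 0.947, 0.908, 1.167, 1.225, 1.058, 2.436, 0.679)
)

# Mean and standard error per gregion/lregion combination.
# Groups with one observation get an NA standard error.
stats <- mydf %>%
  group_by(gregion, lregion) %>%
  summarise(
    mean_ct = mean(avgct),
    se_ct   = sd(avgct) / sqrt(n()),
    .groups = "drop"
  )

# Dodged bars of the group means, with error bars of +/- one standard error.
# The same dodge width keeps each error bar centred on its bar.
dodge <- position_dodge(width = 0.9)
p <- ggplot(stats, aes(x = factor(gregion), y = mean_ct, fill = lregion)) +
  geom_col(position = dodge) +
  geom_errorbar(
    aes(ymin = mean_ct - se_ct, ymax = mean_ct + se_ct),
    position = dodge, width = 0.2
  ) +
  labs(x = "gregion", y = "mean avgct")
```

Printing `p` draws the chart. Plotting the pre-computed means with `geom_col` avoids the problem described in the question, where the bars did not reflect the mean of the raw rows.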
A5C1D2H2I1M1N2O1R2T1