Use function in groupby with variable column name in R using dplyr

Question

I have a dataframe:

df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))
df
 Category       Value
1        A -0.94968814
2        A  2.56687061
3        A -0.15665153
4        B -0.47647105
5        B  0.83015076
6        B -0.03744522

Now I want to compute the mean of Value per Category. This is easy to do with the dplyr package:

df %>% group_by(Category) %>% 
  summarize(mean = mean(Value))

In my actual code, though, I can't write mean(Value) directly, because the column name is stored in a variable: columnName = "Value". Unfortunately, this doesn't work:

columnName = "Value"

df %>% group_by(Category) %>% 
  summarize(mean = mean(columnName))

Warning messages:
1: In mean.default("Value") : argument is not numeric or logical: returning NA
2: In mean.default("Value") : argument is not numeric or logical: returning NA

How can I pass the column name with the variable?

`mean(df[,columnName])` this code worked for me, when using the same variables as you did. — Benjamin Mohn, Dec 21 '16 at 10:06
No, that doesn't work. It has to be mean of the groups, not the mean of the column. — user2874583, Dec 21 '16 at 10:08
It is not using the package `dplyr` but it works like this: `tapply(df[,columnName],df$Category, mean)` — Benjamin Mohn, Dec 21 '16 at 10:14
please use `set.seed` when using such functions as `rnorm` to create data frames so we can double check results — Sotos, Dec 21 '16 at 10:16
This is called *standard evaluation*. There are hundreds of dupes regarding this on SO. Please read `vignette("nse")`. One way to achieve this is `library(lazyeval) ; dots % group_by(Category) %>% summarise_(.dots = dots)` — David Arenburg, Dec 21 '16 at 10:17
See also this http://stackoverflow.com/questions/26724124/standard-evaluation-in-dplyr-summarise-on-variable-given-as-a-character-string — David Arenburg, Dec 21 '16 at 10:23
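
For later readers: the underscored verbs such as `summarise_()` were deprecated from dplyr 0.7 onward in favour of tidy evaluation. A minimal sketch of the same idea, assuming a dplyr version of 0.7 or later and the rlang package, converts the string to a symbol and unquotes it inside `summarize()`. The `set.seed(42)` call and the name `result_sym` are illustrative choices, not part of the original question.

```r
library(dplyr)

# Seed chosen arbitrarily so the example is reproducible (assumption)
set.seed(42)
df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))

columnName <- "Value"

# rlang::sym() turns the string into a symbol; !! unquotes it so that
# mean() sees the Value column of each group rather than the string "Value"
result_sym <- df %>%
  group_by(Category) %>%
  summarize(mean = mean(!!rlang::sym(columnName)))

print(result_sym)
```

This should give one row per Category, matching what `summarize(mean = mean(Value))` returns when the column name is written out.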

score 2 · Answer 1 · answered Dec 21 '16 at 10:10

We can use `get` with `aggregate`:

aggregate(get(columnName)~Category, df, mean)

#    Category get(columnName)
#1        A      -0.5490751
#2        B      -0.2594670

Ronak Shah

This works thanks! But I was looking for a solution within the dplyr package. Do you know if that is possible too? – user2874583 Dec 21 '16 at 10:18
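
A dplyr-only version does seem possible. One hedged sketch, assuming dplyr 0.7 or later where the `.data` pronoun is available, subsets the current group's data by the string held in `columnName`. The seed and the name `result_data` are assumptions added for illustration.

```r
library(dplyr)

# Seed chosen arbitrarily so the example is reproducible (assumption)
set.seed(42)
df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))

columnName <- "Value"

# .data[[columnName]] looks up the column named by the string within each
# group, so mean() is applied per Category rather than to the whole column
result_data <- df %>%
  group_by(Category) %>%
  summarize(mean = mean(.data[[columnName]]))

print(result_data)
```

Unlike `get(columnName)`, the `.data` pronoun only looks inside the data frame, so a variable of the same name in the calling environment is not picked up by accident.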
