I have a dataframe:

df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))
df
 Category       Value
1        A -0.94968814
2        A  2.56687061
3        A -0.15665153
4        B -0.47647105
5        B  0.83015076
6        B -0.03744522

Now I want to add another column which is the mean per Category. This can be done really easily with the dplyr package:

library(dplyr)

df %>% group_by(Category) %>% 
  summarize(mean = mean(Value))

My problem is that in my code I can't write mean(Value) directly. Instead, I have a variable that holds the column name: columnName = "Value". Unfortunately, this doesn't work:

columnName = "Value"

df %>% group_by(Category) %>% 
  summarize(mean = mean(columnName))

Warning messages:
1: In mean.default("Value") :
  argument is not numeric or logical: returning NA
2: In mean.default("Value") :
  argument is not numeric or logical: returning NA

How can I pass the column name with the variable?

David Arenburg
user2874583
  • `mean(df[,columnName])` this code worked for me, when using the same variables as you did. – Benjamin Mohn Dec 21 '16 at 10:06
  • No, that doesn't work. It has to be the mean of the groups, not the mean of the column. – user2874583 Dec 21 '16 at 10:08
  • It is not using the package `dplyr` but it works like this: `tapply(df[,columnName],df$Category, mean)` – Benjamin Mohn Dec 21 '16 at 10:14
  • please use `set.seed` when using such functions as `rnorm` to create data frames so we can double check results – Sotos Dec 21 '16 at 10:16
  • This is called *standard evaluation*. There are hundreds of dupes regarding this on SO. Please read `vignette("nse")`. One way to achieve this is `library(lazyeval) ; dots <- … ; df %>% group_by(Category) %>% summarise_(.dots = dots)` – David Arenburg Dec 21 '16 at 10:17
  • See also this http://stackoverflow.com/questions/26724124/standard-evaluation-in-dplyr-summarise-on-variable-given-as-a-character-string – David Arenburg Dec 21 '16 at 10:23
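
The base R suggestion from the comments can be made reproducible with `set.seed`, as requested above. The following is only a minimal sketch: the seed value 42 is an arbitrary assumption, so the numbers will not match the output printed in the question.

```r
# Reproducible version of the example data (the seed value is arbitrary)
set.seed(42)
df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))
columnName <- "Value"

# df[[columnName]] extracts the column whose name is stored in the string;
# tapply() then applies mean() separately within each Category
group_means <- tapply(df[[columnName]], df$Category, mean)
group_means
```

Unlike `mean(df[, columnName])`, which averages the whole column, this returns one mean per group.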

1 Answer

We can use `get` with `aggregate`:

aggregate(get(columnName)~Category, df, mean)

#    Category get(columnName)
#1        A      -0.5490751
#2        B      -0.2594670
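
A possible variation, which is not part of the answer as posted: building the formula from the string with base R's `reformulate` should give a result column named after the variable instead of `get(columnName)`. This is a sketch under the assumption that the column name is a syntactically valid R name; the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))
columnName <- "Value"

# Build the formula Value ~ Category from the character string
f <- reformulate("Category", response = columnName)

# Same aggregation as above, but the result column is named "Value"
res <- aggregate(f, data = df, FUN = mean)
res
```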
Ronak Shah
  • This works, thanks! But I was looking for a solution within the dplyr package. Do you know if that is possible too? – user2874583 Dec 21 '16 at 10:18
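
For the dplyr route asked about here, one option is the `.data` pronoun, which allows a column to be looked up by a string inside `summarize`. This is a minimal sketch and assumes a dplyr release newer than those available when the question was asked; the seed value is arbitrary.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(Category = c(rep("A", 3), rep("B", 3)), Value = rnorm(6))
columnName <- "Value"

# .data[[columnName]] refers to the column named by the string,
# evaluated within each group created by group_by()
result <- df %>%
  group_by(Category) %>%
  summarize(mean = mean(.data[[columnName]]))
result
```

Because the lookup happens inside the grouped data, the mean is computed per Category instead of being applied to the literal string "Value".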