
I'm writing a function that I'm going to use on multiple columns in dplyr, but I'm having trouble passing column names as inputs to functions for dplyr.

Here's an example of what I want to do:

df<-tbl_df(data.frame(group=rep(c("A", "B"), each=3), var1=sample(1:100, 6), var2=sample(1:100, 6)))


example<-function(colname){
  df %>%
    group_by(group)%>%
    summarize(output=mean(sqrt(colname)))%>%
    select(output)
}
example("var1")

The output should match that of:

df %>%
  group_by(group)%>%
  summarize(output=mean(sqrt(var1)))%>%
  select(output)

I've found a few similar questions, but nothing that I could directly apply to my problem, so any help is appreciated. I've tried some solutions involving eval, but I honestly don't know what exactly I'm supposed to be passing to eval.

C_Z_

2 Answers


Is this what you expected?

df<-tbl_df(data.frame(group=rep(c("A", "B"), each=3), var1=sample(1:100, 6), var2=sample(1:100, 6)))

example<-function(colname){
  df %>%
    group_by(group)%>%
    summarize(output=mean(sqrt(colname)))%>%
    select(output)
}
example( quote(var1) )
#-----
Source: local data frame [2 x 1]

    output
1 7.185935
2 8.090866
IRTFM
  • Yes, this is perfect. Thanks, I was worried the answer was going to be much more complicated – C_Z_ Mar 13 '15 at 21:55
  • I'm not a big dplyr user and so I'm wondering why the function `select` is needed. (Taking it out doesn't seem to affect behavior in the small (n=1) number of tests I've done.) – IRTFM Mar 13 '15 at 22:00
  • @BondedDust it's not needed. The use of `summarize()` collapses the result to `output`. You can safely remove it. – Steven Beaupré Mar 14 '15 at 02:19

The accepted answer does not work anymore in R 3.6 / dplyr 0.8.

As suggested in another answer, one can use !!as.name(), which converts the column name to a symbol and unquotes it inside the dplyr verb.

This works for me:

df<-tbl_df(data.frame(group=rep(c("A", "B"), each=3), var1=sample(1:100, 6), var2=sample(1:100, 6)))

example<-function(colname){
  df %>%
    group_by(group)%>%
    summarize(output=mean(sqrt(!!as.name(colname))))%>%
    select(output)
}
example( quote(var1) )

If one additionally wants to use the column name as the assignment target in a mutate, the easiest way is the := operator. For example, to replace the column named by colname with its square root:

example_mutate<-function(colname){
  df %>%
    mutate(!!colname := sqrt(!!as.name(colname)))
}
example_mutate( quote(var1) )

quote(var1) can of course be replaced with the string "var1", since as.name() also accepts a character string.
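
For string column names, the same two functions can also be sketched with the `.data` pronoun instead of !!as.name(). This is only a minimal sketch, assuming a dplyr version recent enough to support `.data[[...]]`; the helper names `example_data` and `example_mutate_data`, the extra `data` argument, and the `set.seed()` call are illustrative additions that are not part of the answers above.

```r
library(dplyr)

# Toy data mirroring the question; set.seed() is added only for reproducibility.
set.seed(1)
df <- tibble(
  group = rep(c("A", "B"), each = 3),
  var1  = sample(1:100, 6),
  var2  = sample(1:100, 6)
)

# Hypothetical helper: `colname` is a character string naming a column.
# .data[[colname]] looks the column up by name inside the data mask.
example_data <- function(data, colname) {
  data %>%
    group_by(group) %>%
    summarize(output = mean(sqrt(.data[[colname]])))
}

# Hypothetical helper: overwrite the named column with its square root.
# `!!colname :=` uses the string as the name of the column being assigned.
example_mutate_data <- function(data, colname) {
  data %>%
    mutate(!!colname := sqrt(.data[[colname]]))
}

res1 <- example_data(df, "var1")
res2 <- example_mutate_data(df, "var2")
```

Passing the data frame as an argument, rather than relying on a global df, makes the helpers reusable across tables; that is a design choice of this sketch rather than something the original answers require.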

Martin