205

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.

Example data from iris:

library(dplyr)
iris <- as_tibble(iris)

I've created a function to mutate my new columns from the Petal.Width variable:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df <- mutate(df, varname = Petal.Width * n)  ## problem arises here
    df
}

Now I create a loop to build my columns:

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).

How can I get mutate() to use my dynamic name as variable name?

MrFlick
  • 163,738
  • 12
  • 226
  • 242
Timm S.
  • 3,997
  • 4
  • 20
  • 35
  • i believe there's [a space to watch](https://github.com/hadley/dplyr/issues/352#issuecomment-53829862) in the [lazyeval package](https://github.com/hadley/lazyeval) – baptiste Sep 23 '14 at 20:59
  • 18
    The vignette doesn't even mention `mutate_`, and it really isn't obvious from the other functions how to use it. – nacnudus Jul 25 '16 at 15:16
  • 1
    I have struggled to understand the `quosure` et al. documentation for years. While the vignette link above no longer works, that comment lead me to this summary to tidyevaluation: https://shipt.tech/https-shipt-tech-advanced-programming-and-non-standard-evaluation-with-dplyr-e043f89deb3d. I finally understand! Thank you. – Josh Nov 23 '20 at 21:05
  • `dplyr` provides [this excellent vignette on Programming with dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) that covers this. – Gregor Thomas Jan 22 '21 at 19:55

9 Answers9

243

Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.


dplyr version >= 1.0

With the latest dplyr version you can use the syntax from the glue package when naming parameters when using :=. So here the {} in the name grab the value by evaluating the expression inside.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}

dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available form vignette("programming", "dplyr").


dplyr (>=0.3 & <0.7)

Slightly earlier version of dplyr (>=0.3 <0.7), encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}

dplyr < 0.3

Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setName:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}
MrFlick
  • 163,738
  • 12
  • 226
  • 242
  • 25
    Thank you, that's helpful. btw, I always create really dramatic variables. – Timm S. Sep 23 '14 at 20:18
  • 28
    Hehe. that's probably one of my favorite typos i've made in a while. I think i'll leave it. – MrFlick Sep 23 '14 at 20:19
  • 1
    `do.call()` probably doesn't do what you think it does: http://rpubs.com/hadley/do-call2. See also the nse vignette in the dev version of dplyr. – hadley Sep 26 '14 at 11:39
  • 4
    So if I understand your point @hadley, I've updated the `do.call` above to use `do.call("mutate")` and to quote `df` in the list. Is that what you were suggesting? And when the `lazyeval` version of `dplyr` is the released version, then `mutate_(df, .dots= setNames(list(~Petal.Width * n), varname))` would be a better solution? – MrFlick Sep 26 '14 at 12:45
  • @MrFlick yes and yes :) – hadley Sep 29 '14 at 14:14
  • Is this still valid (i.e. "hard")? – Mike Wise Mar 16 '16 at 23:05
  • @MikeWise yes. This is still the recommended way. – MrFlick Mar 17 '16 at 01:12
  • 2
    What if I need the variable column header not only on the left hand side of the assignment but also on the right? e.g. `mutate(df, !!newVar := (!!var1 + !!var2) / 2)` does not work :( – Mario Reutter Jul 04 '17 at 15:09
  • Thumbs up for including this helpful reference `vignette("programming", "dplyr")` – Kate Jan 27 '20 at 22:35
  • Shouldn't the dplyr 1.0 version be: mutate(df, "petal.{{n}}" := Petal.Width * n) – flimbar Nov 23 '20 at 10:13
  • @flimbar If you try it, you'll see it works with a single brace. You'd only need double braces if `n` were passed in a as a symbol perhaps. – MrFlick Nov 23 '20 at 16:32
  • Ah yes, in my case I am passing unquoted column names to mutate. – flimbar Nov 24 '20 at 18:04
65

In the new release of dplyr (0.6.0 awaiting in April 2017), we can also do an assignment (:=) and pass variables as column names by unquoting (!!) to not evaluate it

 library(dplyr)
 multipetalN <- function(df, n){
      varname <- paste0("petal.", n)
      df %>%
         mutate(!!varname := Petal.Width * n)
 }

 data(iris)
 iris1 <- tbl_df(iris)
 iris2 <- tbl_df(iris)
 for(i in 2:5) {
     iris2 <- multipetalN(df=iris2, n=i)
 }   

Checking the output based on @MrFlick's multipetal applied on 'iris1'

identical(iris1, iris2)
#[1] TRUE
akrun
  • 674,427
  • 24
  • 381
  • 486
35

After a lot of trial and error, I found the pattern UQ(rlang::sym("some string here"))) really useful for working with strings and dplyr verbs. It seems to work in a lot of surprising situations.

Here's an example with mutate. We want to create a function that adds together two columns, where you pass the function both column names as strings. We can use this pattern, together with the assignment operator :=, to do this.

## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2){
  mtcars %>% 
    mutate(UQ(rlang::sym(new_name)) :=  UQ(rlang::sym(name1)) +  UQ(rlang::sym(name2)))
}
mutate_values('test', 'mpg', 'cyl')

The pattern works with other dplyr functions as well. Here's filter:

## filter a column by a value 
filter_values <- function(name, value){
  mtcars %>% 
    filter(UQ(rlang::sym(name)) != value)
}
filter_values('gear', 4)

Or arrange:

## transform a variable and then sort by it 
arrange_values <- function(name, transform){
  mtcars %>% 
    arrange(UQ(rlang::sym(name)) %>%  UQ(rlang::sym(transform)))
}
arrange_values('mpg', 'sin')

For select, you don't need to use the pattern. Instead you can use !!:

## select a column 
select_name <- function(name){
  mtcars %>% 
    select(!!name)
}
select_name('mpg')
Tom Roth
  • 1,344
  • 12
  • 20
  • Your tips works very well, but I have a little issue. I change an initial column `myCol` to an url (for example), and copy the old column `myColInitialValue` at the end of the dataframe `df` with a new name. But a `which(colnames(df)=='myCol')` send back the col # of `myColInitialValue`. I didn't write an issue yet because I didn't found a reprex. My goal is for the `escape` parameter of `DT::datatable()`. I use `escape=FALSE` in waiting that. With constants it doesn't work also but the [DT package](https://github.com/rstudio/DT/blob/master/R/datatables.R) seems also get the bad # column. :) – phili_b Jul 20 '19 at 09:37
  • My question [dplyr - mutate: use dynamic variable names, works well, but seems not work with # column for `escape` an `DT::datatable`](https://stackoverflow.com/q/57123813/10489562) – phili_b Jul 20 '19 at 10:50
  • It seems than dynamic variables are not the cause. (btw reprex added) – phili_b Jul 22 '19 at 09:17
  • Thanks for this answer! Here is a super-simple example of how I used it: `varname = sym("Petal.Width"); ggplot(iris, aes(x=!!varname)) + geom_histogram()` – bdemarest Nov 04 '19 at 02:29
  • This worked for me inside a formula where !!varname wasn't working. – daknowles May 15 '20 at 18:44
  • `sym()` is included by default in dplyr. see `help(sym)`. – Paul Rougieux Aug 14 '20 at 15:58
14

Here's another version, and it's arguably a bit simpler.

multipetal <- function(df, n) {
    varname <- paste("petal", n, sep=".")
    df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
    df
}

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1          5.1         3.5          1.4         0.2  setosa     0.4     0.6     0.8       1
2          4.9         3.0          1.4         0.2  setosa     0.4     0.6     0.8       1
3          4.7         3.2          1.3         0.2  setosa     0.4     0.6     0.8       1
4          4.6         3.1          1.5         0.2  setosa     0.4     0.6     0.8       1
5          5.0         3.6          1.4         0.2  setosa     0.4     0.6     0.8       1
6          5.4         3.9          1.7         0.4  setosa     0.8     1.2     1.6       2
user2946432
  • 141
  • 1
  • 4
13

With rlang 0.4.0 we have curly-curly operators ({{}}) which makes this very easy. When a dynamic column name shows up on the left-hand side of an assignment, use :=.

library(dplyr)
library(rlang)

iris1 <- tbl_df(iris)

multipetal <- function(df, n) {
   varname <- paste("petal", n , sep=".")
   mutate(df, {{varname}} := Petal.Width * n)
}

multipetal(iris1, 4)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.4
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
# 1          5.1         3.5          1.4         0.2 setosa      0.8
# 2          4.9         3            1.4         0.2 setosa      0.8
# 3          4.7         3.2          1.3         0.2 setosa      0.8
# 4          4.6         3.1          1.5         0.2 setosa      0.8
# 5          5           3.6          1.4         0.2 setosa      0.8
# 6          5.4         3.9          1.7         0.4 setosa      1.6
# 7          4.6         3.4          1.4         0.3 setosa      1.2
# 8          5           3.4          1.5         0.2 setosa      0.8
# 9          4.4         2.9          1.4         0.2 setosa      0.8
#10          4.9         3.1          1.5         0.1 setosa      0.4
# … with 140 more rows

We can also pass quoted/unquoted variable names to be assigned as column names.

multipetal <- function(df, name, n) {
   mutate(df, {{name}} := Petal.Width * n)
}

multipetal(iris1, temp, 3)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  temp
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
# 1          5.1         3.5          1.4         0.2 setosa  0.6  
# 2          4.9         3            1.4         0.2 setosa  0.6  
# 3          4.7         3.2          1.3         0.2 setosa  0.6  
# 4          4.6         3.1          1.5         0.2 setosa  0.6  
# 5          5           3.6          1.4         0.2 setosa  0.6  
# 6          5.4         3.9          1.7         0.4 setosa  1.2  
# 7          4.6         3.4          1.4         0.3 setosa  0.900
# 8          5           3.4          1.5         0.2 setosa  0.6  
# 9          4.4         2.9          1.4         0.2 setosa  0.6  
#10          4.9         3.1          1.5         0.1 setosa  0.3  
# … with 140 more rows

It works the same with

multipetal(iris1, "temp", 3)
Gregor Thomas
  • 104,719
  • 16
  • 140
  • 257
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
4

I am also adding an answer that augments this a little bit because I came to this entry when searching for an answer, and this had almost what I needed, but I needed a bit more, which I got via @MrFlik 's answer and the R lazyeval vignettes.

I wanted to make a function that could take a dataframe and a vector of column names (as strings) that I want to be converted from a string to a Date object. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.

Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.

library(dplyr)

dat <- data.frame(a="leave alone",
                  dt="2015-08-03 00:00:00",
                  dt2="2015-01-20 00:00:00")

# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
    for (col in dtnames) {
        varval <- sprintf("as.Date(%s)", col)
        df <- df %>% mutate_(.dots= setNames(list(varval), col))
    }
    return(df)
}

dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
mpettis
  • 2,468
  • 4
  • 19
  • 29
3

While I enjoy using dplyr for interactive use, I find it extraordinarily tricky to do this using dplyr because you have to go through hoops to use lazyeval::interp(), setNames, etc. workarounds.

Here is a simpler version using base R, in which it seems more intuitive, to me at least, to put the loop inside the function, and which extends @MrFlicks's solution.

multipetal <- function(df, n) {
   for (i in 1:n){
      varname <- paste("petal", i , sep=".")
      df[[varname]] <- with(df, Petal.Width * i)
   }
   df
}
multipetal(iris, 3) 
hackR
  • 1,276
  • 13
  • 26
  • 2
    +1, although I still use `dplyr` a lot in non-interactive settings, using it with variabel input inside a function uses very clunky syntax. – Paul Hiemstra Feb 14 '17 at 12:30
3

You may enjoy package friendlyeval which presents a simplified tidy eval API and documentation for newer/casual dplyr users.

You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:

multipetal <- function(df, n) {
  varname <- paste("petal", n , sep=".")
  df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}

Which under the hood calls rlang functions that check varname is legal as column name.

friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.

MilesMcBain
  • 743
  • 6
  • 10
1

Another alternative: use {} inside quotation marks to easily create dynamic names. This is similar to other solutions but not exactly the same, and I find it easier.

library(dplyr)
library(tibble)

iris <- as_tibble(iris)

multipetal <- function(df, n) {
  df <- mutate(df, "petal.{n}" := Petal.Width * n)  ## problem arises here
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
iris

I think this comes from dplyr 1.0.0 but not sure (I also have rlang 4.7.0 if it matters).

bretauv
  • 3,557
  • 2
  • 7
  • 32