Questions tagged [mutate]

mutate is a verb from the dplyr package for R that creates new columns in a data frame or modifies existing ones.

When using dplyr for data wrangling, mutate is commonly used to add a new column to a data frame. With a data frame df, mutate(df, a = 3) adds a column named a that holds the constant 3 in every row. For more information on related dplyr verbs and more examples of how to use mutate, check the official help page.
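As a minimal sketch of the call described above (the data frame df and the columns x and x_doubled are made up for illustration, and dplyr is assumed to be installed):

```r
library(dplyr)

# Hypothetical example data; any data frame would do
df <- data.frame(x = c(1, 2, 3))

# Add a constant column `a` and a column computed from an existing one
out <- mutate(df, a = 3, x_doubled = x * 2)

print(out)
```

The constant 3 is recycled to the number of rows, while x * 2 is computed element-wise from the existing column.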

Note that as of dplyr version 1.0.0, the scoped helpers (including mutate_if, mutate_at and mutate_all) have been superseded by across() (see the official change-log).
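A rough sketch of how the scoped helpers map onto across(), assuming dplyr 1.0.0 or later; the data frame and its column names (id, v1, v2) are invented for illustration:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(id = c("a", "b"), v1 = c(1, 2), v2 = c(10, 20))

# Roughly what mutate_if(df, is.numeric, ~ .x * 2) did
out_if <- df %>% mutate(across(where(is.numeric), ~ .x * 2))

# Roughly what mutate_at(df, vars(v1, v2), ~ .x / 2) did
out_at <- df %>% mutate(across(c(v1, v2), ~ .x / 2))

# Roughly what mutate_all(df, as.character) did
out_all <- df %>% mutate(across(everything(), as.character))
```

In each case the first argument of across() selects the columns and the second supplies the function to apply to them.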

1668 questions
198
votes
5 answers

Can dplyr package be used for conditional mutating?

Can mutate be used when the mutation is conditional (depending on the values of certain columns)? This example helps show what I mean. structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6,…
rdatasculptor
  • 7,072
  • 7
  • 49
  • 70
95
votes
3 answers

dplyr mutate with conditional values

In a large dataframe ("myfile") with four columns I have to add a fifth column with values conditionally based on the first four columns. Prefer answers with dplyr and mutate, mainly because of its speed in large datasets. My dataframe looks like…
rdatasculptor
  • 7,072
  • 7
  • 49
  • 70
34
votes
4 answers

Calculate group mean (or other summary stats) and assign to original data

I want to calculate mean (or any other summary statistics of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group"). The summary statistic should be assigned to a new variable which…
Mike
  • 591
  • 2
  • 6
  • 5
15
votes
8 answers

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

I have the following issue using R. In short, I want to create multiple new columns in a data frame based on calculations of different column pairs in the data frame. The data looks as follows: df <- data.frame(a1 = c(1:5), b1 =…
user30276
  • 153
  • 1
  • 1
  • 4
13
votes
2 answers

How to use custom functions in mutate (dplyr)?

I'm rewriting all my code using dplyr, and need help with the mutate / mutate_at functions. All I need is to apply a custom function to two columns in my table. Ideally, I would reference these columns by their indices, but now I can't make it work even…
kintany
  • 481
  • 1
  • 5
  • 14
12
votes
2 answers

Mutate with a list column function in dplyr

I am trying to calculate the Jaccard similarity between a source vector and comparison vectors in a tibble. First, create a tibble with a names_ field (vector of strings). Using dplyr's mutate, create names_vec, a list-column, where each row is…
matsuo_basho
  • 2,059
  • 7
  • 20
  • 39
11
votes
1 answer

dplyr / tidyevaluation: How to pass an expression in mutate as a string?

I want to write a function that has two inputs: The name of a new variable and a mathematical expression. Both arguments come as strings. This function should take a data.frame and add the specified new variable which should be the result of the…
der_grund
  • 1,499
  • 13
  • 29
10
votes
1 answer

Using case_when within mutate_at

I would like to use case_when within mutate_at, as in the following example: mtcars %>% mutate_at(.vars = vars(vs, am), .funs = funs(case_when( . %in% c(1,0,9) ~ TRUE . %in% c(2,20,200) ~ FALSE …
Konrad
  • 14,406
  • 15
  • 86
  • 141
10
votes
2 answers

R: row-wise dplyr::mutate using function that takes a data frame row and returns an integer

I am trying to use a piped mutate statement with a custom function. I looked at this somewhat similar SO post but in vain. Say I have a data frame like this (where blob is some variable not related to the specific task but is part of the entire data)…
user3375672
  • 3,268
  • 7
  • 34
  • 62
9
votes
3 answers

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

My question is very similar to Applying group_by and summarise on data while keeping all the columns' info but I would like to keep columns which get excluded because they conflict after grouping. Label <-…
mckisa
  • 125
  • 1
  • 2
  • 5
9
votes
1 answer

Warning: Mutate_impl(.data, dots): Unequal factor levels: coercing to character

The Titanic Dataset can be downloaded from kaggle: kaggle.com/c/titanic/data. Please use the train.csv or install the package 'titanic' and use the dataset titanic_train. This works library(dplyr) library(stringr) titanic <- titanic %>% …
cappuccino
  • 295
  • 3
  • 12
8
votes
2 answers

Divide all columns by a chosen column using mutate_all

I have a sample data frame that looks like this (my full dataframe has "d" plus 57 elements): d <- seq(0, 100, 0.5) Fe <- runif(201, min = 0, max = 1000) Ca <- runif(201, min = 0, max = 1000) Zr <- runif(201, min = 0, max = 1000) Ti <-…
JJGabe
  • 171
  • 1
  • 8
8
votes
1 answer

Does dplyr::mutate() not recycle vectors?

It seems as if creating a column with dplyr::mutate() does not allow vector recycling. Why? Example: require(dplyr) df <- data_frame(id = rep(1:5, each = 42), name = rep(letters[1:7], each = 6, times = 5)) now: df %>% mutate (tp = c(1:42)) …
tjebo
  • 12,885
  • 4
  • 34
  • 61
7
votes
3 answers

calculate indices with base year and relative percentage change

I am looking for a way to, within id and groups, create an index on 100 using the lag (or is it lead) of value and the new index number idx_value to calculate the next index number. # install.packages(c("tidyverse"), dependencies =…
Eric Fail
  • 7,222
  • 5
  • 61
  • 118
7
votes
3 answers

Using `mutate_at` and `na_if` together to replace zeros with NA for only some columns

My data takes this format: library(tidyverse) df <- mtcars df <- df %>% mutate(vs_doubled = vs * 2) %>% select(mpg, cyl, vs, am, vs_doubled) head(df) #> mpg cyl vs am vs_doubled #> 1 21.0 6 0 1 0 #> 2 21.0 6 0 1 0 #>…
Jeremy K.
  • 1,427
  • 7
  • 21