Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.
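As a rough illustration of how the verbs named above fit together, here is a minimal sketch that chains filter, select, group_by and summarize with the %>% pipe (which dplyr re-exports from magrittr). It assumes dplyr is installed and uses R's built-in mtcars data set; the column choices and the gear threshold are purely illustrative.

```r
library(dplyr)

# Illustrative pipeline on the built-in mtcars data set:
# keep cars with 4 or more forward gears, keep two columns,
# then compute one summary row per cylinder count.
mpg_by_cyl <- mtcars %>%
  filter(gear >= 4) %>%            # keep rows that satisfy a condition
  select(cyl, mpg) %>%             # keep only the named columns
  group_by(cyl) %>%                # define groups for the next verb
  summarize(mean_mpg = mean(mpg),  # collapse each group to one row
            n_cars = n())          # n() counts the rows in each group

print(mpg_by_cyl)
```

Because group_by is followed by summarize, each group collapses to a single row; using mutate in place of summarize would instead keep every row and attach the group statistic as a new column.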

The R package dplyr is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from the tibble package)
Databases (from the dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from the dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

24676 questions

130 votes, 6 answers

How to select the rows with maximum values in each group with dplyr?

I would like to select the row with the maximum value in each group using dplyr. First, I generate some random data to illustrate my question: set.seed(1); df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5)); df$value <- runif(nrow(df)). In plyr, I could use a…

r dplyr plyr greatest-n-per-group

asked Jun 16 '14 at 06:00

Bangyou

130 votes, 7 answers

Applying a function to every row of a table using dplyr?

When working with plyr, I often found it useful to use adply for scalar functions that I have to apply to every row, e.g.: data(iris); library(plyr); head(adply(iris, 1, transform, Max.Len = max(Sepal.Length, Petal.Length))) …

r plyr dplyr

asked Feb 16 '14 at 23:21

Stephen Henderson

122 votes, 7 answers

Replacement for "rename" in dplyr

I like plyr's renaming function, rename. I have recently started using dplyr and was wondering if there is an easy way to rename variables with a dplyr function that is as easy to use as plyr's rename.

r rename dplyr

asked Feb 01 '14 at 19:25

vergilcw

118 votes, 1 answer

Change value of variable with dplyr

I regularly need to change the values of a variable based on the values on a different variable, like this: mtcars$mpg[mtcars$cyl == 4] <- NA I tried doing this with dplyr but failed miserably: mtcars %>% mutate(mpg = mpg == NA[cyl == 4])…

r dataframe plyr dplyr

asked Jan 18 '15 at 19:28

luciano

115 votes, 6 answers

Sum across multiple columns with dplyr

My question involves summing values across multiple columns of a data frame and creating a new column that holds this sum, using dplyr. The data entries in the columns are binary (0, 1). I am thinking of a row-wise analog of the…

r dplyr

asked Mar 05 '15 at 08:19

amo

112 votes, 5 answers

Gather multiple sets of columns

I have data from an online survey where respondents go through a loop of questions 1-3 times. The survey software (Qualtrics) records this data in multiple columns—that is, Q3.2 in the survey will have columns Q3.2.1., Q3.2.2., and Q3.2.3.: df <-…

r reshape dplyr qualtrics tidyr

asked Sep 19 '14 at 02:41

Andrew

107 votes, 7 answers

filter for complete cases in data.frame using dplyr (case-wise deletion)

Is it possible to filter a data.frame for complete cases using dplyr? complete.cases with a list of all variables works, of course. But that is a) verbose when there are a lot of variables and b) impossible when the variable names are not known…

r dplyr magrittr

asked Mar 12 '14 at 13:50

user2503795

104 votes, 4 answers

dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories…

r dplyr plyr tidyr

asked Mar 20 '14 at 03:52

eipi10

101 votes, 9 answers

R dplyr: Drop multiple columns

I have a data frame and a list of columns in that data frame that I'd like to drop. Let's use the iris dataset as an example. I'd like to drop Sepal.Length and Sepal.Width and use only the remaining columns. How do I do this using select or select_ from…

r dplyr

asked Mar 07 '16 at 08:42

Navaneethan Santhanam

100 votes, 6 answers

Getting the top values by group

Here's a sample data frame: d <- data.frame( x = runif(90), grp = gl(3, 30) ) I want the subset of d containing the rows with the top 5 values of x for each value of grp. Using base-R, my approach would be something like: ordered <-…

r data.table dplyr

asked Jan 04 '15 at 13:36

Richie Cotton

5 answers

R Conditional evaluation when using the pipe operator %>%

When using the pipe operator %>% with packages such as dplyr, ggvis, dycharts, etc., how do I do a step conditionally? For example: step_1 %>% step_2 %>% if(condition) step_3 These approaches don't seem to work: step_1 %>% step_2 if(condition) %>%…

r dplyr ggvis magrittr

asked Jun 02 '15 at 18:44

rmf

1 answer

R spreading multiple columns with tidyr

Take this sample variable df <- data.frame(month=rep(1:3,2), student=rep(c("Amy", "Bob"), each=3), A=c(9, 7, 6, 8, 6, 9), B=c(6, 7, 8, 5, 6, 7)) I can use spread from tidyr to change this to wide…

r dataframe dplyr tidyr

asked Jun 02 '15 at 09:22

Ricky

6 answers

dplyr: "Error in n(): function should not be called directly"

I am attempting to reproduce one of the examples in the dplyr package but am getting this error message. I am expecting to see a new column n produced with the frequency of each combination. What am I missing? I triple checked that the package is…

r function plyr dplyr conflicting-libraries

asked Apr 02 '14 at 03:44

Michael Bellhouse

3 answers

dplyr mutate with conditional values

In a large data frame ("myfile") with four columns, I have to add a fifth column whose values depend conditionally on the first four columns. I prefer answers using dplyr and mutate, mainly because of dplyr's speed on large datasets. My data frame looks like…

r dplyr mutate

asked Mar 11 '14 at 21:48

rdatasculptor

4 answers

dplyr on data.table, am I really using data.table?

If I use dplyr syntax on top of a data.table, do I get all the speed benefits of data.table while still using the syntax of dplyr? In other words, do I misuse the data.table if I query it with dplyr syntax? Or do I need to use pure data.table syntax to…

r data.table dplyr

asked Dec 16 '14 at 18:35

Polymerase
