Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
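
As a rough illustration of the verbs named in the tag description, here is a minimal sketch that chains filter, select, group_by, and summarize on the built-in mtcars data set. The column choices and the mpg threshold are arbitrary assumptions made for the example, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# mtcars ships with base R; the columns and threshold below are illustrative only.
result <- mtcars %>%
  filter(mpg > 20) %>%         # keep rows where mpg exceeds 20
  select(cyl, mpg, hp) %>%     # keep only three columns
  group_by(cyl) %>%            # one group per cylinder count
  summarize(
    n_cars   = n(),            # number of rows in each group
    mean_mpg = mean(mpg),      # average mpg within each group
    .groups  = "drop"          # return an ungrouped tibble (dplyr >= 1.0.0)
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps chain with %>%.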

Vignettes

Some vignettes have been moved to other related packages.

24676 questions
815 votes, 4 answers

data.table vs dplyr: can one do something well the other can't or does poorly?

Overview: I'm relatively familiar with data.table, not so much with dplyr. I've read through some dplyr vignettes and examples that have popped up on SO, and so far my conclusions are that: data.table and dplyr are comparable in speed, except when…
BrodieG • 48,306 • 7 • 80 • 131

219 votes, 4 answers

Filter rows which contain a certain string

I have to filter a data frame, keeping only the rows that contain the string RTB. I'm using dplyr. d.del <- df %>% group_by(TrackingPixel) %>% summarise(MonthDelivery = as.integer(sum(Revenue))) %>% …
Gianluca • 5,177 • 16 • 39 • 63

216 votes, 7 answers

Display / print all rows of a tibble (tbl_df)

tibble (previously tbl_df) is a version of a data frame created by the dplyr data frame manipulation package in R. It prevents long table outputs when accidentally calling the data frame. Once a data frame has been wrapped by tibble/tbl_df, is there…
Zhe Zhang • 2,383 • 2 • 12 • 10

205 votes, 9 answers

Use dynamic variable names in `dplyr`

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated. Example data from iris: library(dplyr) iris <- as_tibble(iris) I've created a function to mutate my…
Timm S. • 3,997 • 4 • 20 • 35

203 votes, 7 answers

Extract a dplyr tbl column as a vector

Is there a more succinct way to get one column of a dplyr tbl as a vector, from a tbl with database back-end (i.e. the data frame/table can't be subset directly)? require(dplyr) db <- src_sqlite(tempfile(), create = TRUE) iris2 <- copy_to(db,…
nacnudus • 5,420 • 5 • 30 • 46

198 votes, 5 answers

Can dplyr package be used for conditional mutating?

Can mutate be used when the mutation is conditional (depending on the values of certain columns)? This example helps show what I mean. structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6,…
rdatasculptor • 7,072 • 7 • 49 • 70

192 votes, 9 answers

Fixing a multiple warning "unknown column"

I have a persistent multiple warning of "unknown column" for all types of commands (e.g., from str(x) to installing updates on packages), and I'm not sure how to debug or fix it. The warning "unknown column" is clearly related to a variable in a tbl_df…
ssp3nc3r • 3,247 • 2 • 9 • 22

179 votes, 10 answers

Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of number of gears by am (automatic/manual) in one go with…
jenswirf • 6,057 • 8 • 41 • 62

165 votes, 10 answers

Group by multiple columns in dplyr, using string vector input

I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns. # make data with weird column names that can't be hard coded data = data.frame( asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3],…
sharoz • 5,528 • 7 • 29 • 52

162 votes, 5 answers

Summarizing multiple columns with dplyr?

I'm struggling a bit with the dplyr-syntax. I have a data frame with different variables and one grouping variable. Now I want to calculate the mean for each column within each group, using dplyr in R. df <- data.frame( a = sample(1:5, n,…
Daniel • 6,454 • 5 • 21 • 35

156 votes, 9 answers

Select first and last row from grouped data

Question: Using dplyr, how do I select the top and bottom observations/rows of grouped data in one statement? Data & Example: Given a data frame df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), stopId=c("a","b","c","a","b","c","a","b","c"),…
tospig • 6,510 • 11 • 33 • 75

152 votes, 2 answers

How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

I started getting a new message (see post title) when running group_by and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output: library(tidyverse) library(hablar) df <- read_csv("year, week,…
Susie Derkins • 1,724 • 2 • 8 • 17

149 votes, 6 answers

Remove duplicated rows using dplyr

I have a data.frame like this - set.seed(123) df = data.frame(x=sample(0:1,10,replace=T),y=sample(0:1,10,replace=T),z=1:10) > df x y z 1 0 1 1 2 1 0 2 3 0 1 3 4 1 1 4 5 1 0 5 6 0 1 6 7 1 0 7 8 1 0 8 9 1 0 9 10 0 1 10 I would…
Nishanth • 6,312 • 5 • 23 • 36

147 votes, 5 answers

What does %>% function mean in R?

I have seen the use of %>% (percent greater than percent) function in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?
alfakini • 3,965 • 2 • 23 • 34

130 votes, 1 answer

Can dplyr join on multiple columns or composite key?

I realize that dplyr v3.0 allows you to join on different variables: left_join(x, y, by = c("a" = "b")) will match x.a to y.b. However, is it possible to join on a combination of variables or do I have to add a composite key beforehand? Something like…
JasonAizkalns • 18,131 • 6 • 47 • 99