Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
2. Provide fast performance for in-memory data by writing key pieces in C++.
3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
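
As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarize), here is a minimal sketch. The `sales` data frame and its column names are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical sample data, for illustration only
sales <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  units  = c(10, 20, 5, 15, 40),
  price  = c(2.5, 2.5, 3.0, 3.0, 3.0)
)

result <- sales %>%
  filter(units >= 10) %>%               # keep rows with at least 10 units
  select(region, units) %>%             # keep only the columns needed
  group_by(region) %>%                  # form one group per region
  summarize(total_units = sum(units))   # collapse each group to a single row

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes chaining with `%>%` work.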

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

24676 questions

3 answers

A more elegant way to compute within-group proportions in dplyr?

Given a data_frame df <- data_frame(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N')), I need to come up with a data_frame that tells us that 50% of A's are M, 50% of A's are N, 67% of B's are M, and 33% of B's are N. I have a little…
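
One possible way to get the within-group proportions described in this excerpt is sketched below. It is not necessarily the approach taken by the asker or by any of the answers. `data.frame()` is used here in place of the question's `data_frame()`, which is deprecated, and the names `props` and `prop` are assumptions made for this example.

```r
library(dplyr)

# Data from the question, built with data.frame() instead of data_frame()
df <- data.frame(
  X = c("A", "A", "B", "B", "B"),
  Y = c("M", "N", "M", "M", "N")
)

props <- df %>%
  count(X, Y) %>%              # rows per (X, Y) combination, stored in column n
  group_by(X) %>%              # proportions are computed within each X
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(props)
```

Counting first and then dividing by the group total inside `mutate()` keeps one row per (X, Y) pair.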

r dplyr tidyverse

asked Jan 24 '17 at 06:38

crf

2 answers

Conditional mutate cumsum dplyr

I have towns (from A to D), which have different populations, and are at different distances. The objective is to add up the total population living within the circle of radius (distance XY) where X is a town in the centre of the circle and Y any…

r conditional dplyr cumsum

asked Jan 18 '17 at 13:01

JPV

3 answers

Calculating age using mutate with lubridate functions

I would like to calculate age based on birth date. If I use lubridate, I would just run the following as in Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date as.period(new_interval(start…

r dplyr lubridate

asked Jan 18 '17 at 08:16

HNSKD

1 answer

Using mutate with dates gives numerical values

I am using the lubridate and dplyr packages to work with date variables and to create a new date variable, respectively. library(lubridate) library(dplyr) Let df be my dataframe. I have two variables date1 and date2. I want to create a new variable…

r date dplyr lubridate

asked Jan 16 '17 at 02:55

HNSKD

1 answer

Sampling different numbers of rows by group in dplyr tidyverse

I'd like to sample rows from a data frame by group. But here's the catch, I'd like to sample a different number of records based on data from another table. Here is my reproducible data: df <- data_frame( Stratum = rep(c("High","Medium","Low"),…

r random dplyr tidyr purrr

asked Jan 15 '17 at 21:50

Zafar

5 answers

R: calculate the number of occurrences of a specific event in a specified time future

My simplified data looks like this: set.seed(1453); x = sample(0:1, 10, TRUE) date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16', '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', …

r date dplyr aggregate

asked Jan 11 '17 at 14:27

Kasia Kulma

4 answers

Counting new values not occurring earlier and not occurring in last group

I am trying to count the number of unique "new" users per month. A new user is one that has not appeared before (since the beginning). I am also trying to count the number of unique users who did not appear last month. The original data looks like library(dplyr) …

r count dplyr window

asked Jan 09 '17 at 20:42

user3482393

2 answers

Strip trailing spaces from factor labels using dplyr chain

I have a data frame loaded that has trailing white spaces in the factor labels. I am trying to remove those trailing spaces in every factor in the data frame but have been unsuccessful so far. Reproducible example: lvls <- c('a ', 'b ', …

r dplyr

asked Jan 04 '17 at 15:35

Wietze314

2 answers

Fit a different model for each row of a list-columns data frame

What is the best way to fit different model formulae that vary by the row of a data frame with the list-columns data structure in tidyverse? In R for Data Science, Hadley presents a terrific example of how to use the list-columns data structure and…

r dplyr tidyr tidyverse

asked Dec 30 '16 at 23:59

LmW.

1 answer

Running out of heap space in sparklyr, but have plenty of memory

I am getting heap space errors on even fairly small datasets. I can be sure that I'm not running out of system memory. For example, consider a dataset containing about 20M rows and 9 columns, and that takes up 1GB on disk. I am playing with it on a…

r apache-spark dplyr sparklyr

asked Dec 29 '16 at 17:18

David Bruce Borenstein

1 answer

SE filter_ by function taking multiple columns

I would like to filter a data frame to leave only the complete cases based on selected columns. This is easy to do with NSE filter(): library(dplyr) dd <- data.frame( id = 1:4, var1 = c(1, 2, NA, 4), var2 = c(1, NA, 3, 4), var3 = c(1, NA,…

r dplyr

asked Dec 21 '16 at 16:55

mdlincoln

1 answer

Using `map()` in nested data frame

I am having some problems using the map() function along with the nest() function. I have some data set up like the following: counter counter date_time total 1 06032013 2013-06-03 16:00:00 476 2 06032013 2013-06-03 17:00:00 …

r dplyr xts tidyr purrr

asked Dec 20 '16 at 03:43

thus__

1 answer

dplyr Exclude row

I am looking for a dplyr equivalent of SELECT user_id, item FROM users WHERE user_id NOT IN (1, 5, 6, 7, 11, 17, 18); -- admin accounts I can use users %>% filter(user_id != 1) but can't imagine using multiple && all the way. Is there a…

r dplyr

asked Dec 20 '16 at 02:44

Young Ha Kim

2 answers

Substitute for mutate (dplyr package) in python pandas

Is there a Python pandas function similar to R's dplyr::mutate(), which can add a new column to grouped data by applying a function on one of the columns of the grouped data? Below is the detailed explanation of the problem: I generated sample data…

python r pandas dplyr mutate

asked Dec 14 '16 at 16:46

saurav shekhar

1 answer

Joining list of data.frames from map() call

Is there a "tidyverse" way to join a list of data.frames (a la full_join(), but for >2 data.frames)? I have a list of data.frames as a result of a call to map(). I've used Reduce() to do something like this before, but would like to merge them as…

r dplyr purrr tidyverse

asked Dec 14 '16 at 16:21

Jennifer Thompson
