Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
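
As a rough illustration of the verbs named in the tag description, the sketch below chains filter, select, group_by and summarize on R's built-in mtcars data. The dataset, the chosen columns (cyl, mpg, hp), the 100-horsepower threshold, and the output names mean_mpg and n_cars are assumptions made for this example only; they do not come from the tag wiki.

```r
library(dplyr)

# Illustrative only: mtcars ships with base R; the columns and the
# threshold below are assumptions picked for this example.
result <- mtcars %>%
  filter(hp > 100) %>%        # keep rows with more than 100 horsepower
  select(cyl, mpg, hp) %>%    # keep only the columns used afterwards
  group_by(cyl) %>%           # form one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),     # average miles per gallon within each group
    n_cars   = n()            # number of rows within each group
  )

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be chained with %>%.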

Vignettes

Some vignettes have been moved to other related packages.

24676 questions
5 votes, 2 answers

Complete column with group_by and complete

I've got a little problem using dplyr group_by function. After doing this : datasetALL %>% group_by(YEAR,Region) %>% summarise(count_number = n()) here is the result : YEAR Region count_number 1 1946 1 …
Ben
  • 53
  • 1
  • 4
5 votes, 3 answers

dplyr for rowwise quantiles

I have a df of strata, each of which has 1000 samples from a posterior distribution of the estimates from that stratum. mydf <- as.data.frame(lapply(seq(1, 1000), rnorm, n=100)) colnames(mydf) <- paste('s', seq(1, ncol(mydf)), sep='') I want to…
wylbur
  • 87
  • 7
5 votes, 1 answer

How to pass column names into a function dplyr

I'm trying to create a simple summary function to speed up the reporting of multiple columns of data for use in a R Markdown file. var1 is a categorical column of data, t_var is an integer representing the quarter of data, and dt is the full…
elksie5000
  • 4,829
  • 8
  • 44
  • 71
5 votes, 2 answers

Combine select and mutate

Quite often, I find myself manually combining select() and mutate() functions within dplyr. This is usually because I'm tidying up a dataframe, want to create new columns based on the old columns, and only want keep the new columns. For example, if…
mdpead
  • 71
  • 1
  • 5
5 votes, 2 answers

Group by aggregate dynamic column name matching

Is it possible to group_by using regex match on column names using dplyr? library(dplyr) # dplyr_0.5.0; R version 3.3.2 (2016-10-31) # dummy data set.seed(1) df1 <- sample_n(iris, 20) %>% mutate(Sepal.Length = round(Sepal.Length), …
zx8754
  • 42,109
  • 10
  • 93
  • 154
5 votes, 1 answer

R dplyr method to replace all empty factors with NA

Instead of writing and reading a dataframe to fill all empty factors in this method, na.strings=c("","NA") I wanted to just apply a function to all the columns and substitute the empties with NA. I've selected the factor columns so far but don't…
Ricky
  • 255
  • 3
  • 11
5 votes, 1 answer

Using dplyr to group_by and conditionally mutate a dataframe by group

I'd like to use dplyr functions to group_by and conditionally mutate a df. Given this sample data: A B C D 1 1 1 0.25 1 1 2 0 1 2 1 0.5 1 2 2 0 1 3 1 0.75 1 3 2 0.25 2 1 1 0 2 1 2 0.5 2 2 1 …
ucsbcoding
  • 97
  • 1
  • 9
5 votes, 0 answers

dplyr summarise evaluates custom function twice?

I am using dplyr group_by and summarise functions with custom made aggregate function, and have observed a strange behavior. It seems like the aggregate function is evaluate twice for each group. Here is a minimal example: aggFun <- function(x) {…
Øystein S
  • 372
  • 1
  • 10
5 votes, 2 answers

Faster coding than using for loop

Suppose I have the following data frame set.seed(36) n <- 300 dat <- data.frame(x = round(runif(n,0,200)), y = round(runif(n, 0, 500))) d <- dat[order(dat$y),] For each value of d$y<=300, I have to create a variable res in which the…
user 31466
  • 699
  • 2
  • 10
  • 18
5 votes, 2 answers

How to use data.table within functions and loops?

While assessing the utility of data.table (vs. dplyr), a critical factor is the ability to use it within functions and loops. For this, I've modified the code snippet used in this post: data.table vs dplyr: can one do something well the other can't…
IVIM
  • 1,363
  • 1
  • 9
  • 26
5 votes, 2 answers

Tracking which group fails in a dplyr chain

How can I find out which group failed when using group_by in a dplyr type chain. Take for example: library(dplyr) data(iris) iris %>% group_by(Species) %>% do(mod=lm(Petal.Length ~ Petal.Width, data = .)) %>% mutate(Slope =…
boshek
  • 2,958
  • 1
  • 24
  • 51
5 votes, 4 answers

Remove the first N rows from each factor level in an r data.frame

With the dat below. How can I make a new dataframe subset that includes all values except the first five rows for each IndID? Said differently I want new data frame with the first 5 rows for each IndID excluded. set.seed(123) dat <-…
B. Davis
  • 3,021
  • 4
  • 29
  • 67
5 votes, 1 answer

Filling "implied missing values" in a data frame that has varying observations per time unit

I have a large dataset with spatiotemporal data. Each set of coordinates are associated with an id (player id in a computer game). Unfortunately the coordinates for each id aren't logged at every time unit. If a reading is not available for a…
Lauler
  • 139
  • 1
  • 6
5 votes, 1 answer

ggplot: How to make the x/time-axis of a time-series plot only the time-component, not the date?

Consider the following example library(lubridate) library(tidyverse) library(scales) library(ggplot2) dataframe <- data_frame(time = c(ymd_hms('2008-01-04 00:00:00'), ymd_hms('2008-01-04 00:01:00'), …
ℕʘʘḆḽḘ
  • 15,284
  • 28
  • 88
  • 180
5 votes, 2 answers

top_n versus order in r

I am having trouble understanding the output from dplyr's top_n function. Can anybody help? n=10 df = data.frame(ref=sample(letters,n),score=rnorm(n)) require(dplyr) print(dplyr::top_n(df,5,score)) print(df[order(df$score,decreasing =…
PM.
  • 494
  • 1
  • 7
  • 13