Questions tagged [data.table]

The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the [dt] tag for the DataTables package with Shiny (DT).

R's data.table package provides an enhanced version of data.frame, including fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast overlapping range joins, fast add/modify/delete of columns by reference (optionally by group, with no copies made), and a fast file reader, fread. It has a natural syntax: DT[where|order, select|update, by]. The SQL-inspired syntax supports joins within [] by using on to specify the matching columns. Queries can be chained by appending another one: DT[...][...].
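As a rough illustration of the DT[where, select|update, by] form, a join with on, and chaining, here is a minimal sketch. The tables DT and labels and their columns (id, value, label) are made-up toy data for this example only, not taken from any question on this page.

```r
library(data.table)

# Hypothetical toy data, invented for illustration
DT <- data.table(
  id    = c("a", "b", "a", "c", "b", "a"),
  value = c(10, 20, 30, 40, 50, 60)
)

# DT[where, select, by]: filter rows, then aggregate per group
totals <- DT[value > 10, .(total = sum(value)), by = id]

# Update by reference, by group: adds a column without copying DT
DT[, group_mean := mean(value), by = id]

# Join inside []: look up each row of 'labels' in DT, matching on "id"
labels <- data.table(id = c("a", "b", "c"),
                     label = c("alpha", "beta", "gamma"))
joined <- DT[labels, on = "id"]

# Chaining: aggregate, then sort the result by descending total
top <- DT[, .(total = sum(value)), by = id][order(-total)]
```

Note that := modifies DT in place, so the group_mean column is already present when the join runs and is carried into joined.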

The aggregation features are analogous to stats::ave, plyr::ddply, dplyr::group_by and Python's pandas, but faster.

11627 questions
83 votes, 10 answers

How to replace NA values in a table for selected columns

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following: x[is.na(x)]<-0 But, what if I want to restrict it to only certain columns? Let me show you an…
jnam27 (reputation 1,187)

81 votes, 2 answers

Remove multiple columns from data.table

What's the correct way to remove multiple columns from a data.table? I'm currently using the code below, but was getting unexpected behavior when I accidentally repeated one of the column names. I wasn't sure if this was a bug, or if I shouldn't…
matt_k (reputation 3,429)

79 votes, 7 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…
Ed Swindelles (reputation 903)

79 votes, 2 answers

Extract a column from a data.table as a vector, by position

How do I extract a column from a data.table as a vector by its position? Below are some code snippets I have tried: DT<-data.table(x=c(1,2),y=c(3,4),z=c(5,6)) DT # x y z #1: 1 3 5 #2: 2 4 6 I want to get this output using column position DT$y…
Wet Feet (reputation 4,055)

78 votes, 2 answers

Using data.table package inside my own package

I am trying to use the data.table package inside my own package. MWE is as follows: I create a function, test.fun, that simply creates a small data.table object, and then sums the "Val" column grouping by the "A" column. The code…
ruser (reputation 1,419)

75 votes, 8 answers

How to get week numbers from dates?

Looking for a function in R to convert dates into week numbers (of year), I went for week from package data.table. However, I observed some strange behaviour: > week("2014-03-16") # Sun, expecting 11 [1] 11 > week("2014-03-17") # Mon, expecting…
Christian Borck (reputation 1,652)

75 votes, 5 answers

Add multiple columns to R data.table in one function call?

I have a function that returns two values in a list. Both values need to be added to a data.table in two new columns. Evaluation of the function is costly, so I would like to avoid having to compute the function twice. Here's the…
Florian Oswald (reputation 4,694)

74 votes, 12 answers

Error: package or namespace load failed for ggplot2 and for data.table

I am not able to open install the ggplot2 and data.table packages. It gives me the following error (example for ggplot2) > library(ggplot2) Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no…
elisahmendes (reputation 1,969)

73 votes, 1 answer

What you can do with a data.frame that you can't with a data.table?

I just started using R, and came across data.table. I found it brilliant. A very naive question: Can I ignore data.frame to use data.table to avoid syntax confusion between two packages?
AdamNYC (reputation 17,771)

73 votes, 2 answers

How to group data.table by multiple columns?

I'm using the data.table package to speed up some summary statistic collection on a data set. I'm curious if there's a way to group by more than one column. My data looks like this: purchaseAmt adShown url 15.54 00001 …
screechOwl (reputation 23,958)

72 votes, 5 answers

How to create a lag variable within each group?

I have a data.table: set.seed(1) data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7)) data # groups time value # 1: b 1 -0.6264538 # 2: b 2 …
xiaodai (reputation 11,863)

68 votes, 1 answer

.EACHI in data.table?

I cannot seem to find any documentation on what exactly .EACHI does in data.table. I see a brief mention of it in the documentation: Aggregation for a subset of known groups is particularly efficient when passing those groups in i and setting…
Alex (reputation 17,745)

67 votes, 4 answers

Proper/fastest way to reshape a data.table

I have a data table in R: library(data.table) set.seed(1234) DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12)) DT x y v [1,] 1 A 12 [2,] 1 B 62 [3,] 1 A 60 [4,] 1 B 61 [5,] 2 A 83 [6,] 2 B 97 [7,] 2 A 1 [8,]…
Zach (reputation 27,553)

65 votes, 1 answer

Summarizing multiple columns with data.table

I'm trying to use data.table to speed up processing of a large data.frame (300k x 60) made of several smaller merged data.frames. I'm new to data.table. The code so far is as follows library(data.table) a =…
Tahnoon Pasha (reputation 5,196)

64 votes, 1 answer

Subset rows corresponding to max value by group using data.table

Assume I have a data.table containing some baseball players: library(plyr) library(data.table) bdt <- as.data.table(baseball) For each group (given by player 'id'), I want to select rows corresponding to the maximum number of games 'g'. This is…
hadley (reputation 94,313)