Questions tagged [data.table]

The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the dt tag for the DataTables package with Shiny (DT).

R's data.table package provides an enhanced version of data.frame, including fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast overlapping range joins, fast add/modify/delete of columns by reference (optionally by group) without copying the table, and a fast file reader: fread. It has a natural syntax: DT[where|order, select|update, by]. SQL-inspired syntax enables joins within [] by using on to specify the matching columns. Queries can be chained by appending another one: DT[...][...].
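The DT[where, select|update, by] form, joins with on, and chaining can be sketched as follows. This is a minimal illustration, not taken from the package documentation; the tables `sales` and `managers` and all of their column names are hypothetical.

```r
library(data.table)

# Hypothetical example data; table and column names are illustrative only.
sales <- data.table(
  region = c("east", "east", "west", "west", "west"),
  item   = c("a", "b", "a", "b", "b"),
  amount = c(10, 20, 5, 15, 25)
)

# DT[where, select, by]: filter rows, aggregate, and group in one call.
totals <- sales[amount > 5, .(total = sum(amount)), by = region]

# Update by reference, by group: := adds a column without copying the table.
sales[, share := amount / sum(amount), by = region]

# Join inside [] with on: look up the manager for each row of sales.
managers <- data.table(region = c("east", "west"), manager = c("Ann", "Bob"))
joined <- managers[sales, on = "region"]

# Chaining: aggregate by two columns, then order by descending total.
ranked <- sales[, .(total = sum(amount)), by = .(region, item)][order(-total)]
```

Note that `:=` modifies `sales` in place, so no assignment back to `sales` is needed, whereas the aggregating calls return new tables.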

The aggregation features are analogous to stats::ave, plyr::ddply, dplyr::group_by and Python's pandas, but faster.

Related tags

R's plyr and dplyr packages
Python's pandas library

11627 questions

10 answers

How to replace NA values in a table for selected columns

There are a lot of posts about replacing NA values. I am aware that one could replace NAs in the following table/frame with the following: x[is.na(x)]<-0 But what if I want to restrict it to only certain columns? Let me show you an…

r replace dataframe data.table na

asked Oct 15 '13 at 10:36

jnam27

2 answers

Remove multiple columns from data.table

What's the correct way to remove multiple columns from a data.table? I'm currently using the code below, but was getting unexpected behavior when I accidentally repeated one of the column names. I wasn't sure if this was a bug, or if I shouldn't…

r data.table

asked May 19 '13 at 19:16

matt_k

7 answers

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row. (1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't…

r dplyr data.table aggregate

asked Jun 05 '14 at 21:45

Ed Swindelles

2 answers

Extract a column from a data.table as a vector, by position

How do I extract a column from a data.table as a vector by its position? Below are some code snippets I have tried: DT<-data.table(x=c(1,2),y=c(3,4),z=c(5,6)) DT # x y z #1: 1 3 5 #2: 2 4 6 I want to get this output using column position DT$y…

r vector indexing data.table

asked Nov 18 '13 at 08:37

Wet Feet

2 answers

Using data.table package inside my own package

I am trying to use the data.table package inside my own package. MWE is as follows: I create a function, test.fun, that simply creates a small data.table object, and then sums the "Val" column grouping by the "A" column. The code…

r data.table

asked May 10 '12 at 03:18

ruser

8 answers

How to get week numbers from dates?

Looking for a function in R to convert dates into week numbers (of year) I went for week from package data.table. However, I observed some strange behaviour: > week("2014-03-16") # Sun, expecting 11 [1] 11 > week("2014-03-17") # Mon, expecting…

r date data.table week-number

asked Mar 16 '14 at 16:29

Christian Borck

5 answers

Add multiple columns to R data.table in one function call?

I have a function that returns two values in a list. Both values need to be added to a data.table in two new columns. Evaluation of the function is costly, so I would like to avoid having to compute the function twice. Here's the…

r data.table

asked Jul 03 '12 at 10:13

Florian Oswald

12 answers

Error: package or namespace load failed for ggplot2 and for data.table

I am not able to install the ggplot2 and data.table packages. It gives me the following error (example for ggplot2) > library(ggplot2) Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no…

r ggplot2 data.table

asked Jul 30 '15 at 07:39

elisahmendes

1 answer

What you can do with a data.frame that you can't with a data.table?

I just started using R, and came across data.table. I found it brilliant. A very naive question: Can I ignore data.frame to use data.table to avoid syntax confusion between two packages?

r dataframe data.table

asked Nov 29 '12 at 03:46

AdamNYC

2 answers

How to group data.table by multiple columns?

I'm using the data.table package to speed up some summary statistic collection on a data set. I'm curious if there's a way to group by more than one column. My data looks like this: purchaseAmt adShown url 15.54 00001 …

r group-by data.table

asked Sep 18 '12 at 14:22

screechOwl

5 answers

How to create a lag variable within each group?

I have a data.table: set.seed(1) data <- data.table(time = c(1:3, 1:4), groups = c(rep(c("b", "a"), c(3, 4))), value = rnorm(7)) data # groups time value # 1: b 1 -0.6264538 # 2: b 2 …

r data.table plyr dplyr

asked Oct 10 '14 at 04:33

xiaodai

1 answer

.EACHI in data.table?

I cannot seem to find any documentation on what exactly .EACHI does in data.table. I see a brief mention of it in the documentation: Aggregation for a subset of known groups is particularly efficient when passing those groups in i and setting…

r performance group-by data.table

asked Nov 18 '14 at 21:03

Alex

4 answers

Proper/fastest way to reshape a data.table

I have a data table in R: library(data.table) set.seed(1234) DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12)) DT x y v [1,] 1 A 12 [2,] 1 B 62 [3,] 1 A 60 [4,] 1 B 61 [5,] 2 A 83 [6,] 2 B 97 [7,] 2 A 1 [8,]…

r data.table

asked Aug 01 '11 at 17:27

Zach

1 answer

Summarizing multiple columns with data.table

I'm trying to use data.table to speed up processing of a large data.frame (300k x 60) made of several smaller merged data.frames. I'm new to data.table. The code so far is as follows library(data.table) a =…

r data.table

asked May 13 '13 at 01:41

Tahnoon Pasha

1 answer

Subset rows corresponding to max value by group using data.table

Assume I have a data.table containing some baseball players: library(plyr) library(data.table) bdt <- as.data.table(baseball) For each group (given by player 'id'), I want to select rows corresponding to the maximum number of games 'g'. This is…

r data.table greatest-n-per-group

asked May 15 '13 at 20:03

hadley
