Questions tagged [data.table]

The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the dt tag for the DataTables package with Shiny (DT).

R's data.table package provides an enhanced version of data.frame including fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast overlapping range joins, fast add/modify/delete of columns by reference by group using no copies at all, and a fast file reader: fread. It has a natural syntax: DT[where|order, select|update, by]. SQL-inspired syntax enables joins within [] by using on to specify matching columns. These queries can be chained together just by adding another one on the end: DT[...][...].
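As a rough illustration of the DT[where|order, select|update, by] form, joins with on, and chaining, here is a minimal sketch. The table contents and the names id, value, label, totals, joined and ranked are made up for this example and do not come from the tag description.

```r
library(data.table)

# Hypothetical example data; column names are illustrative only.
DT <- data.table(
  id    = c("a", "b", "a", "c", "b"),
  value = c(10, 20, 30, 40, 50)
)

# DT[where, select, by]: keep rows with value > 10, then sum value per id.
totals <- DT[value > 10, .(total = sum(value)), by = id]

# Update by reference by group: add a column holding each id's mean value.
DT[, group_mean := mean(value), by = id]

# Join inside [] using `on` to name the matching column.
lookup <- data.table(id = c("a", "b"), label = c("alpha", "beta"))
joined <- DT[lookup, on = "id"]

# Chaining: aggregate first, then order the result by descending total.
ranked <- DT[, .(total = sum(value)), by = id][order(-total)]
```

Here := modifies DT in place rather than returning a copy, which is why the result is not assigned.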

The aggregation features are analogous to stats::ave, plyr::ddply, dplyr::group_by and Python's pandas, but faster.
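To make the analogy with stats::ave concrete, the sketch below computes the same per-row group mean both ways. The data frame df and the column names g, v and m are assumptions for illustration, and the example makes no claim about relative speed.

```r
library(data.table)

# Hypothetical example data.
df <- data.frame(g = c("x", "y", "x", "y"), v = c(1, 2, 3, 4))

# Base R: ave() returns the group mean aligned to each original row.
base_means <- ave(df$v, df$g, FUN = mean)

# data.table: the same per-row group mean, added as a new column by reference.
DT <- as.data.table(df)
DT[, m := mean(v), by = g]
```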


11,627 questions
5 votes, 2 answers

Including all permutations when using data.table[,,by=...]

I have a large data.table that I am collapsing to the month level using ,by. There are 5 by vars, with # of levels: c(4,3,106,3,1380). The 106 is months, the 1380 is a geographic unit. As it turns out there are some 0's, in that some cells have no…
Ari B. Friedman • 66,857 • 33 • 169 • 226

5 votes, 1 answer

data.table: vector scan v binary search with numeric columns - super-slow setkey

I am trying to find the quickest way to subset a large dataset by several numeric columns. As promised by data.table, the time taken to do binary search is much quicker than for vector scanning. Binary search, however, requires setkey to be…
mww • 101 • 5

5 votes, 1 answer

Transposing a data.table

What would be a good way to efficiently transform a data.table after the data computation is over nrow=500e3 ncol=2000 m <- matrix(rnorm(nrow*ncol),nrow=nrow) colnames(m) <- c('foo',seq(ncol-1)) dt <- data.table(m) df <- as.data.frame(m) dt <- t(dt)…
Abhi • 5,303 • 10 • 31 • 55

5 votes, 2 answers

How does data.table get the column name from j?

For example: dt <- data.table() x=1:5 > dt[,list(2,3,x)] V1 V2 x 1: 2 3 1 2: 2 3 2 3: 2 3 3 4: 2 3 4 5: 2 3 5 The resulting data.table has column x For some reason, I would like to create a function to simplify data.table…
colinfang • 17,887 • 11 • 69 • 146

5 votes, 2 answers

How to update existing column values in data.table?

I just started programming and apologize for asking this simple question, but I am stuck. I have a data.table called s3: s3: ClaimID dx dxgroup 15nhbfcgcda 113.8 NA 15nhbfcgcda 156.8 NA 15nhbfcgcda 110.8 …
n.datascience • 51 • 1 • 3

5 votes, 1 answer

Pass argument to data.table aggregation function

I have a function that calculates a weighted mean of a variable and groups it by time period using the data.table aggregation syntax. However, I want to provide the name of the weighting column programmatically. Is there a way to accomplish this…
Abiel • 4,595 • 6 • 47 • 66

5 votes, 2 answers

R: Merge data.table and fill in NAs

Suppose 3 data tables: dt1<-data.table(Type=c("a","b"),x=1:2) dt2<-data.table(Type=c("a","b"),y=3:4) dt3<-data.table(Type=c("c","d"),z=3:4) I want to merge them into 1 data table, so I do this: dt4<-merge(dt1,dt2,by="Type") # No error, produces…
Wet Feet • 4,055 • 7 • 25 • 39

5 votes, 2 answers

R data.table doing an inner join on a field and operating on another?

I have the following scenario, I first create a data table as shown below x = data.table(f1 = c('a','b','c','d')) x = x[,rn := .I] This yields > x f1 rn 1: a 1 2: b 2 3: c 3 4: d 4 > Where rn is simply the row number. Now, I have…
broccoli • 4,316 • 8 • 33 • 50

5 votes, 1 answer

Using data.table to aggregate

After multiple suggestions from SO users, I am finally trying to convert my code over to using data.table. library(data.table) DT <- data.table(plate = paste0("plate",rep(1:2,each=5)), id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2), …
dayne • 6,774 • 4 • 31 • 49

5 votes, 1 answer

R: Subsetting a data.table with repeated column names with numerical positions

I have a data.table that looks like this > dput(DT) A B C A B C D 1: 1 2 3 3 5 6 7 2: 2 1 3 2 1 3 4 Here's the dput DT <- structure(list(A = 1:2, B = c(2L, 1L), C = c(3L, 3L), A = c(3L, 2L), B = c(5L, 1L), C = c(6L, 3L), D = c(7L, 4L)),…
Wet Feet • 4,055 • 7 • 25 • 39

5 votes, 3 answers

Change data.table values in one column for multiple rows

I am trying to change the values of one column for specific rows in a data.table. This works when I do a vector scan but not when I do a binary search. dtData <- data.table(TickerId = c(1,2,3,4,5), DateTime = c(1,2,3,4,5), Close = …
Wolfgang Wu • 804 • 5 • 14

5 votes, 1 answer

data.table and Auto-complete Compatibility

Compare this behaviour, df <- data.frame(a11111 = rnorm(5,0), b11111= rnorm(5,0)) df$a # pressing tab at this instance auto-completes a11111 df$a # hitting return at this instance returns the value for a11111 with this…
TheComeOnMan • 11,085 • 6 • 35 • 50

5 votes, 1 answer

rbindlist data.tables with different number of columns

I am wondering how do I rbindlist data tables with different number of columns, and filling up empty rows with NAs like rbind.fill DT1 <- data.table(A = 1:3) DT2 <- data.table(A =4:5, B = letters[4:5]) l <- list(DT1, DT2) rbindlist(l) # Error…
Wet Feet • 4,055 • 7 • 25 • 39

5 votes, 3 answers

data.table loses factor ordering after rbind, R

When rbinding two data.table with ordered factors, the ordering seems to be lost: dtb1 = data.table(id = factor(c("a", "b"), levels = c("a", "c", "b"), ordered=T), key="id") dtb2 = data.table(id = factor(c("c"), levels = c("a", "c", "b"),…
Alex • 17,745 • 33 • 112 • 182

5 votes, 1 answer

Can I make this dplyr + data.table task faster?

I guess this is more of a dplyr than plyr question. For the sake of speed I am using data.table in some code I have written. During an intermediate step I have a table with some genomics data with ~32,000 rows: > bedbin.dt Source: local data table…
Stephen Henderson • 5,800 • 3 • 23 • 33