I am looking for a way to use the split-apply-combine strategy with R's data.table package.

library(data.table)

# take a data.table object, return an integer vector (column a squared, element-wise)
func <- function(DT) 
{
   DT$a * DT$a
}

DT = data.table(
  a = 1:50,
  # ... further fields here
  b = rep(1:10, 5)
)

# this obviously won't work:
DT[, result:=func, by=b]

# but this will (based on @Arun's answer below)
DT[, result:=func(.SD), by=b]

While this is a very simple data.table, with more complicated structures I'd like to be able to extract logic into functions and pass each subset to them as a data.table, without having to list all the field names.

nikola
  • @Arun, I've edited the question a bit. I was not aware of `.SD`; will be in a moment, though. – nikola Feb 23 '13 at 21:43

1 Answer

Edit: Check out the more detailed HTML vignettes available on the project wiki of data.table.

Okay, let me show you a small comparison of the plyr approach with its data.table equivalent. Maybe that'll help you get started. But it is important that you read this very nice introduction to data.table AND this FAQ.

set.seed(45) # for reproducibility
# dummy data
m  <- matrix(10 * sample(15, 1000, replace = TRUE), ncol = 10) # 100*10 matrix
df <- data.frame(grp = sample(1:10, 100, replace = TRUE))
df <- cbind(df, as.data.frame(m))

You now have a data.frame with 11 columns: 1 grouping column followed by 10 data columns. If you want to take the mean of each data column within each group, then, using plyr, you'd do something like:

require(plyr)
ddply(df, .(grp), function(x) colMeans(x[, 2:11]))

Using data.table, you can use .SD (check this post for a nice explanation of what .SD is, in addition to reading the documentation links).

require(data.table)
dt <- data.table(df, key = "grp")
dt[, lapply(.SD, mean), by=grp]
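
As a minimal sketch of what `.SD` contains: within each group it is a data.table holding that group's rows for every column except the grouping column, and the `.SDcols` argument narrows it to the columns you name. The toy table and its column names (`x`, `y`, `z`) below are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: one grouping column, two numeric columns, one character column
toy <- data.table(
  grp = rep(c("a", "b"), each = 3),
  x   = 1:6,
  y   = c(10, 20, 30, 40, 50, 60),
  z   = letters[1:6]
)

# names(.SD) lists the columns .SD holds for each group; grp itself is not among them
sd_cols <- toy[, .(cols = names(.SD)), by = grp]
print(sd_cols)

# .SDcols restricts .SD to x and y, so mean() is never applied to the character column z
res <- toy[, lapply(.SD, mean), by = grp, .SDcols = c("x", "y")]
print(res)
```

Without `.SDcols`, `lapply(.SD, mean)` would also be applied to `z`, which is not numeric, so restricting the columns is usually what you want when a table mixes types.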

This should get you started.
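
To tie this back to the question, here is a rough sketch of moving the per-group logic into a function that receives `.SD`. The helper `summarise_group` and the columns it returns are hypothetical names chosen for illustration; the point is that the function gets an ordinary data.table and can return a named list, whose elements become columns of the result.

```r
library(data.table)

# Hypothetical helper: takes one group's rows as a data.table and
# returns a named list; each element becomes a column in the result
summarise_group <- function(sd) {
  list(
    n      = nrow(sd),       # number of rows in the group
    total  = sum(sd$a),      # sum of column a within the group
    max_sq = max(sd$a^2)     # largest squared value of a within the group
  )
}

DT <- data.table(
  a = 1:50,
  b = rep(1:10, 5)
)

# The helper is called once per distinct value of b, with .SD as its argument
out <- DT[, summarise_group(.SD), by = b]
print(out)
```

The helper never needs to know which column was used for grouping, so the same function can be reused with a different `by`.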

Arun
    Hey - this is very nice, thanks a lot! Basically, all I was looking for was the `.SD` feature. – nikola Feb 23 '13 at 21:55