3

This is almost a duplicate of this. I want to drop columns from a data table, but I want to do it efficiently. I have a list of names of columns that I want to keep. All the answers to the linked question imply doing something akin to

data.table.new <- data.table.old[, my.list, with = FALSE]

which at some point gives me a new object while the old object is still in memory. However, my data.table.old is huge, so I would prefer to do this by reference, as suggested here

set(data.table.old, j = 'a', value = NULL)

However, as I have a whitelist of columns rather than a blacklist, I would need to iterate through all the column names, check whether each one is in my.list, and then apply set(). Is there a cleaner way of doing this?

FooBar
  • 3
    use the `setdiff` of the variable names? – rawr Jul 07 '15 at 11:34
  • possible duplicate of [Idiom for dropping a single column in a data.table](http://stackoverflow.com/questions/16473304/idiom-for-dropping-a-single-column-in-a-data-table) mnel's answer there is easily extended to the drop-more-than-one-column case (with `toDrop` analogous to `dropcols` in Jan's answer). – Frank Jul 07 '15 at 16:55

1 Answer

7

I am not sure whether you can do by-reference operations on a data.frame without converting it to a data.table.
The code below should work if you are willing to use data.table.

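library(data.table)
# Convert to data.table by reference (no copy is made)
setDT(data.frame.old)
# Columns that are not in the whitelist
dropcols <- names(data.frame.old)[!names(data.frame.old) %in% my.list]
# Remove those columns by reference
data.frame.old[, c(dropcols) := NULL]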
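
For illustration, here is a minimal self-contained sketch of the same idea on toy data. The table `dt`, its columns, and the whitelist contents are made up for the example; `setdiff` is used to build `dropcols`, and `address()` from data.table is used only to check whether the object was copied.

```r
library(data.table)

# Hypothetical toy data standing in for the huge table
dt <- data.table(a = 1:3, b = letters[1:3], c = c(0.1, 0.2, 0.3), d = 4:6)
my.list <- c("a", "c")  # whitelist of columns to keep (assumed for the example)

# Remember where the table lives in memory before modifying it
addr.before <- address(dt)

# Columns present in the table but absent from the whitelist
dropcols <- setdiff(names(dt), my.list)

# Remove them by reference; the guard covers the case where nothing needs dropping
if (length(dropcols) > 0L) {
  dt[, (dropcols) := NULL]
}

print(names(dt))

# Same address before and after would indicate that no copy was made
print(identical(address(dt), addr.before))
```

The parentheses around `dropcols` make data.table use the names stored in the variable instead of looking for a column literally called "dropcols". Calling `set(dt, j = dropcols, value = NULL)` is another by-reference form of the same removal.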
jangorecki
  • I think `setdiff` is better for construction of `dropcols` – Frank Jul 07 '15 at 16:52
  • @Frank Very likely, but I am personally used to the `not in` vector subset – jangorecki Jul 07 '15 at 19:21
  • Why doesn't `dropcols := NULL` work? What is the `c` doing? – Farrel Aug 31 '15 at 22:34
  • 1
    @Farrel It evaluates `dropcols` to the character vector. Your code wouldn’t work because it would try to remove a column named "dropcols" instead of the columns whose names the `dropcols` variable stores. It doesn't need to be `c(dropcols)`; it can be `(dropcols)`, which also evaluates the variable. – jangorecki Sep 01 '15 at 00:05