216

To get rid of a column named "foo" in a data.frame, I can do:

df <- df[-grep('foo', colnames(df))]

However, once df is converted to a data.table object, I cannot find a way to simply remove a column.

Example:

library(data.table)
df <- data.frame(id = 1:100, foo = rnorm(100))
df2 <- df[-grep('foo', colnames(df))] # works
df3 <- data.table(df)
df3[-grep('foo', colnames(df3))] # does not remove column foo

But once it is converted to a data.table object, this no longer works.

Henrik
Maiasaura

8 Answers

310

Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]

df3[, c("foo","bar"):=NULL]  # remove two columns

myVar = "foo"
df3[, (myVar):=NULL]   # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]

data.table also supports the following syntax:

## Method 3 (could then assign to df3)
df3[, !"foo"]  

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
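The difference between the unanchored and the anchored pattern can be checked on a plain character vector of column names. This is only a minimal sketch: the names in `cols` are made up for illustration and do not come from the question.

```r
# Hypothetical column names, chosen to include substring matches
cols <- c("id", "foo", "fool", "buffoon")

# Unanchored pattern: matches every name that contains "foo"
loose <- grep("foo", cols, value = TRUE)

# Anchored pattern: matches only the name that is exactly "foo"
exact <- grep("^foo$", cols, value = TRUE)

print(loose)  # "foo" "fool" "buffoon"
print(exact)  # "foo"
```

With the unanchored pattern, Methods 2a and 2b would therefore delete all three of those columns rather than just foo.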

Less safe options, fine for interactive use:

The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]

Lastly, there are approaches using with=FALSE. data.table is gradually moving away from this argument, so it is discouraged where you can avoid it; the idioms are shown here so you know the option exists in case you really do need it:

# Method 5a (like Method 3)
df3[, !"foo", with=FALSE] 
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
MichaelChirico
Josh O'Brien
  • 2
    See my comment to the OP regarding `-grep` versus `!grepl`. – Joshua Ulrich Feb 08 '12 at 22:36
  • 1
    @JoshuaUlrich -- Good point. I tried `grepl()` initially and it didn't work, as data.table columns can't be indexed by a logical vector. But I now realize that `grepl()` can be made to work by wrapping it with `which()`, so that it returns an integer vector. – Josh O'Brien Feb 08 '12 at 23:38
  • 1
    I didn't know that about indexing with `data.table`, but wrapping it in `which` is clever! – Joshua Ulrich Feb 08 '12 at 23:59
  • 6
    I didn't know that about `data.table` either; added [FR#1797](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1797&group_id=240&atid=978). But, method 1 is (almost) _infinitely_ faster than the others. Method 1 removes the column by reference with no copy at all. I doubt you get it above 0.005 seconds for any size data.table. In contrast, the others might not work at all if the table is near 50% of RAM because they copy all but the one to delete. – Matt Dowle Feb 09 '12 at 09:27
  • +1 for Method 1 - that's the quick/easy solution I was looking for. – Robin Kraft Jan 03 '13 at 23:58
  • Since option 1 is preferred for programming, could it be rewritten using a text variable, such as afoo –  Oct 29 '14 at 12:57
  • One approach to program a data table column delete would be: afoo –  Oct 29 '14 at 13:17
  • 1
    @user3969377 if you want to remove a column based on the contents of a character variable, you'd simply wrap it in parentheses, i.e. df[,(afoo):=NULL] – Dean MacGregor Jul 09 '15 at 19:26
  • you could combine ``grepl`` with Ari Friedman's suggestion, no? e.g. ``cols – PatrickT Dec 19 '15 at 08:51
  • 1
    Note that we can't delete rows at the same time as columns: `df3[ 1:5, foo := NULL]` will give error: `When deleting columns, i should not be provided`. This works: `df3[ 1:5, -"foo", with = FALSE]`. – zx8754 Mar 17 '16 at 08:33
  • Method 1 also works for deleting multiple columns: `df3[, c("foo","bar"):=NULL]` – laxxy Oct 20 '16 at 17:26
  • Hi Josh. I just added common idioms to the top since I just saw this question is popular. Hope ok. Feel free to tidy up. It was just a quick edit. – Matt Dowle Nov 15 '16 at 02:47
  • @MattDowle Thanks. Since this looks like it has become the go-to answer for this very common question, do you think I should rewrite it, streamlining the text and better-organizing the various options? – Josh O'Brien Nov 15 '16 at 16:40
  • @JoshO'Brien To me it looks good as is. Reading my comment again I didn't intend to suggest a tidy up was needed. It's getting something right to be so highly voted so more authentic to leave it unchanged. – Matt Dowle Nov 15 '16 at 18:10
  • @MattDowle Great. Thanks for the feedback. – Josh O'Brien Nov 15 '16 at 18:23
  • 1
    Method 1 does emit a warning, though: `In [.data.table (dt, , ':='(col_to_be_deleted, NULL)) : Adding new column 'col_to_be_deleted' then assigning NULL (deleting it).` – Lazarus Thurston Jan 18 '18 at 16:08
  • @sanjmeh That's the message you get when you try to delete a column that is not already in your `data.table`. To see that, create the data.table `df3` as demonstrated in the OP, then compare the results of `df3[,foo:=NULL]` and `df3[,bar:=NULL]`. (And of course if you then try `df3[,foo:=NULL]` a second time, it will give you a similar warning since column `foo` has already been deleted.) See what I mean? – Josh O'Brien Jan 18 '18 at 17:03
  • And I realise I was making a mistake, these were indeed new names. So all clear. Method1 works well. – Lazarus Thurston Jan 18 '18 at 17:30
31

You can also use set for this, which avoids the overhead of [.data.table in loops:

library(data.table)
dt <- data.table(a = letters, b = LETTERS, c = seq(26), d = letters, e = letters)
set(dt, j = c(1L, 3L, 5L), value = NULL)  # drop columns 1, 3 and 5 by reference
dt[1:5]
#    b d
# 1: A a
# 2: B b
# 3: C c
# 4: D d
# 5: E e

If you want to do it by column name, which(colnames(dt) %in% c("a","c","e")) should work for j.
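The by-name variant could look roughly like this. It is a sketch that reuses the table from the example above; `drop_idx` is a helper name introduced here for illustration.

```r
library(data.table)

dt <- data.table(a = letters, b = LETTERS, c = seq(26), d = letters, e = letters)

# Translate the column names into integer positions
drop_idx <- which(colnames(dt) %in% c("a", "c", "e"))

# Drop those columns by reference, without going through [.data.table
set(dt, j = drop_idx, value = NULL)

print(names(dt))  # "b" "d"
```

Because `which()` returns an empty integer vector when no name matches, it is worth checking `length(drop_idx)` first if the names come from user input.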

SeGa
Ari B. Friedman
  • 2
    In `data.table` 1.11.8, if you want to do it by column name, you can do directly `rm.col = c("a","b")` and `dt[, (rm.col):=NULL]` – Duccio A Dec 10 '18 at 11:08
20

I simply do it in the data frame kind of way:

DT$col = NULL

Works fast and as far as I could see doesn't cause any problems.

UPDATE: This is not the best method if your DT is very large, because the $<- operator copies the object. It is better to use:

DT[, col:=NULL]
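One way to see that `:=` works by reference is to bind a second name to the same table and check that it loses the column too. This is a small sketch with made-up data; `alias` and the column values are illustrative only.

```r
library(data.table)

DT <- data.table(id = 1:3, col = c("x", "y", "z"))
alias <- DT            # plain assignment does not copy a data.table

DT[, col := NULL]      # delete the column in place

print(names(DT))     # "id"
print(names(alias))  # "id", because alias points at the same object
```

If an independent copy is needed before deleting, `copy(DT)` is the usual way to get one.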
msp
10

A very simple option if you have many individual columns to delete from a data.table and want to avoid typing all the column names (use with care):

dt <- dt[, -c(1,4,6,17,83,104)]

This will remove columns based on column number instead.

It's obviously not as efficient, because it bypasses data.table's advantages, but if you're working with fewer than, say, 500,000 rows it works fine.

MichaelChirico
SJDS
4

Suppose your DT has columns col1, col2, col3, col4, col5, coln.

To delete a subset of them:

vx <- as.character(bquote(c(col1, col2, col3, coln)))[-1]
DT[, paste0(vx):=NULL]
iled
-2

Here is a function you can use when you want to set a number of columns to NULL, given their column names :)

deleteColsFromDataTable <- function(train, toDeleteColNames) {
  # := deletes each column by reference, so the caller's table changes too
  for (myNm in toDeleteColNames) {
    train[, (myNm) := NULL]
  }
  return(train)
}
Yan Foto
-3
DT[,c:=NULL] # remove column c
Serjik
Durga Gaddam
-6

For a data.table, assigning the column to NULL removes it:

DT[,c("col1", "col2", "col3", "col4")] <- NULL
   ^
   |---- Notice the extra comma if DT is a data.table

... which is the equivalent of:

DT$col1 <- NULL
DT$col2 <- NULL
DT$col3 <- NULL
DT$col4 <- NULL

The equivalent for a data.frame is:

DF[c("col1", "col2", "col3", "col4")] <- NULL
   ^
   |---- Notice the missing comma if DF is a data.frame

Q. Why is there a comma in the version for data.table, and no comma in the version for data.frame?

A. As data.frames are stored as a list of columns, you can skip the comma. You could also add it in, however then you will need to assign them to a list of NULLs, DF[, c("col1", "col2", "col3")] <- list(NULL).

Contango
  • @Arun I can't think of any situation with `data.frames` where the row and columns would be switched. That would be illogical. – duHaas Mar 31 '14 at 22:42
  • @Arun I tagged you because your first comment made it seem like there were times at which you might call `DF[column,row]` so I just wanted to see if there actually were any instances where this happened. – duHaas Mar 31 '14 at 22:57
  • Updated the answer to remove a typo. – Contango Apr 02 '14 at 07:30