Subset data frame based on character vector of column names

Question

Rookie question - thanks in advance for patience...

I have a dataframe:

vals <- c(1,1,1,1)
testdf <- data.frame("var1"=vals, "var2"=vals, "var3"=vals)

I have a character vector of variable names:

varnames <- c("var1", "var2")

This is a character vector b/c I use it to generate a formula earlier in the script.

I'd like to subset a dataframe such that variables in varnames are excluded, e.g.

newDF <- subset(testdf, select = -varnames)

This creates an error since subset expects names instead of characters. So, I use lapply to change the characters to names:
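A quick check suggests the error actually comes from the unary minus, not from subset() wanting names. `-` isn't defined for a character vector, while `select` appears to accept a character or logical vector as long as it isn't negated. Below is a minimal sketch based on that reading of subset.data.frame, with testdf and varnames recreated from above. Note that ?subset itself recommends the standard `[` operators over subset() for programmatic use.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)
varnames <- c("var1", "var2")

# Negating a character vector fails before any column lookup happens
err <- tryCatch(subset(testdf, select = -varnames), error = function(e) e)
conditionMessage(err)  # the error raised by `-` on a character vector

# A character vector of the columns to KEEP works in select
kept_chr <- subset(testdf, select = setdiff(names(testdf), varnames))

# So does a logical vector over the columns
kept_lgl <- subset(testdf, select = !(names(testdf) %in% varnames))
```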

varnames <- lapply(varnames, as.name)

The result of this lapply function is a named(?) and nested(?) list.

[[1]]
var1

[[2]]
var2

Here's where I get lost (I feel like Mugatu on crazy pills... is this confusing to anyone else!?). I can see that each value has correctly been changed from character to name, but it's in this weird nested structure - so when I try to subset, I get an error.
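For what it's worth, the result is neither named nor nested. It is a flat, unnamed list whose elements are symbols (objects of class "name"). Symbols can't be stored in an atomic vector, so a list is the only container that can hold them. Here is a small check, where sym_list is just an illustrative name:

```r
varnames <- c("var1", "var2")
sym_list <- lapply(varnames, as.name)

str(sym_list)                          # a list of 2 symbols
is.list(sym_list)                      # TRUE
is.null(names(sym_list))               # TRUE: the list has no names
vapply(sym_list, is.name, logical(1))  # TRUE TRUE: each element is a symbol
```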

I've tried various solutions to unnest and unname, but with no success. This must be something easy I'm missing.

As a bonus - can someone tell me why it is ever useful for lapply to return this nested named list instead of simple vector? It seems very different than, for instance, Python. Thank you.

Maybe also see http://stackoverflow.com/q/7072159/ and http://stackoverflow.com/q/6286313/ — Frank, May 05 '16 at 11:41

SymbolixAU · Accepted Answer · 2016-05-05T05:58:34.600

You can pass the names of the columns you want to keep directly to [ (see the help file ?Extract, or help("["), for the subset operator [).

testdf[ names(testdf)[!names(testdf) %in% varnames] ]
## or
## testdf[, names(testdf)[!names(testdf) %in% varnames] , drop = FALSE]

Or, more concisely (thanks @Frank)

testdf[ setdiff(names(testdf), varnames)]
#   var3
# 1    1
# 2    1
# 3    1
# 4    1
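Here are both versions as one self-contained script, with the data recreated from the question. keep_in and keep_set are illustrative names. The script also checks that the two ways of computing the kept names agree. One small difference: setdiff() also removes duplicates from its first argument, which shouldn't matter for data frame column names that are already unique.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)
varnames <- c("var1", "var2")

# Columns to keep, computed two ways
keep_in  <- names(testdf)[!names(testdf) %in% varnames]
keep_set <- setdiff(names(testdf), varnames)
identical(keep_in, keep_set)  # TRUE for this data

# List-style [ with a character vector returns a data frame
newDF <- testdf[keep_set]
str(newDF)
```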

where

names(testdf)
# [1] "var1" "var2" "var3"
varnames
# [1] "var1" "var2"

And so

names(testdf) %in% varnames
# [1]  TRUE  TRUE FALSE

And therefore

names(testdf)[!names(testdf) %in% varnames]
# [1] "var3"
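Converting the logical vector back into names isn't strictly necessary, since [ on a data frame also accepts a logical vector over the columns. A sketch with the same objects, where mask is an illustrative name:

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)
varnames <- c("var1", "var2")

# %in% binds tighter than !, so this is !(names(testdf) %in% varnames)
mask <- !names(testdf) %in% varnames   # FALSE FALSE TRUE

by_mask <- testdf[mask]                # select columns with the logical vector
by_name <- testdf[names(testdf)[mask]] # the name-based version from above
all.equal(by_mask, by_name)
```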

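Which, for this data, is the same as

testdf[ "var3" ]

or, with matrix-style indexing, testdf[, "var3", drop = FALSE]. The drop = FALSE stops the result from 'dropping' to a vector when only one column is returned. Without it, testdf[, "var3"] gives a plain numeric vector.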
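Here is a quick way to see the drop behaviour, comparing the three forms on a single column of the question's data:

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)

class(testdf[, "var3"])                # "numeric": dropped to a vector
class(testdf[, "var3", drop = FALSE])  # "data.frame"
class(testdf["var3"])                  # "data.frame": list-style [ does not drop
```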

Also, if you look at the help file for lapply(X, FUN, ...)

?lapply

lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

This is why you're getting a list.

As a bonus - can someone tell me why it is ever useful for lapply to return this nested named list instead of simple vector? It seems very different than, for instance, Python. Thank you.

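When you're working with a list and want the result to stay a list. This includes cases where the results differ in length or type, or (as here) are symbols, which can't be stored in an atomic vector. If you do want a simple vector, sapply() tries to simplify the result, and vapply() returns a vector of a declared type.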
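Here is a small comparison, assuming the character varnames from the question. as_list, as_vec and ragged are illustrative names. The last line shows a case where a list is the natural result, because each output has a different length:

```r
varnames <- c("var1", "var2")

# lapply: always a list, one element per input element
as_list <- lapply(varnames, toupper)

# vapply: an atomic vector, each result checked against the template character(1)
as_vec <- vapply(varnames, toupper, character(1), USE.NAMES = FALSE)

# Results of different lengths only fit in a list
ragged <- lapply(1:3, seq_len)  # list(1L, 1:2, 1:3)
```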

Fyi, `x[!(x %in% y)]` is `setdiff(x,y)` – Frank May 05 '16 at 04:59
Excellent! Works great. Thanks Symbolix and @Frank – Rocinante May 05 '16 at 05:43

score 1 · Answer 2 · answered May 05 '16 at 05:20

You can also use match(), which returns the positions of the columns, and drop them with negative indexing:

testdf[-match(varnames,names(testdf))]
#   var3
#1    1
#2    1
#3    1
#4    1
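One caveat worth knowing with the negative-match approach is that if varnames is empty, `-match(...)` is `integer(0)`, and `testdf[integer(0)]` returns zero columns instead of all of them. Names that aren't in the data frame give NA, which then makes the indexing fail. Below is a sketch of a guarded version. drop_cols is a hypothetical helper, not part of base R.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)

# hypothetical helper: drop columns by name, ignoring names that don't exist
drop_cols <- function(df, cols) {
  idx <- match(cols, names(df))  # column positions, NA for unknown names
  idx <- idx[!is.na(idx)]
  # guard: df[-integer(0)] would return zero columns instead of all of them
  if (length(idx) == 0L) return(df)
  df[-idx]
}

names(drop_cols(testdf, c("var1", "var2")))        # "var3"
ncol(testdf[-match(character(0), names(testdf))])  # 0: the pitfall
ncol(drop_cols(testdf, character(0)))              # 3
```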

Ronak Shah

score 0 · Answer 3 · edited May 05 '16 at 13:13

You can access the elements using varnames[[1]], varnames[[2]], and so on, and convert them back into a character vector if that is easier to work with.
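Here is a short sketch of both steps, with the list of symbols rebuilt as in the question. sym_list, first and back_to_chr are illustrative names:

```r
sym_list <- lapply(c("var1", "var2"), as.name)

first <- sym_list[[1]]  # a single symbol (class "name"), printed as var1
as.character(first)     # "var1"

# Convert the whole list back into a character vector
back_to_chr <- vapply(sym_list, as.character, character(1))
```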

Source: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family

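lapply takes a vector or list and applies the function to each of its top-level elements. The input list can also contain other lists, and lapply passes each of those to the function whole rather than recursing into it. Because the function can return any kind of R object (another list, or a symbol as in your case), lapply always collects the results in a list, with one element per input element. That is the structure you are seeing.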
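Here is a small illustration of the "top-level only" point, using made-up data. base R's rapply() is shown for contrast because it does recurse into nested lists:

```r
x <- list(a = 1:3, b = list(4, 5))

# lapply sees b as one element (a list of length 2)
lapply(x, length)  # list(a = 3L, b = 2L)

# rapply applies the function to every non-list leaf and keeps the nesting
rapply(x, function(v) v * 10, how = "list")
```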

edited by mastov

answered May 05 '16 at 04:47

NinComPoop

A better source would be `help("[[")`. Generally, links are not preferred on this site, since they may eventually break. – Frank May 05 '16 at 04:51
Hi. I am relatively new here. I didn't get you. What should I use instead of the link? – NinComPoop May 05 '16 at 04:53
Ohh you meant to suggest using help function in R. Got it :) – NinComPoop May 05 '16 at 04:56
