2

Rookie question - thanks in advance for patience...

I have a dataframe:

vals <- c(1,1,1,1)
testdf <- data.frame("var1"=vals, "var2"=vals, "var3"=vals)

I have a character vector of variable names:

varnames <- c("var1", "var2")

This is a character vector b/c I use it to generate a formula earlier in the script.

I'd like to subset a dataframe such that variables in varnames are excluded, e.g.

newDF <- subset(testdf, select=-varnames)

This creates an error since subset expects names instead of characters. So, I use lapply to change the characters to names:

varnames <- lapply(varnames, as.name)

The result of this lapply function is a named(?) and nested(?) list.

[[1]]
var1

[[2]]
var2

Here's where I get lost (I feel like Mugatu on crazy pills... is this confusing to anyone else!?). I can see that each value has correctly been changed from character to name, but it's in this weird nested structure - so when I try to subset, I get an error.

I've tried various solutions to unnest and unname, but with no success. This must be something easy I'm missing.

As a bonus - can someone tell me why it is ever useful for lapply to return this nested named list instead of simple vector? It seems very different than, for instance, Python. Thank you.

Rocinante
  • Maybe also see http://stackoverflow.com/q/7072159/ and http://stackoverflow.com/q/6286313/ – Frank May 05 '16 at 11:41

3 Answers

6

You can define the names of the columns you want inside [ (see the help file ?Extract or help("[") for the subset operator [).

testdf[ names(testdf)[!names(testdf) %in% varnames] ]
## or
## testdf[, names(testdf)[!names(testdf) %in% varnames] , drop = FALSE]

Or, more concisely (thanks @Frank)

testdf[ setdiff(names(testdf), varnames)]
  var3
1    1
2    1
3    1
4    1

where

names(testdf)
# [1] "var1" "var2" "var3"
varnames
# [1] "var1" "var2"

And so

names(testdf) %in% varnames
# [1]  TRUE  TRUE FALSE

And therefore

names(testdf)[!names(testdf) %in% varnames]
# [1] "var3"

Which is the same as

testdf[ "var3" ]

In the two-argument form testdf[, cols], drop = FALSE stops the result from 'dropping' to a vector when only one column is returned.
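To check the steps above end to end, here is a minimal sketch using the question's testdf and varnames. The names keep, as_df, as_vec and as_df2 are illustrative only.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)
varnames <- c("var1", "var2")

# Columns to keep: every column name not listed in varnames
keep <- setdiff(names(testdf), varnames)

# List-style indexing (one argument) always returns a data frame
as_df <- testdf[keep]

# Matrix-style indexing (two arguments) drops a single column to a vector...
as_vec <- testdf[, keep]

# ...unless drop = FALSE is given
as_df2 <- testdf[, keep, drop = FALSE]

is.data.frame(as_df)   # TRUE
is.data.frame(as_vec)  # FALSE, it is a plain numeric vector
is.data.frame(as_df2)  # TRUE
```

If the result is passed on to code that expects a data frame, the one-argument form or drop = FALSE is probably the safer choice.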


Also, if you look at the help file for lapply(X, FUN, ...)

?lapply

lapply returns a list of the same length as X

This is why you're getting a list.
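If a plain vector is what you want back, base R has relatives of lapply that simplify the result. A minimal sketch, with toupper used purely as a stand-in function (it is not part of the question):

```r
varnames <- c("var1", "var2")

# lapply always returns a list, one element per input element
as_list <- lapply(varnames, toupper)

# vapply returns an atomic vector and checks each result against a template
as_vector <- vapply(varnames, toupper, character(1), USE.NAMES = FALSE)

# unlist flattens a list of atomic values into a vector
flattened <- unlist(as_list)

is.list(as_list)                 # TRUE
is.character(as_vector)          # TRUE
identical(as_vector, flattened)  # TRUE
```

Note that this only helps when the results are atomic values. The symbols produced by as.name are not atomic, so they can only be held in a list.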


As a bonus - can someone tell me why it is ever useful for lapply to return this nested named list instead of simple vector? It seems very different than, for instance, Python. Thank you.

It is useful when you're working with a list and want the result to remain a list.
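One case where a list is the only sensible return type is when the results differ in length or type. A small sketch; the data frame mixed is made up for illustration:

```r
# Hypothetical data: one numeric column with a repeat, one character column
mixed <- data.frame(x = c(1, 2, 2), y = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# A data frame is a list of columns, so lapply visits each column in turn
uniq <- lapply(mixed, unique)

# The results have different lengths and types, so they cannot share one vector
lengths(uniq)
class(uniq$x)
class(uniq$y)
```

Here uniq keeps a numeric vector of length 2 and a character vector of length 3 side by side, which a simple vector could not do.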

SymbolixAU
1

You can also use match which returns an index

testdf[-match(varnames,names(testdf))]


#   var3
#1    1
#2    1
#3    1
#4    1
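One caveat with negative indices from match, sketched below under the assumption that varnames might contain a name that is not a column, or be empty. match returns NA for a name it cannot find, and an empty negative index selects no columns rather than all of them. The name "nope" and the variables idx and result are illustrative only.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)

# "nope" is not a column, so match() returns NA in that position
idx <- match(c("var1", "nope"), names(testdf))

# Drop the NAs before negating the index
idx <- idx[!is.na(idx)]

# Guard against an empty index: testdf[-integer(0)] selects zero columns
result <- if (length(idx) > 0) testdf[-idx] else testdf

names(result)  # "var2" "var3"
```

The setdiff approach from the other answer avoids both issues, since it works on names rather than positions.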
Ronak Shah
0

You can access the individual elements with varnames[[1]], varnames[[2]], and so on, and convert the list back into a character vector if that makes it easier for you.

Source: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family

lapply takes a list (or vector) and applies the function to every element. Because the result for each element can be any object, including another list, lapply always returns a list rather than a vector, which is the structure you are seeing.
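For completeness, a sketch of how subset can be used with the original character vector, without converting to names at all. It assumes that select may be any expression that evaluates to a valid column index (here a logical vector), which matches how subset.data.frame behaves as far as I know. The variables sym_list, err and newDF are illustrative.

```r
vals <- c(1, 1, 1, 1)
testdf <- data.frame(var1 = vals, var2 = vals, var3 = vals)
varnames <- c("var1", "var2")

# Converting to symbols does not help: unary minus is not defined for a list
sym_list <- lapply(varnames, as.name)
err <- tryCatch(-sym_list, error = function(e) conditionMessage(e))

# Instead, build a logical column index from the character vector
newDF <- subset(testdf, select = !(names(testdf) %in% varnames))

names(newDF)  # "var3"
```

The unary minus in select=-varnames is the actual source of the error in the question, since it is applied to a character vector (or, after lapply, to a list).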

mastov
NinComPoop