
For example, with the data set mtcars

mtcars[ , "cyl"]

and

mtcars[ , 2]

both give me the same column. So, since I can get everything BUT column 2 like this:

mtcars[ , -2]

I didn't expect this:

mtcars[ , -"cyl"]
Error in -"cyl" : invalid argument to unary operator

Instead, the best I can come up with is this:

mtcars[ , !colnames(mtcars)=="cyl"]
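A quick way to check that these approaches are interchangeable is to compare them directly. The sketch below uses only base R and the built-in mtcars. It checks that the negative position, the logical-vector workaround above, and a `match()` lookup of the position (an extra variant, not from the question) all return the same data frame. The variable names are illustrative.

```r
# Three ways to drop the "cyl" column from the built-in mtcars data set.
by_position <- mtcars[, -2]                               # negative numeric index
by_logical  <- mtcars[, !colnames(mtcars) == "cyl"]       # logical index
by_match    <- mtcars[, -match("cyl", colnames(mtcars))]  # name -> position, then negate

# All three should be the same 10-column data frame.
stopifnot(identical(by_position, by_logical),
          identical(by_position, by_match))
print(names(by_position))
```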

Is there an easier solution?

EDIT: It seems logical that if the first two techniques work, so should the second two techniques. I hoped I was missing something. The help pages for ?"[" or ?subset don't explain this counterintuitive result. Does anybody know why this is?

J. Win.
  • @Joshua I think these are a bit different; here the focus is on deleting a single column by name, where the usual conventions don't work. – Gavin Simpson Jan 29 '11 at 15:11
  • I made a slight change to the title and added "why" to the question. – J. Win. Jan 29 '11 at 15:21
  • The rules are explained in section 2.7 of the Introduction to R manual: http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors – Gavin Simpson Jan 29 '11 at 15:22
  • @Gavin: I agree, I misunderstood the question. @jonw: I don't understand why you think the result from `mtcars[,-"cyl"]` is counterintuitive. `?"-"` indicates that the `-` operator works on numeric or complex vectors (or objects which can be coerced to them). What do you expect the result of `-"cyl"` to be? – Joshua Ulrich Jan 29 '11 at 15:58
  • @jonw: I understand what you are saying. I'm asking what you think R should return when you type `-"cyl"` at the command line. What does `-"char"` mean? – Joshua Ulrich Jan 29 '11 at 16:07
  • @Joshua: Typing `"cyl"` or `"-cyl"` at the command line doesn't return anything useful any more than `2` or `-2` does, but while the latter can both be used in brackets with the dataframe name as a metaphor to get a certain result, the metaphor breaks down when I try to extend it to the former use... Why is it implemented this way? Maybe the answer is that `-"cyl"` (or `!"cyl"`) only makes sense to me :) – J. Win. Jan 29 '11 at 16:16
  • @jonw `-()` is a function and the R developers say it can't be used on a character vector (and not just because negating a string doesn't make sense). Because you can't negate a character vector, you can't supply negative strings to drop columns. The problem is with `-` and is the source of the error message you quote. Hence the rule that negative indices only work for numerics. Try `-"cyl"` on its own, outside the subsetting call, to see the same error you got when subsetting. – Gavin Simpson Jan 29 '11 at 16:56
  • @jonw: `"-cyl"` and `-"cyl"` are different things and you were asking about the latter, not the former (which is a string). In order to use the negative subscripting metaphor with strings, you first must define what a negative string means. I'm not aware of a language that defines what a negative string means; even if one did, it would be an idiosyncratic definition because `-"char"` is not well-defined like `-2`. – Joshua Ulrich Jan 29 '11 at 17:07
  • Makes sense. The last two comments are the explanation I was after. Any way to promote them to an answer for the checkmark? – J. Win. Jan 30 '11 at 08:24
  • @jonw for what it is worth, I added my comment to my answer. – Gavin Simpson Jan 30 '11 at 10:54

2 Answers


[Edit:] Explanation of why negative string indices do not work:

-() is a function and the R developers say it can't be used on a character vector (and not just because negating a string doesn't make sense). Because you can't negate a character vector, you can't supply negative strings to drop columns. The problem is with - and is the source of the error message you quote. Hence the rule that negative indices only work for numerics. The source of the original error is:

> -"cyl"
Error in -"cyl" : invalid argument to unary operator

Note that in the comments to the Q there was some confusion over whether the negative version of "cyl" is "-cyl". It isn't: "-cyl" is just another string. The R snippet above shows what was happening in the subsetting tried in the Question.
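These points can be checked directly at the console. The sketch below uses only base R. It captures the error from negating a string, shows that "-cyl" is an ordinary string that matches no column, and shows that converting the name to its numeric position with `match()` gives `-` something it can negate. The variable names are illustrative.

```r
# Unary minus is defined for numeric (and complex) arguments only,
# so negating a string fails before `[` is ever called.
neg_msg <- tryCatch(-"cyl", error = function(e) conditionMessage(e))
print(neg_msg)

# "-cyl" is just another four-character string, not a negated "cyl".
print(nchar("-cyl"))
print("-cyl" %in% names(mtcars))   # no column has this name
idx_msg <- tryCatch(mtcars[, "-cyl"], error = function(e) conditionMessage(e))
print(idx_msg)

# Looking up the numeric position first lets negative indexing work.
pos <- match("cyl", names(mtcars))
print(-pos)
print(names(mtcars[, -pos]))
```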

Section 2.7 of the "An Introduction to R" manual describes the allowed methods of subsetting.

[Original:] The simplest way to remove a component is just to set that component to NULL:

> cars <- mtcars
> cars[, "cyl"] <- NULL ## or cars$cyl <- NULL
> names(cars)
 [1] "mpg"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
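Because R copies on modify, setting a column of `cars` to NULL leaves the original `mtcars` untouched. The minimal check below uses only base R and exercises both the `[` form and the `$` form mentioned in the comment above. Dropping `gear` as well is just for illustration.

```r
cars <- mtcars            # cars starts as a copy of the built-in data set
cars[, "cyl"] <- NULL     # drop a column from the copy only
cars$gear <- NULL         # the $ form works the same way

# The copy lost both columns; mtcars still has all 11.
stopifnot(!any(c("cyl", "gear") %in% names(cars)),
          ncol(mtcars) == 11L)
print(ncol(cars))
```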

[Edit:] Given the Edit to the Q, which indicates that only a temporary drop of the named column is wanted:

subset(mtcars, select = -cyl)

or

mtcars[, !names(mtcars) %in% "cyl"]

are options; the former is cleaner than the latter.
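The two options can be compared directly. The sketch below uses `all.equal()` rather than `identical()` so the check doesn't depend on internal details such as how row names are stored. It confirms that both forms give the same result for one column and for several. The variable names are illustrative.

```r
# One column: subset() with an unquoted name vs. logical indexing with %in%.
via_subset  <- subset(mtcars, select = -cyl)
via_logical <- mtcars[, !names(mtcars) %in% "cyl"]
stopifnot(isTRUE(all.equal(via_subset, via_logical)))

# Several columns: negate a c() of unquoted names, or match a character vector.
via_subset2  <- subset(mtcars, select = -c(cyl, disp))
via_logical2 <- mtcars[, !names(mtcars) %in% c("cyl", "disp")]
stopifnot(isTRUE(all.equal(via_subset2, via_logical2)))

print(names(via_subset2))
```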

Gavin Simpson

I often use subset. An example using mtcars

> names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
> cars <- subset(mtcars, select=-c(mpg,cyl))
> names(cars)
[1] "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

There are some other ideas in the answers to this question.

Update: subset also works for temporarily removing one or more columns by name; just replace mtcars[,-2] with subset(mtcars, select=-cyl).
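Why does `select = -cyl` work when `-"cyl"` does not? `subset()` on a data frame evaluates `select` with each column name bound to that column's position (see the source of `subset.data.frame`), so `-cyl` amounts to `-2`. The sketch below is a simplified reconstruction of that step, not the function's actual code. `name_to_pos` and `drop_pos` are illustrative names.

```r
# Bind each column name to its column number, roughly as subset() does for `select`.
name_to_pos <- as.list(seq_along(mtcars))
names(name_to_pos) <- names(mtcars)

# Unquoted names now evaluate to integers, which unary minus can negate.
drop_pos <- eval(quote(-c(mpg, cyl)), name_to_pos)
print(drop_pos)

# Indexing with those negative positions drops the same columns as subset().
manual <- mtcars[, drop_pos]
stopifnot(isTRUE(all.equal(manual, subset(mtcars, select = -c(mpg, cyl)))))
```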

Community
  • Thanks Seth, I never explored the options of subset and this will be a useful tool. – J. Win. Jan 30 '11 at 18:08
  • Actually the *select* arg is limited: it cannot be a negative character vector, as @Gavin-Simpson notes above. So the only negative arguments you can get into it are a single unquoted column name, or else a fixed list of unquoted names, like `select=-c(mpg,cyl)`. – smci Feb 09 '14 at 13:18
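When the names to drop are held in a character vector (for example, computed at run time), one workaround is to skip `subset()` and index with a logical vector. `drop_cols()` below is a hypothetical helper, not part of base R. It uses `%in%` rather than `-which(...)` because `-which()` yields an empty index when nothing matches, and indexing with an empty index selects no columns at all.

```r
# Hypothetical helper (not base R): drop the columns named in a character vector.
drop_cols <- function(df, cols) {
  # Logical indexing keeps every column when none of `cols` match,
  # unlike df[, -which(names(df) %in% cols)], which would keep none.
  df[, !names(df) %in% cols, drop = FALSE]
}

to_drop <- c("mpg", "cyl")          # names held in a variable, not typed literally
trimmed <- drop_cols(mtcars, to_drop)
print(names(trimmed))

# A name that isn't a column leaves the data frame unchanged.
stopifnot(ncol(drop_cols(mtcars, "not_a_column")) == ncol(mtcars))
```

Base R's `setdiff()` gives an equivalent one-liner: `mtcars[setdiff(names(mtcars), to_drop)]`.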