
I have a data frame structured like this:

> head(peakQ)
  STATION_NUMBER DATA_TYPE YEAR PEAK_CODE PRECISION_CODE MONTH DAY HOUR MINUTE TIME_ZONE  PEAK SYMBOL
1        05EE006         Q 1983         H             NA     6  29    5     18       MST 1.980       
2        05EE006         Q 1985         H             NA     4   2    0      0       MST 1.380      B
3        05EE006         Q 1986         H             NA     3  30   13     37       MST 2.640       
4        05EE006         Q 1987         H             NA     4   5   21      2       MST 1.590      B
5        05EE006         Q 1989         H             NA    10  22    2     45       MST 0.473       
6        05EE006         Q 1990         H             NA     4   2    4      2       MST 1.470       

I want to drop the columns STATION_NUMBER, DATA_TYPE, PEAK_CODE and PRECISION_CODE.

However, assume that I know only the column names, not their positions.

I already know that this is trivial with numeric indices, for example:

> head(peakQ[, -c(1, 2, 4, 5)])
  YEAR MONTH DAY HOUR MINUTE TIME_ZONE  PEAK SYMBOL
1 1983     6  29    5     18       MST 1.980       
2 1985     4   2    0      0       MST 1.380      B
3 1986     3  30   13     37       MST 2.640       
4 1987     4   5   21      2       MST 1.590      B
5 1989    10  22    2     45       MST 0.473       
6 1990     4   2    4      2       MST 1.470       

But why do I get an error when I use column names, and what is the workaround?

> head(peakQ[, -c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE")])
Error in -c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE") : 
  invalid argument to unary operator

I am especially confused because the opposite operation works just fine.

> head(peakQ[, c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE")])
  STATION_NUMBER DATA_TYPE PEAK_CODE PRECISION_CODE
1        05EE006         Q         H             NA
2        05EE006         Q         H             NA
3        05EE006         Q         H             NA
4        05EE006         Q         H             NA
5        05EE006         Q         H             NA
6        05EE006         Q         H             NA

Any help and/or a deeper explanation is appreciated.

Cuylar Conly

2 Answers


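Unary minus is only defined for numbers, so -c("STATION_NUMBER", ...) fails before [ ever sees the names; negative subscripts require numeric indices. subset() simulates a minus on names: inside its select= argument each unevaluated column name evaluates to that column's position, so -c(...) becomes an ordinary vector of negative integers. Ditto for dplyr's select(). We could also use setdiff(), which avoids the need for a minus operator altogether.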
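To make the "why" concrete, here is a small sketch on a hypothetical two-row stand-in for peakQ (the values and the names df, drop, pos and kept are illustrative, not from the question). It shows that the error comes from the minus itself, and that converting names to positions first makes negative indexing legal again. It also shows a known pitfall of that conversion: when none of the names match, which() returns integer(0), and negating an empty index selects zero columns rather than all of them.

```r
# Hypothetical stand-in for peakQ (illustrative values only)
df <- data.frame(STATION_NUMBER = "05EE006", DATA_TYPE = "Q", YEAR = c(1983, 1985),
                 PEAK_CODE = "H", PRECISION_CODE = NA, PEAK = c(1.98, 1.38))
drop <- c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE")

# Unary minus is arithmetic, so it fails on a character vector
# before [ is ever called
res <- tryCatch(-drop, error = function(e) conditionMessage(e))
print(res)

# Convert the names to numeric positions first; negating those is legal
pos <- which(names(df) %in% drop)
kept <- df[, -pos]
print(names(kept))

# Pitfall: if no name matches, which() returns integer(0), and
# -integer(0) is still integer(0), so ALL columns are dropped
none <- which(names(df) %in% "NOT_A_COLUMN")
print(ncol(df[, -none]))
```

This pitfall is one reason the setdiff() and logical-index approaches are usually safer than negating positions computed from names.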

1) subset. Try subset with the select= argument:

subset(peakQ, select = - c(STATION_NUMBER, DATA_TYPE, PEAK_CODE, PRECISION_CODE))
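A self-contained sketch of the subset() approach, again on a hypothetical stand-in for peakQ. Because each bare name inside select= evaluates to its column position, ranges such as STATION_NUMBER:DATA_TYPE also work. Note that the ?subset help page describes subset() as a convenience intended for interactive use, so inside functions the setdiff() form is generally preferred.

```r
# Hypothetical stand-in for peakQ (illustrative values only)
peakQ <- data.frame(STATION_NUMBER = "05EE006", DATA_TYPE = "Q", YEAR = c(1983, 1985),
                    PEAK_CODE = "H", PRECISION_CODE = NA, PEAK = c(1.98, 1.38))

# Inside select=, each bare column name stands for its position,
# so -c(...) is just a vector of negative integers
trimmed <- subset(peakQ, select = -c(STATION_NUMBER, DATA_TYPE, PEAK_CODE, PRECISION_CODE))
print(names(trimmed))

# For the same reason, a:b ranges of adjacent columns work too
first_two <- subset(peakQ, select = STATION_NUMBER:DATA_TYPE)
print(names(first_two))
```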

2) setdiff. Another possibility is:

peakQ[setdiff(names(peakQ), c("STATION_NUMBER","DATA_TYPE","PEAK_CODE","PRECISION_CODE"))]
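Since the question assumes only the names are known, the setdiff() form is convenient when those names are stored in a variable. A minimal sketch, with hypothetical data and a hypothetical drop vector: setdiff() keeps the order of its first argument, so the surviving columns stay in their original order, and names that are not columns are simply ignored.

```r
# Hypothetical stand-in for peakQ (illustrative values only)
peakQ <- data.frame(STATION_NUMBER = "05EE006", DATA_TYPE = "Q", YEAR = c(1983, 1985),
                    PEAK_CODE = "H", PRECISION_CODE = NA, PEAK = c(1.98, 1.38))
drop <- c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE")

# Column names to keep, in their original order
keep <- setdiff(names(peakQ), drop)

# List-style indexing (no comma) always returns a data frame
trimmed <- peakQ[keep]
print(keep)

# A name in drop that is not a column is silently ignored
keep2 <- setdiff(names(peakQ), c(drop, "NOT_A_COLUMN"))
print(identical(keep, keep2))
```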

3) dplyr. The dplyr package's select could also be used:

library(dplyr)
peakQ %>%
      select(- c(STATION_NUMBER, DATA_TYPE, PEAK_CODE, PRECISION_CODE))
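If the names are held in a character vector rather than typed as bare names, dplyr's select() accepts them through all_of(). A sketch on hypothetical data, assuming dplyr 1.0 or later is installed: all_of() raises an error if any name is not a column, whereas any_of() would skip missing names instead.

```r
library(dplyr)

# Hypothetical stand-in for peakQ (illustrative values only)
peakQ <- data.frame(STATION_NUMBER = "05EE006", DATA_TYPE = "Q", YEAR = c(1983, 1985),
                    PEAK_CODE = "H", PRECISION_CODE = NA, PEAK = c(1.98, 1.38))
drop <- c("STATION_NUMBER", "DATA_TYPE", "PEAK_CODE", "PRECISION_CODE")

# all_of() turns the character vector into a column selection that can be negated
trimmed <- peakQ %>%
  select(-all_of(drop))
print(names(trimmed))
```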
G. Grothendieck

Negative subscripts (the "exclude" form) only work with numeric indices, not with column names, because unary minus cannot be applied to a character vector. A workaround is to build a logical index from the column names with the %in% and ! operators:

> cols <- letters[1:5]
> cols
[1] "a" "b" "c" "d" "e"
> df1 <- as.data.frame(do.call(cbind, rep(list(1:5), 5)))
> names(df1) <- cols
> df1
  a b c d e
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
5 5 5 5 5 5
> df1[,-c("a","b")]
Error in -c("a", "b") : invalid argument to unary operator
> df1[,!names(df1) %in% c("a","b")]
  c d e
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
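One caveat with the df1[, condition] form: when only one column survives, [ uses drop = TRUE by default and returns a bare vector instead of a data frame. A short sketch reusing df1 from the session above; passing drop = FALSE, or indexing list-style without the comma, keeps the result a data frame.

```r
df1 <- as.data.frame(do.call(cbind, rep(list(1:5), 5)))
names(df1) <- letters[1:5]
to_drop <- c("a", "b", "c", "d")

# Only "e" remains, so the default drop = TRUE simplifies to a vector
one_left <- df1[, !names(df1) %in% to_drop]
print(class(one_left))

# Either of these keeps a one-column data frame
still_df <- df1[, !names(df1) %in% to_drop, drop = FALSE]
also_df <- df1[!names(df1) %in% to_drop]
print(class(still_df))
```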
Serhat Cevikel