R read.csv Importing Column Names Incorrectly

Question

I have a CSV file that I would like to import into R as a data.frame. Its headers, such as USD.ZeroCouponBondPrice(1m) and USD-EQ-SP500, are ones I can't change. When I import the file, however, R's read.csv function replaces the characters (, ) and - with a period (.). I couldn't find a way to prevent this in the function documentation, but this line of code worked:

colnames(df)<-c('USD.ZeroCouponBondPrice(1m)', 'USD-EQ-SP500')

so those characters are legal in data.frame column names. Overwriting all of the column names is annoying and fragile as there are over 20 of them and it is not unthinkable for them to change. Is there a way to prevent read.csv from replacing those characters, or an alternative function to use?

I'm not sure how, but you could possibly work something out using tibbles. With tibbles you can [use _crazy names_](https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html) as variable names. – Eric Fail, Oct 18 '17 at 16:24

score 13 · Accepted Answer · answered Oct 18 '17 at 16:30

If you set the argument

check.names = FALSE

in read.csv, then R will not alter the names. However, these names are not syntactically valid R names, so they have to be handled differently from valid names (for example, quoted with backticks).
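As a minimal sketch of the difference, the snippet below reads the same CSV twice, once with the default settings and once with check.names = FALSE. The CSV text, its values, and the variable names `csv_text`, `mangled`, and `kept` are made up for illustration; only the two header names come from the question.

```r
# Hypothetical CSV content, inlined via `text =` so no file is needed.
# The header names are from the question; the values are invented.
csv_text <- "USD.ZeroCouponBondPrice(1m),USD-EQ-SP500
1,10
2,20"

# Default (check.names = TRUE): headers are run through make.names(),
# which replaces characters that are not valid in a name with "."
mangled <- read.csv(text = csv_text)
names(mangled)
#> [1] "USD.ZeroCouponBondPrice.1m." "USD.EQ.SP500"

# check.names = FALSE: headers are kept exactly as written in the file
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)
#> [1] "USD.ZeroCouponBondPrice(1m)" "USD-EQ-SP500"
```

The same argument is accepted by read.table and read.delim, since read.csv passes it through to read.table.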

Kelli-Jean


One example of "handled differently": if you use `$` notation to reference a column, you need backticks around the column name, e.g. ``df$`USD.ZeroCouponBondPrice(1m)` ``. – Brian Stamper Oct 18 '17 at 18:17
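To make the backtick point concrete, here is a small sketch of a few ways to reach such columns. The data frame `df` and its values are invented for illustration; only the column names come from the question.

```r
# Hypothetical data frame with non-syntactic column names (values invented).
df <- data.frame(
  `USD.ZeroCouponBondPrice(1m)` = c(1, 2),
  `USD-EQ-SP500` = c(10, 20),
  check.names = FALSE
)

# `$` needs backticks; without them, df$USD-EQ-SP500 is parsed as subtraction.
via_dollar <- df$`USD-EQ-SP500`

# `[[` and `[` take the name as an ordinary string, so no backticks are needed.
via_brackets <- df[["USD-EQ-SP500"]]
via_index <- df[, "USD.ZeroCouponBondPrice(1m)"]

# Backticks also work wherever the name is used as a symbol, e.g. inside with().
doubled <- with(df, `USD-EQ-SP500` * 2)
```

String-based indexing with `[[` is often the less error-prone choice when the column names are held in variables or may change.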

score -2 · Answer 2 · answered Oct 18 '17 at 16:47

Illustrating a possible tibble solution that builds on Kelli-Jean's answer about check.names = FALSE:

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
dta <- url("http://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv")
TBdta <- as_tibble(read.csv(dta, check.names = FALSE)) 
TBdta
#> # A tibble: 6 x 3
#>   USD.ZeroCouponBondPrice(1m) USD-EQ-SP500 crazy name
#>                        <fctr>        <dbl>      <int>
#> 1                           A         10.0         12
#> 2                           A         11.0         14
#> 3                           B          5.0          8
#> 4                           B          6.0         10
#> 5                           A         10.5         13
#> 6                           B          7.0         11

Be sure to read this introduction to tibbles, as they behave somewhat differently from regular data frames.

In case someone needs to use https:

temporaryFile <- tempfile()
download.file("https://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv", destfile = temporaryFile, method="curl")
TBdta2 <- as_tibble(read.csv(temporaryFile, check.names = FALSE))

You can use invalid names for variables in a native data frame, as the result of `read.csv(dta, check.names = FALSE)` shows. The only difference I see with tibbles is it doesn't automatically convert names when you use the `tibble()` function to create one. I don't see any added benefit to wrapping `as_tibble()` around `read.csv()`, at least as far as the OP's question goes. — Brian Stamper, Oct 18 '17 at 18:11
I accepted @Kelli-Jean's answer because it was easier to implement as a solution, but I found this answer helpful as a legitimate alternative. I didn't specify that I wanted an answer that uses only R's base packages, so I don't think this answer deserves the down vote (not sure if it was you). — Theaetetos, Oct 18 '17 at 22:10
