6

I have a CSV that I would like to import into R as a data.frame. This CSV has headers such as USD.ZeroCouponBondPrice(1m) and USD-EQ-SP500 that I can't change. When I try to import it into R, however, R's read.csv function replaces the characters (, ) and - with a period (.). Although I wasn't able to find a way to fix this in the function documentation, this line of code worked:

colnames(df)<-c('USD.ZeroCouponBondPrice(1m)', 'USD-EQ-SP500')

so those characters are legal in data.frame column names. Overwriting all of the column names is annoying and fragile as there are over 20 of them and it is not unthinkable for them to change. Is there a way to prevent read.csv from replacing those characters, or an alternative function to use?

Theaetetos
  • I'm not sure how, but possibly you could make some hack using `Tibbles`. With `Tibbles` you can [use _crazy names_](https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html) for the names of variables. – Eric Fail Oct 18 '17 at 16:24

2 Answers

13

If you set the argument

check.names = FALSE

in read.csv, then R will not alter the names. But these names are not syntactically valid in R, so they have to be handled differently from valid names.
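As a minimal sketch of the difference, the snippet below reads a small inline CSV through the `text` argument of `read.csv` instead of the asker's real file. The header names come from the question; the two data rows and the variable names `csv_text`, `df_default` and `df_raw` are made up for illustration.

```r
# Hypothetical two-column CSV that uses the header names from the question.
csv_text <- "USD.ZeroCouponBondPrice(1m),USD-EQ-SP500
0.99,2500
0.98,2510"

# Default behaviour: check.names = TRUE passes the headers through make.names(),
# which replaces every invalid character with a period.
df_default <- read.csv(text = csv_text)
names(df_default)
#> [1] "USD.ZeroCouponBondPrice.1m." "USD.EQ.SP500"

# With check.names = FALSE the headers are kept exactly as written in the file.
df_raw <- read.csv(text = csv_text, check.names = FALSE)
names(df_raw)
#> [1] "USD.ZeroCouponBondPrice(1m)" "USD-EQ-SP500"
```

The same argument should work when reading from a file path or connection, since `read.csv` passes it on to `read.table`.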

Kelli-Jean
  • 1
    One example of "handled differently" is if you are using `$` notation to reference a variable you will need backticks around the variable name, e.g. `df$\`USD.ZeroCouponBondPrice(1m)\``. – Brian Stamper Oct 18 '17 at 18:17
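To make the backtick point concrete, here is a small sketch in base R. The data frame `df` and its values are hypothetical; only the column names are taken from the question.

```r
# Hypothetical data frame carrying the non-syntactic names from the question.
df <- data.frame(c(0.99, 0.98), c(2500, 2510))
names(df) <- c("USD.ZeroCouponBondPrice(1m)", "USD-EQ-SP500")

# `$` needs backticks around a non-syntactic name.
bond <- df$`USD.ZeroCouponBondPrice(1m)`

# `[[` takes the name as an ordinary string, so no backticks are needed.
equity <- df[["USD-EQ-SP500"]]

# Anywhere the name is evaluated as a symbol (with(), formulas, ...) it needs
# backticks too; without them R would parse USD - EQ - SP500 as a subtraction.
avg <- with(df, mean(`USD-EQ-SP500`))
avg
#> [1] 2505
```

String-based indexing with `[[` is often the less error-prone choice when the column names are generated or may change.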
-2

Here is a possible tibble-based solution that builds on Kelli-Jean's answer about using check.names = FALSE:

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
dta <- url("http://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv")
TBdta <- as_tibble(read.csv(dta, check.names = FALSE)) 
TBdta
#> # A tibble: 6 x 3
#>   USD.ZeroCouponBondPrice(1m) USD-EQ-SP500 crazy name
#>                        <fctr>        <dbl>      <int>
#> 1                           A         10.0         12
#> 2                           A         11.0         14
#> 3                           B          5.0          8
#> 4                           B          6.0         10
#> 5                           A         10.5         13
#> 6                           B          7.0         11

Be sure to read this introduction to Tibbles, as they behave somewhat differently from regular data frames.

In case someone needs to use HTTPS:

temporaryFile <- tempfile()
download.file("https://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv", destfile = temporaryFile, method="curl")
TBdta2 <- as_tibble(read.csv(temporaryFile, check.names = FALSE))
Eric Fail
  • You can use invalid names for variables in a native data frame, as the result of `read.csv(dta, check.names = FALSE)` shows. The only difference I see with tibbles is it doesn't automatically convert names when you use the `tibble()` function to create one. I don't see any added benefit to wrapping `as_tibble()` around `read.csv()`, at least as far as the OP's question goes. – Brian Stamper Oct 18 '17 at 18:11
  • @BrianStamper I appreciate your feedback. – Eric Fail Oct 18 '17 at 18:16
  • 1
    I accepted @Kelli-Jean's answer because it was easier to implement as a solution, but I found this answer helpful as a legitimate alternative. I didn't specify that I wanted an answer that uses only R's base packages, so I don't think this answer deserves the down vote (not sure if it was you). – Theaetetos Oct 18 '17 at 22:10