I am trying to read this .csv file into R. When I use read.csv, I either get errors related to row.names, or the column names are offset from their original columns. Based on this post, I believe the problem is caused by an extra comma at the end of each line. What I can't find in the answers to the previous question is how to get rid of the line-ending commas.

My workaround is the following:

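# row.names = NULL stops read.csv from using the first column as row names
pmr <- read.csv("pubmed_result.csv", header = TRUE, row.names = NULL)
# Shift the column names one position left; the last column is the empty one
colnames(pmr) <- c(colnames(pmr)[-1], "blank")
# Drop the trailing empty column
pmr <- pmr[-ncol(pmr)]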

This provides the desired result, but seems a bit inelegant. Is there a way to get read.csv or read.table to ignore the last comma? Or is there a way to use gsub to fix the csv?

Josh

1 Answer

You are correct that the trailing "," is causing the issues. To be precise, the problem is that the data lines have a trailing "," but the header line, where the column names are declared, does not. The data lines therefore have one more field than the header, and read.csv treats the first column as row names.
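To make the mechanism concrete, here is a minimal sketch that reproduces the offset on a tiny inline CSV. The two-row `csv_text` is made up for illustration and only assumes the same shape as the real file: no trailing comma in the header, one in every data line.

```r
# Hypothetical two-row stand-in for the real file: the header has 2 fields,
# each data line has 3 (the last one is empty because of the trailing comma).
csv_text <- "Title,URL\nA,/a,\nB,/b,\n"

# With one fewer header field than data fields, read.csv uses the first
# column as row names, so every column name ends up one position too far left.
df <- read.csv(text = csv_text)

rownames(df)  # the values that belong under "Title"
df$Title      # the values that belong under "URL"
```

If the first column contained duplicate values, the same call would fail instead, because row names must be unique. That is probably where the row.names errors come from.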

If you don't want to fix the issue manually, as you do in your code above, you could use readr::read_csv:

library(tidyverse)
df <- read_csv("pubmed_result.csv")
df
## A tibble: 375 x 11
#   Title   URL    Description  Details  ShortDetails Resource Type  Identifiers
#   <chr>   <chr>  <chr>        <chr>    <chr>        <chr>    <chr> <chr>
# 1 Myoedi… /pubm… Zhang Y, Lo… Physiol… Physiol Rev… PubMed   cita… PMID:29717…
# 2 Cullin… /pubm… Papizan JB,… J Biol … J Biol Chem… PubMed   cita… PMID:29653…
# 3 Fusoge… /pubm… Bi P, McAna… Proc Na… Proc Natl A… PubMed   cita… PMID:29581…
# 4 Correc… /pubm… Long C, Li … Sci Adv… Sci Adv.  2… PubMed   cita… PMID:29404…
# 5 Single… /pubm… Amoasii L, … Sci Tra… Sci Transl … PubMed   cita… PMID:29187…
# 6 Requir… /pubm… Shi J, Bi P… Proc Na… Proc Natl A… PubMed   cita… PMID:29078…
# 7 Consid… /pubm… Carroll KJ,… Circ Re… Circ Res.  … PubMed   cita… PMID:29074…
# 8 ZNF281… /pubm… Zhou H, Mor… Genes D… Genes Dev. … PubMed   cita… PMID:28982…
# 9 Functi… /pubm… Kyrychenko … JCI Ins… JCI Insight… PubMed   cita… PMID:28931…
#10 Defici… /pubm… Papizan JB,… J Clin … J Clin Inve… PubMed   cita… PMID:28872…
## ... with 365 more rows, and 3 more variables: Db <chr>, EntrezUID <int>,
##   Properties <chr>

read_csv will emit a number of parsing warnings because the data lines have one more field than the header. You can ignore them in this case. Note that the column names are correctly assigned.
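As for the gsub idea from the question, a base-R alternative is to strip the trailing comma from each line before parsing. The following is a minimal sketch, not a tested solution for the real file: the temporary file and its three lines are made up for illustration, and it assumes every data line ends in exactly one surplus comma and no quoted field spans several lines.

```r
# Hypothetical stand-in for pubmed_result.csv, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Title,URL,Db",
             "Paper A,/pubmed/1,pubmed,",
             "Paper B,/pubmed/2,pubmed,"), path)

# Read the raw lines and remove a single comma at the end of each line;
# the header has no trailing comma, so it is left unchanged.
lines <- readLines(path)
lines <- sub(",$", "", lines)

# Parse the cleaned lines; header and data now have the same number of fields
pmr <- read.csv(text = lines, header = TRUE)
names(pmr)
```

sub is enough here because only one match per line needs replacing. Be aware that this would also remove a comma that marks a genuinely empty last field, so it only suits files where the final comma is always surplus.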

Maurits Evers
  • tidyverse::read_csv worked perfectly. Thanks! Unfortunately, I don't have enough cred to up-vote your answer. – Josh Jun 22 '18 at 02:52
  • @Josh Great, glad it worked. You can close the question by setting the green check mark next to the answer. Good luck with your work! – Maurits Evers Jun 22 '18 at 03:27
  • thanks, I couldn't figure out how to mark the question as answered. – Josh Jun 22 '18 at 17:32