
Consider the following comma-separated file. For simplicity, let it contain just one line:


'I am quoted','so, can use comma inside - it is not separator here','but can\'t use escaped quote :=('

If you try to read it with the command

table <- read.csv(filename, header=FALSE)

the line will be split into 4 parts, because it contains 3 commas. In fact I want to read only 3 parts, one of which contains a comma itself. This is where the quote argument should come to help. I tried:

table <- read.csv(filename, header=FALSE, quote="'")

but that fails with the error "incomplete final line found by readTableHeader on table". That happens because of the odd number (seven) of quotes.
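For reference, both behaviours can be reproduced by writing the sample line to a temporary file first (the tempfile() setup below is only for illustration):

tmp <- tempfile(fileext = ".csv")
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"
writeLines(line, tmp)
read.csv(tmp, header = FALSE)                    # 4 parts, split on all 3 commas
try(read.csv(tmp, header = FALSE, quote = "'"))  # fails as described above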

Both read.table() and scan() have the parameter allowEscapes, but setting it to TRUE doesn't help. That is expected, because help(scan) says:

The escapes which are interpreted are the control characters ‘\a, \b, \f, \n, \r, \t, \v’, ... Any other escaped character is treated as itself, including backslash

How would you read such quoted CSV files containing escaped \' quotes?

smci
Max
  • I understand what you're trying to do but am confused why you'd use `read.csv()`: this isn't a CSV file, there aren't multiple columns, it's all just a block of text, albeit with quotes. Are you saying rows are separate or not? Why not just use `readLines(..., n=1)`? You must mean it's multiline text containing escaped quotes. – smci Sep 22 '16 at 04:27
  • I've found this really annoying. `write.table` will output strings with quotes in them as escaped `\"`, but `read.table` can't interpret these. Why write them in that format by default if R can't read it?! – jrubins Mar 21 '19 at 15:02

2 Answers


One possibility is to use readLines() to get everything read in as is, and then proceed by replacing the quote character by something else, e.g.:

tt <- readLines("F:/temp/test.txt")
tt <- gsub("([^\\]|^)'", "\\1\"", tt)  # replace unescaped ' by "
tt <- gsub("\\\\", "", tt)             # strip the escaping backslashes that remain

This allows you to read in the vector tt using a textConnection:

zz <- textConnection(tt)
read.csv(zz, header = FALSE, quote = "\"")  # read from the text connection
close(zz)

Not the most beautiful solution, but it works (provided, of course, that you don't have a " character somewhere else in the file...)
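As a self-contained sketch of the same idea on the sample line from the question (read.csv() also accepts a text argument, so the explicit connection can be skipped; the object name x is just for the example):

x <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"
x <- gsub("([^\\]|^)'", "\\1\"", x)  # replace unescaped ' by "
x <- gsub("\\\\", "", x)             # strip the escaping backslashes
read.csv(text = x, header = FALSE)   # 3 columns, the comma stays inside the second field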

Joris Meys
  • @Marek : I'm not completely following. Where exactly should I replace that to get the correct output? – Joris Meys May 17 '11 at 15:37
  • I mean `tt <- ...` (see ?Quotes -> Details -> quotes). – Marek May 17 '11 at 21:11
  • Obviously this works for files quoted with `"` too, using `tt <- ...` – jnas Jun 12 '14 at 11:32
  • Although this solution works for small files, for larger files it becomes very slow and uses lots of memory. A streaming solution (holding only one line of the file in memory before it goes to the table) would be better, but I do not (yet) know how to do that. – rakensi Sep 17 '14 at 08:16
    @rakensi You can use readLines() to read in the file in chunks and process those chunks, but any solution in R that involves large files or datasets will become slow and will use a lot of memory. R isn't the most memory-friendly language by design. – Joris Meys Sep 18 '14 at 09:44
  • @Joris Meys: Thank you for your remark. I think a good solution would be to use a pipe connection, transforming each line before feeding it into read.csv. Unfortunately, the pipe uses a shell script, which makes this dependent on the environment you run it in. A pipe that transforms each line with an R function would be better, but I have not seen anything like that. – rakensi Sep 19 '14 at 10:50
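Following up on the chunked-reading suggestion in these comments, a rough sketch of a clean-then-read helper (the function name, chunk size and file names are illustrative only, not from the original answer):

clean_quotes <- function(infile, outfile, chunk_size = 10000) {
  con_in  <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({ close(con_in); close(con_out) })
  repeat {
    chunk <- readLines(con_in, n = chunk_size)
    if (length(chunk) == 0) break
    chunk <- gsub("([^\\]|^)'", "\\1\"", chunk)  # replace unescaped ' by "
    chunk <- gsub("\\\\", "", chunk)             # strip the escaping backslashes
    writeLines(chunk, con_out)
  }
  invisible(outfile)
}
# then, e.g.: read.csv(clean_quotes("big.csv", tempfile(fileext = ".csv")), header = FALSE)

This keeps only one chunk of lines in memory at a time, at the cost of writing a cleaned copy of the file to disk.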

read_delim() from the readr package can handle escaped quotes, using the arguments escape_double and escape_backslash:

read_delim(file, delim = ",", escape_double = FALSE, escape_backslash = TRUE, quote = "'")

(Note older versions of readr do not support quoted newlines in CSV headers correctly: https://github.com/tidyverse/readr/issues/784)
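Applied to the sample line from the question (the temporary file is only for illustration; col_names = FALSE plays the role of header = FALSE, and per the answer above this should yield 3 columns with the escaped quote preserved):

library(readr)
tmp <- tempfile(fileext = ".csv")
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"
writeLines(line, tmp)
read_delim(tmp, delim = ",", quote = "'",
           escape_double = FALSE, escape_backslash = TRUE,
           col_names = FALSE)
# should give 3 columns, with the escaped ' kept inside the third field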

qwr