I need to read a text file (tab-separated) that has some carriage returns inside some fields.

If I use read.table, it gives me an error:

line 6257 did not have 20 elements

If I use read.csv, it doesn't give an error, but it starts a new row at that point and puts the remaining fields into the first columns of that new row.

How can I avoid this? I can't alter the file itself (the script is to be run elsewhere). Also, the broken strings aren't enclosed in quotation marks (no strings in the file are). One option would be to read the carriage return as a single space, or as \n, but how?

Rodrigo

1 Answer

Use read.table instead of read.csv and set allowEscapes to TRUE. (My test file below is comma-separated; for a tab-separated file, pass sep="\t" instead.)

read.table("your/path",sep=",",allowEscapes=TRUE)

I tested this with a CSV file written in Excel.

Contents of the CSV file:

1,df,3,"4 
"
df,"df
",3,a

Result:

  V1   V2 V3   V4
1  1   df  3 4 \n
2 df df\n  3    a
jrdnmdhl
  • allowEscapes would make sense if I had a \n in my data. I don't have a \n, I have real carriage returns (ASCII 13). – Rodrigo Jun 11 '15 at 15:16
  • I tested read.table with allowEscapes and it worked for actual returns (not \n). I made an example file in Excel with line breaks inside the cells and saved it as CSV. It read properly with allowEscapes, but not with read.csv. – jrdnmdhl Jun 11 '15 at 15:23
  • OK, I could reproduce your example, but mine still doesn't work. The problem seems to be that my strings are not delimited by quotation marks " ". And I can't edit the original file. – Rodrigo Jun 11 '15 at 16:10
  • I see what you are saying. The problem is that if the line break isn't quoted, how can one tell which line breaks mark a jump to the next row and which are just part of a cell's contents? If you can come up with a pattern there, you can read the file in as character data and parse it with your own function. – jrdnmdhl Jun 11 '15 at 16:48
  • This is a problem only if the line break is on the last column. But yes, you are right. I think it's better to ask them to quote the files on the server. – Rodrigo Jun 11 '15 at 16:51
  • I could change the files, adding the quotation marks. Now allowEscapes isn't even necessary. Thank you. – Rodrigo Jun 11 '15 at 17:44
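
For anyone who cannot get the files quoted, the "write your own function" idea from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested solution for the original file: the helper name read_broken_tsv and its arguments are hypothetical, every complete record is assumed to contain exactly n_fields tab-separated fields, and a stray line break is assumed never to fall inside the last field (the ambiguous case noted above). readLines treats a bare carriage return as a line ending, so a split record shows up as several short lines, which are glued back together with a single space until the expected number of tabs is reached.

```r
# Hypothetical helper: rebuild records split by unquoted line breaks.
# Assumes each complete record has exactly `n_fields` tab-separated fields
# and that a break never occurs inside the last field.
read_broken_tsv <- function(path, n_fields, header = FALSE) {
  lines <- readLines(path, warn = FALSE)  # LF, CRLF and bare CR all end a line
  records <- character(0)
  buffer <- NULL
  for (line in lines) {
    # Join a continuation line to the pending record with a single space
    buffer <- if (is.null(buffer)) line else paste(buffer, line)
    n_tabs <- lengths(regmatches(buffer, gregexpr("\t", buffer, fixed = TRUE)))
    if (n_tabs >= n_fields - 1) {
      # Enough separators seen: the record is complete
      records <- c(records, buffer)
      buffer <- NULL
    }
  }
  # Keep an incomplete trailing record rather than dropping it silently
  if (!is.null(buffer)) records <- c(records, buffer)
  read.table(text = records, sep = "\t", quote = "", header = header,
             comment.char = "", fill = TRUE, stringsAsFactors = FALSE)
}

# Small made-up example: the second record is split inside its second field
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tfoo\tbar", "2\tsplit", "field\tbaz", "3\tqux\tquux"), tmp)
df <- read_broken_tsv(tmp, n_fields = 3)
print(df)
```

If a break can occur in the last field, counting tabs is not enough, and some other pattern (for example, a known format for the first column) would be needed to decide where a record really starts.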