TL;DR
I have a snippet of text
str <- '"foo\\dar embedded \\\"quote\\\""'
# cat(str, '\n') # gives
# "foo\dar embedded \"quote\""
# i.e. as if the above had been written to a CSV with quoting turned on.
I want to end up with the string:
str <- 'foo\\dar embedded "quote"'
# cat(str, '\n') # gives
# foo\dar embedded "quote"
essentially removing one "layer" of quoting. How may I do this?
(Initial attempt -- eval(parse(text=str))
, which works unless you have something like \\dar
, where you get the error "\d
is an unrecognized escape in character string ...").
Gory details (optional)
The reason my strings are quoted once-too-many times is I kludged some data processing -- I wrote str
(well, a dataframe in my case) to a table with quoting enabled, but forgot that many of the columns in my dataframe had embedded newlines with embedded quotes (i.e. forgot to escape/remove them).
It turns out that when I read.table
a file with multiple columns in the same row that have embedded newlines and embedded quotes (or something like that), the function fails (fair enough).
I had since closed my R session so my only access to my data was through my munged CSV. So I wrote some spaghetti code to simply readLines
my CSV and split everything up to reconstruct my dataframe again. However, since all my character columns were quoted in the CSV, I have a few columns in my restored dataframe that are still quoted that I want to unquote.
Messy, I know. I'll remember to save an original version of the data next time (save
, saveRDS
).
For those interested, the header row and three rows of my CSV are shown below (all the characters are ASCII)
"quote";"id";"date";"author";"context"
"< mwk> I tried to fix the bug I mentioned, but I accidentally ascended the character I started for testing... hoped she'd die soon and I could get to coding, but alas I was wrong";"< mwk> I tried to fix the bug I mentioned, but I accidentally ascended the character I started for testing... hoped she'd die soon and I could get to coding, but alas I was wrong";"February 28, 2013";"nhqdb";"nhqdb"
"< intx14> \"A gush of water hits the air elemental on the central core!\"
< intx14> What is this, a weather forecast?";"< intx14> \"A gush of water hits the air elemental on the central core!\"
< intx14> What is this, a weather forecast?";"February 28, 2013";"nhqdb";"nhqdb"
"< bcode> n - a spherical amulet. You are lucky! Full moon tonight.
< bcode> That must be a sign - I'll put it on! What could possibly go wrong...
< oracle\devnull> DIED : bcode2 (Wiz-Elf-Mal-Cha) 0 points, killed by strangulation on pcs1.nethack.devnull.net";"< bcode> n - a spherical amulet. You are lucky! Full moon tonight.
< bcode> That must be a sign - I'll put it on! What could possibly go wrong...
< oracle\devnull> DIED : bcode2 (Wiz-Elf-Mal-Cha) 0 points, killed by strangulation on pcs1.nethack.devnull.net";"February 28, 2013";"nhqdb";"nhqdb"
The first two columns of each row are the same, being the quote (the first row has no embedded newlines in the quote; the second and third do). Separator is ';'.
> read.table('test.csv', sep=';', header=T)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 5 elements
# same for with ,allowEscape=T