I have tried to load a .csv file in R, but some characters come out garbled, like this:

<f3>?<e9><U+00BC>?<e4><f3> . 

I have set my default text encoding to UTF-8 in Global Options. Is it possible for R to encode these characters correctly, especially the apostrophe, when exporting?

df = read.csv("text.csv", encoding="UTF-8",header=TRUE, stringsAsFactors=FALSE)

####Original CSV (Open in Notepad++)####
I don?ó?é¼?äót want
Jes?ÇÖs in the Family
others that wasn?ó?é¼?äót resolved and told
Am really happy with the this ?ƒÿü,
new ?ó?é¼?ôunbreakable?ó?é¼?¥ 
on the freeway?Ǫ.

####Load in R####
I don?<f3>?<e9><U+00BC>?<e4><f3>t want
Jes?<c7><d6>s in the Family
others that wasn?<f3>?<e9><U+00BC>?<e4><f3>t resolved and told
Am really happy with the this ?<U+0083><ff><fc>
new ?<f3>?<e9><U+00BC>?<f4>unbreakable?<f3>?<e9><U+00BC>?<U+00A5> 
on the freeway?<U+01EA>.

####What I want####
Because I don't want
Jes's in the Family
others that wasn't resolved and told
Am really happy with the this 
new 'unbreakable'
on the freeway….

Thanks.

Anna
CHONG
  • what is the encoding of the csv file? – JdeMello May 30 '18 at 01:33
  • Where did you come up with the output in "What I want" section – MichaelChirico May 30 '18 at 01:56
  • Possible duplicate of https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv Please note the recommended `guess_encoding` function in the `readr` package, which could help to solve your problem. The bottom line is that you need to find out the original encoding of your file. – Radim May 30 '18 at 01:59
  • @JdM - I opened the file in Excel and saved it as csv (UTF-8) – CHONG May 31 '18 at 09:25
  • @MichaelChirico That is the output I want (I will export the data after replacing e.g. ?ó?é¼?äó with an apostrophe) – CHONG May 31 '18 at 09:25
  • Does the apostrophe display fine when you open the file in excel? – JdeMello May 31 '18 at 12:28
  • A workaround: save the file with `windows-1252` encoding (it is probably referred to as `ANSI`), then import the file with the same encoding. That should work. – JdeMello May 31 '18 at 12:41
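
To illustrate the point about declaring the same encoding on both sides, here is a minimal sketch in base R. The sample bytes are an assumption made for illustration (they are not taken from the file in the question): in windows-1252 the curly apostrophe is the single byte 0x92, and it decodes cleanly once the source encoding is stated explicitly. When reading a file, `read.csv` has a `fileEncoding` argument that performs this kind of conversion on input, whereas `encoding` only marks how strings that were already read should be interpreted.

```r
# Hypothetical sample: the bytes of "don’t" as stored in a windows-1252 file
# (0x92 is the curly apostrophe in that code page)
bytes <- as.raw(c(0x64, 0x6f, 0x6e, 0x92, 0x74))
s <- rawToChar(bytes)

# Convert from the real source encoding to UTF-8
fixed <- iconv(s, from = "windows-1252", to = "UTF-8")

# The fourth character is now U+2019 (right single quotation mark)
print(utf8ToInt(fixed))
```

If the apostrophe is already damaged inside the file itself (as the Notepad++ view suggests), no choice of encoding at import time can restore it, so this only helps when an intact copy of the data is available.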

2 Answers

You can do this:

Here `x` is your data as a single string:

x <- "I don?ó?é¼?äót want Jes?ÇÖs in the Family others that wasn?ó?é¼?äót resolved and told Am really happy with the this ?ƒÿü, new ?ó?é¼?ôunbreakable? ?é¼?¥ on the freeway?Ǫ."

You can combine `gsub` with `iconv` to get almost the desired result. I am not sure how to get the smiley in your output, though:

 gsub("\\?+","'",iconv(x, "latin1", "ASCII", sub=""))

Output:

[1] "I don't want
     Jes's in the Family
     others that wasn't resolved and told
     Am really happy with the this ',
     new 'unbreakable'on the freeway'."
PKumar
  • 10,106
  • 5
  • 32
  • 47
You should try converting from UTF-8 to ASCII:

dt <- iconv(dt, 'utf-8', 'ascii', sub='')

`iconv` is part of base R, so no additional package (such as `tm`) needs to be loaded.
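
Note that with `sub=''` every character without an ASCII equivalent is simply dropped, so a correctly decoded curly apostrophe would vanish rather than become `'`. A minimal sketch of one way around that, assuming the text has been read in as proper UTF-8 (the sample strings below are hypothetical, not taken from your file): map the typographic characters to ASCII first, then strip whatever non-ASCII remains.

```r
# Hypothetical sample text with curly quotes and an ellipsis character
x <- c("I don\u2019t want",
       "new \u2018unbreakable\u2019",
       "on the freeway\u2026.")

# Replace left/right single quotation marks with a plain apostrophe
x <- gsub("[\u2018\u2019]", "'", x)

# Replace the single-character ellipsis with three dots
x <- gsub("\u2026", "...", x, fixed = TRUE)

# Drop any remaining characters that have no ASCII equivalent
x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

print(x)
```

Unlike a blanket replacement of `?`, this leaves genuine question marks in the text untouched.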

csabinho
  • 1,451
  • 1
  • 13
  • 22