The file I'm reading contains one word per line. Some of these words cause problems because they seem to contain unusual characters. See the following example with the first word of my list:
stopwords <- read.csv("stopwords_fr.txt",stringsAsFactors = FALSE,header=FALSE,encoding="UTF-8")$V1
stopwords[1] # "a" , if you copy paste into R studio this character with the quotes around it, you'll see a little red dot preceding the a.
stopwords[1] == "a" # FALSE
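One way to see what the string really contains is to print its code points and bytes. The sketch below is only an illustration: it assumes the hidden character is a byte order mark (U+FEFF), which is a guess, and `with_bom` is a made-up stand-in for `stopwords[1]`.

```r
# Assumption: the invisible character is a byte order mark (U+FEFF).
# `with_bom` is a hypothetical stand-in for stopwords[1].
with_bom <- "\ufeffa"

# Code points of each character: an extra one appears before the "a" (97).
utf8ToInt(with_bom)

# Raw bytes of the string, useful to compare against a plain "a".
charToRaw(enc2utf8(with_bom))
charToRaw("a")

# The comparison fails because the two strings differ in length.
nchar(with_bom)
with_bom == "a"
```

Running the same two inspection calls on `stopwords[1]` should show whether an extra code point really sits in front of the letter.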
How did this happen? How can I avoid it? And if I can't avoid it, how do I convert this dotted "a" into a regular "a"?
EDIT:
You can reproduce the issue by copying and pasting this into RStudio:
"a" == "a" # FALSE
Here's where I got the file from: https://sites.google.com/site/kevinbouge/stopwords-lists/stopwords_fr.txt?attredirects=0&d=1
According to Notepad++, the encoding of the file is UTF-8-BOM. But passing "UTF-8-BOM" as the encoding doesn't help, even though it seemed to work in this answer: Read a UTF-8 text file with BOM
stopwords <- read.csv("stopwords_fr.txt",stringsAsFactors = FALSE,header=FALSE,encoding="UTF-8-BOM")$V1
stopwords[1] # "a"
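For comparison, `read.csv` has two different arguments: `encoding` only marks the strings it reads as being in a given encoding, while `fileEncoding` controls how the file itself is decoded. A minimal sketch of the `fileEncoding` variant follows, assuming the stray character is a UTF-8 byte order mark. The temporary file and the `strip_bom` helper are hypothetical, made up so that the example is self-contained.

```r
# Assumption: the file starts with a UTF-8 byte order mark (bytes EF BB BF).
# Build a small stand-in file so the example does not need stopwords_fr.txt.
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("a\nb\n")), tmp)

# fileEncoding (not encoding) tells R how to decode the file while reading it.
words <- read.csv(tmp, stringsAsFactors = FALSE, header = FALSE,
                  fileEncoding = "UTF-8-BOM")$V1
words[1] == "a"

# Hypothetical fallback helper: drop a leading U+FEFF from strings already read.
strip_bom <- function(x) sub("^\ufeff", "", x)
strip_bom("\ufeffa") == "a"

unlink(tmp)
```

If the mark has already ended up in the vector, something like `stopwords <- strip_bom(stopwords)` might be enough as a cleanup step.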
I'm using R version 3.0.2.