
I have a very annoying encoding problem using opencsv. When I export a CSV file, I set the character encoding to 'UTF-8'.

CSVWriter writer = new CSVWriter(new OutputStreamWriter("D:/test.csv", "UTF-8"));

but when I open the csv file with Microsoft Office Excel 2007, it turns out that it has 'UTF-8 BOM' encoding?

Once I save the file in Notepad and re-open it, the file turns back to UTF-8 and all the letters in it appear fine. I think I've searched enough, but I haven't found any solution to prevent my file from turning into 'UTF-8 BOM'. Any ideas, please?

Petr Abdulin
user1213162
    Java should not add a BOM on its own. Since there is also no `OutputStreamWriter` constructor taking two strings, I guess something is missing from your code. Could the BOM be part of the data you write? – Jörn Horstmann Apr 13 '12 at 08:44

2 Answers


I suppose your file is encoded as 'UTF-8 without BOM'. You had better write a BOM at the start of the file. A BOM is not necessary in most cases, but one obvious exception is Microsoft Excel, which relies on it to recognize a CSV file as UTF-8.

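FileOutputStream os = new FileOutputStream(file);
// Write the UTF-8 BOM (EF BB BF) first so that Excel detects the encoding
os.write(0xef);
os.write(0xbb);
os.write(0xbf);
// Name the charset explicitly; otherwise the platform default charset is used
CSVWriter csvWrite = new CSVWriter(new OutputStreamWriter(os, "UTF-8"));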

Now Excel will read your file as a UTF-8 CSV.
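To check the idea without opencsv on the classpath, here is a minimal sketch using only the JDK. The class `BomCsvSketch` and its methods are hypothetical names made up for illustration; they are not part of opencsv. It writes the three BOM bytes to the raw stream and then wraps the stream in a writer that is explicitly UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of opencsv.
final class BomCsvSketch {
    // The three-byte UTF-8 signature (BOM): EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Writes the BOM to the raw stream, then returns a Writer that encodes as UTF-8.
    static Writer utf8WriterWithBom(OutputStream os) throws IOException {
        os.write(UTF8_BOM);
        return new OutputStreamWriter(os, StandardCharsets.UTF_8);
    }

    // Writes one line containing a non-ASCII character into memory
    // and returns the raw bytes so they can be inspected.
    static byte[] demo() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer w = utf8WriterWithBom(buffer)) {
            w.write("caf\u00e9\n");
        }
        return buffer.toByteArray();
    }
}
```

With a real file, the writer returned by `utf8WriterWithBom(new FileOutputStream(file))` could presumably be handed to `new CSVWriter(...)` in place of the `OutputStreamWriter` above, since `CSVWriter` accepts any `Writer`.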

goodhyun

UTF-8 and UTF-8 Signature (which is sometimes incorrectly called UTF-8 BOM) are the same encoding; the signature is used only to distinguish UTF-8 from other encodings. Any Unicode-aware application should process the UTF-8 signature (the three-byte sequence EF BB BF) correctly.

I don't know why Java specifically adds this signature or how to stop it from doing so.
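As a rough illustration of what "processing the signature correctly" can mean on the reading side, the sketch below checks for the bytes EF BB BF and skips them before decoding. `Utf8SignatureSketch` and its method names are hypothetical, chosen only for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class Utf8SignatureSketch {
    // Returns true if the data starts with the UTF-8 signature EF BB BF.
    static boolean hasUtf8Signature(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Decodes the data as UTF-8, skipping the signature if it is present,
    // so that signed and unsigned input yield the same text.
    static String decode(byte[] data) {
        int offset = hasUtf8Signature(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

Under this sketch, a file with the signature and the same file without it decode to identical strings, which is the sense in which the two are the same encoding.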

Petr Abdulin