5

I am using Super CSV to write a UTF-8 encoded CSV file. It produces a valid file, but Excel doesn't recognize it as UTF-8: Excel is lost without the BOM, so any special characters are corrupted when the file is opened in Excel.

Is there a way to write a file as UTF-8 with a BOM using Super CSV? I can't find it.

Thanks

allaf

2 Answers

10

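Since Super CSV wraps a Writer, you can write the BOM to that writer yourself before handing it over:

Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
writer.write('\uFEFF'); // BOM (U+FEFF); the UTF-8 writer encodes it as the bytes EF BB BF
ICsvBeanWriter beanWriter = new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);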
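
To check that the BOM really ends up at the start of the output, here is a minimal sketch that uses only the JDK. The class name `BomCsvDemo`, the method `writeCsvWithBom`, and the sample rows are made up for illustration; with Super CSV you would pass the same writer to the CSV writer instead of writing the text directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Super CSV.
final class BomCsvDemo {

    // Writes U+FEFF followed by the CSV text through a UTF-8 writer
    // and returns the bytes that would end up in the file.
    static byte[] writeCsvWithBom(String csv) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF'); // encoded by the writer as EF BB BF
            // With Super CSV, this writer would be handed to the CSV writer here.
            writer.write(csv);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's encoding.
        byte[] bytes = writeCsvWithBom("name;city\nJos\u00E9;M\u00E1laga\n");
        // Print the first three bytes in hex; these are the encoded BOM.
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```

The BOM has to be the very first thing written, before any header or row.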
Joop Eggen
  • Thanks @JoopEggen, that was what I was looking for. This is what it looks like: `OutputStreamWriter o = new OutputStreamWriter(out); // BOM o.write('\uFEFF'); writer = new CsvBeanWriter(o, CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);` – allaf Aug 19 '15 at 09:30
  • @allaf better add the UTF-8 to the new OutputStreamWriter call, as otherwise the default platform encoding is used - which is non-portable. – Joop Eggen Aug 19 '15 at 10:23
  • Awesome, saved my time! – prashanth-g Jun 11 '18 at 09:34
1

In my experience MS Excel always opens CSV files in the default MS Office charset. In my case, it was always Windows-1252 (Spain), even on non-Windows machines (MS Office for OS X). The only way to deal with it was to write the CSV files in this charset.

byte[] csvFileBytes = dataObject.toCSVString().getBytes(Charset.forName("Windows-1252"));

MS Excel seems to never use another charset to open CSV files. You can check this post: Is it possible to force Excel recognize UTF-8 CSV files automatically?
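
One caveat with this approach, sketched below under the assumption that the data may contain characters outside Windows-1252: `String.getBytes(Charset)` silently replaces anything the charset cannot represent, so it can be worth checking first. The class and method names (`Cp1252Demo`, `encode`, `isRepresentable`) are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demo class illustrating the Windows-1252 workaround.
final class Cp1252Demo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes the CSV text as Windows-1252; unmappable characters are
    // replaced by the charset's replacement byte rather than rejected.
    static byte[] encode(String csv) {
        return csv.getBytes(CP1252);
    }

    // Returns true if every character of the text exists in Windows-1252.
    static boolean isRepresentable(String csv) {
        return CP1252.newEncoder().canEncode(csv);
    }

    public static void main(String[] args) {
        String western = "a\u00E9;\u00C7"; // "aé;Ç", all present in Windows-1252
        String other = "\u65E5";           // a CJK character, not in Windows-1252
        System.out.println(isRepresentable(western) + " " + encode(western).length);
        System.out.println(isRepresentable(other) + " " + encode(other).length);
    }
}
```

If `isRepresentable` returns false, the file will lose data in this charset, which is the case where UTF-8 with a BOM is the safer route.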

Ricardo Vila
  • That is untrue; if the BOM is present in the file then Excel will open the file with the correct encoding. Why it doesn't use UTF-8 by default is a mystery, though. – fge Aug 18 '15 at 12:30
  • Are you sure of this? Does MS Excel interpret the file's BOM? – Ricardo Vila Aug 18 '15 at 12:32
  • Yes, I'm sure; try the answer above, i.e. write the BOM before writing anything else in the file – fge Aug 18 '15 at 12:34
  • Well, I've tried to use CSV files with the BOM on MS Office 2011 for Mac and for Windows (Spanish versions) and I couldn't get it to work properly. That's why I had to encode it in Windows-1252. – Ricardo Vila Aug 18 '15 at 12:41
  • Please fge, can you help me understand why this code is not working for writing a BOM to a CSV file? `String fileName = "a.csv"; File file = FileUtils.getFile(fileName); FileWriter fw = new FileWriter(file); char[] cbuf = { 0xef, 0xbb, 0xbf };// BOM fw.write(cbuf); fw.write("aaáa;eé;cccÇÇÇ;\niií;oóó;uuúúü"); fw.flush(); fw.close();` – Ricardo Vila Aug 18 '15 at 12:54
  • Don't do it this way, do it as per the first answer; simply write char `\ufeff`. The writer will encode it for you (that's what a `Writer` is for; similarly a `Reader` will decode the bytes as `char`s) – fge Aug 18 '15 at 13:04
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87273/discussion-between-ricardo-vila-and-fge). – Ricardo Vila Aug 18 '15 at 13:05
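
For anyone hitting the same problem as in the `char[] cbuf = { 0xef, 0xbb, 0xbf }` comment above, here is a rough sketch of what goes wrong, assuming a UTF-8 writer (the `FileWriter` in that comment uses the platform default encoding instead, which is a separate problem). The three values are treated as the characters U+00EF, U+00BB and U+00BF, and each of them is encoded on its own. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing the two ways of "writing a BOM".
final class BomMistakeDemo {

    // Sends the chars through a UTF-8 writer and returns the resulting bytes.
    static byte[] encodeUtf8(char[] chars) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BOM bytes passed as chars: each char is encoded separately.
        System.out.println(hex(encodeUtf8(new char[] {0xef, 0xbb, 0xbf})));
        // The BOM as a single char: the writer produces the three BOM bytes.
        System.out.println(hex(encodeUtf8(new char[] {'\uFEFF'})));
    }
}
```

The first call yields six bytes (`C3 AF C2 BB C2 BF`), which Excel does not recognize as a BOM; the second yields `EF BB BF`.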