39

I am using this code to add Persian words to a CSV file via OpenCSV:

    String[] entries = "\u0645 \u062E\u062F\u0627".split("#");
    try {
        CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream("C:\\test.csv"), "UTF-8"));

        writer.writeNext(entries);
        writer.close();
    }
    catch (IOException ioe) {
        ioe.printStackTrace();
    }

When I open the resulting CSV file in Excel, it contains "ứỶờịỆ". Other programs such as notepad.exe don't have this problem, but all of my users use MS Excel.

Replacing OpenCSV with SuperCSV does not solve this problem.

When I type Persian characters into a CSV file manually, I don't have any problems.

mehdi
  • Definitely an exact duplicate. I just tried this problem with the solution from the link above (use a BOM to make Excel read in UTF-8) and it solved this problem. Alternatively, apparently using UTF-16 works as well to force Excel to read the CSV not in ASCII. – Jesse Webb Nov 18 '10 at 15:23
  • Here is a useful link, similar to AlexR's post: http://weblogs.java.net/blog/joconner/archive/2010/03/24/writing-csv-files-utf-8-excel – Hamedz Oct 19 '12 at 21:12

3 Answers

124

It took some time, but I found a solution to your problem.

First I opened Notepad and wrote the following line: שלום, hello, привет. I saved it as the file he-en-ru.csv using UTF-8, then opened it with MS Excel, and everything worked well.

Next, I wrote a simple Java program that prints this line to a file as follows:

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

When I opened this file in Excel, I saw gibberish.

Then I compared the contents of the two files and (as expected) saw that the file generated by Notepad starts with a 3-byte prefix (shown in decimal and hex):

    239 EF
    187 BB
    191 BF
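
The check described above can be sketched in a few lines of Java. This is only an illustration, not the code actually used for the comparison: the class name `BomCheck`, the method `startsWithUtf8Bom`, and the default file name are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomCheck {
    /** Returns true if the data starts with the three bytes EF BB BF (239 187 191). */
    public static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF   // mask to compare as unsigned values
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the path of the file to inspect as the first argument.
        String path = args.length > 0 ? args[0] : "he-en-ru.csv";
        byte[] data = Files.readAllBytes(Paths.get(path));
        System.out.println(path + " starts with a UTF-8 BOM: " + startsWithUtf8Bom(data));
    }
}
```

Running it once on the Notepad file and once on the Java-generated file should show the difference between the two.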

So, I modified my code to print this prefix first and the text after that:

    String line = "שלום, hello, привет";
    OutputStream os = new FileOutputStream("c:/temp/j.csv");
    // Write the UTF-8 byte order mark (EF BB BF) before any content.
    os.write(239);
    os.write(187);
    os.write(191);

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));

    w.print(line);
    w.flush();
    w.close(); // also closes the underlying FileOutputStream

And it worked! I opened the file in Excel and saw the text as I expected.

Bottom line: write these 3 bytes (EF BB BF, the UTF-8 byte order mark) before writing the content. This prefix indicates that the content is in 'UTF-8 with BOM' (otherwise it is just 'UTF-8 without BOM').
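
A variation on the same idea, as a minimal sketch: instead of writing the three raw bytes to the stream, write the character U+FEFF through the UTF-8 writer, which encodes it as the same bytes EF BB BF. The class name `BomCsvWriter`, its methods, and the default output file name are assumptions made for this example, and only the standard library is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomCsvWriter {
    /** Writes U+FEFF (encoded as EF BB BF in UTF-8) followed by the line, then closes the stream. */
    public static void writeWithBom(OutputStream os, String line) throws IOException {
        try (Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // the byte order mark, written as a character
            w.write(line);
        }
    }

    /** Convenience helper for inspecting the produced bytes in memory. */
    public static byte[] toBytes(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeWithBom(buffer, line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output path; the text is the Persian sample from the question.
        String path = args.length > 0 ? args[0] : "test-with-bom.csv";
        writeWithBom(new FileOutputStream(path), "\u0645 \u062E\u062F\u0627");
    }
}
```

Since the question's code already builds its `CSVWriter` on top of an `OutputStreamWriter`, the same approach should carry over there: write the BOM to the writer (or the three bytes to the stream) first, then hand it to the CSV library.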

István Békési
AlexR
8

Unfortunately, CSV is a very ad hoc format with no metadata and no real standard that would mandate a flexible encoding. As long as you use CSV, you can't reliably use any characters outside of ASCII.

Your alternatives:

  • Write to XML (which does have encoding metadata if you do it right) and have the users import the XML into Excel.
  • Use Apache POI to create actual Excel documents.
Michael Borgwardt
  • XML and POI are fine, but CSV works too. Please see my comment. I managed to create a CSV file that contains Unicode symbols and can be opened with MS Excel. – AlexR Nov 16 '10 at 10:39
  • @AlexR: That may or may not work for any given version of Excel or other programs, or it could cause the file to be rejected as invalid, or put some spurious characters into the first cell. Your program's behaviour should not rely on undocumented features. – Michael Borgwardt Nov 16 '10 at 13:24
  • @AlexR Can you help me? I have a similar question: https://stackoverflow.com/questions/66331230/unable-to-display-chinese-characters-in-excel-using-csv-bean-writer – Sachin HR Feb 24 '21 at 09:55
5

Excel doesn't use UTF-8 to open CSV files. That's a known problem. The actual encoding used depends on the locale settings of Microsoft Windows. With a German locale, for example, Excel would open a CSV file with CP1252.

You could create an Excel file containing some Persian characters and save it as a CSV file. Then write a small Java program to read this file and test some common encodings. That's how I figured out the correct encoding for German umlauts in CSV files.
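
A rough sketch of such a test program follows. The list of candidate charsets is only a guess at what might be worth trying (windows-1256 is the Arabic-script Windows code page), and the class and method names are hypothetical. Charsets the running JVM does not support are skipped.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingProbe {
    // Assumed candidates; adjust the list to the locales you care about.
    static final String[] CANDIDATES = {"UTF-8", "windows-1252", "windows-1256", "UTF-16LE"};

    /** Decodes the same bytes with every supported candidate charset. */
    public static Map<String, String> decodeWithCandidates(byte[] data) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            if (Charset.isSupported(name)) {
                results.put(name, new String(data, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java EncodingProbe <file saved by Excel as CSV>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Print each decoding; the one that shows the expected text is the encoding Excel used.
        for (Map.Entry<String, String> e : decodeWithCandidates(data).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```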

chkal
  • Unfortunately, this is wrong. I managed to create a CSV file with Unicode symbols that can be opened with Excel. See my comment later. – AlexR Nov 16 '10 at 10:38