0

I'm writing small app to compare two XML files and then output differences to CSV. All looks fine when I open CSV with notepad++ (shows encoding UTF-8 but I get weird characters when opening with MS Excel. Any ideas how to make it work in MS Excel too?

I'm using xml.etree.ElementTree to parse XML files. Differences are stored in two lists. I've tried all kind of suggestions from other similar questions but nothing seems to work.

File saving code is here:

def saveresults():
filesave = tkFileDialog.asksaveasfilename(initialdir=filenewdir, title="Save to CSV",
                                      defaultextension=".csv", filetypes=[("CSV (Comma delimited)", ".csv")])
if filesave is not None:
    with open(filesave, 'wb') as fp:
        writef = unicodecsv.writer(fp, dialect="excel-tab", encoding="utf-8")
        writef.writerow(("Trigger", "Variable", "Old label", "Old columns", "Old precode", "Old precode text",
                        "New label", "New columns", "New precode", "New precode text"))

        for x, newvalue in enumerate(newvalues):
            writef.writerow((newvalue[0], newvalue[1], oldvalues[x][2], oldvalues[x][3], oldvalues[x][4],
                             oldvalues[x][5], newvalue[2], newvalue[3], newvalue[4], newvalue[5]))
        fp.close()
        tkMessageBox.showinfo(title="Message", message="Done!")

Tested Text in notepad:

Spend £2,000 or more on building, dcorating or home improvements Lietuvos valstiečių ir žaliųjų sąjungos (LVŽS)

Same text when opened with MS Excel:

Spend £2,000 or more on building, dcorating or home improvements Lietuvos valstieÄių ir žaliųjų sÄ…jungos (LVŽS)

Mudyla
  • 187
  • 1
  • 12

1 Answers1

0

It appears that the MS Excel version is reading the file as if it was encoded in ANSI, so I expect this is an excel problem in the end - you mention trying ideas from other questions, does that include Is it possible to force Excel recognize UTF-8 CSV files automatically? ? In particular does it work if you open excel then import the data, rather than opening the data with excel from explorer/command line?

Community
  • 1
  • 1
  • It works in Excel if I import CSV using wizard or convert encoding to UTF-8 with BOM in notepad before opening it. Would be the right way to add BOM? I tried `writef.writerow(codecs.BOM_UTF8)` but it didn't work. – Mudyla Jan 18 '17 at 13:48
  • You can try explicitly including the BOM: `writerow('\uefbbbf')` but that's formally not recommended - probably hence why it's not done automatically – Fred Cascarini Jan 18 '17 at 14:10