The solution: concatenate the Unicode strings first, then encode the complete string just before writing it to the file. The codecs module simplifies this for you.
import codecs
import os
# The file object returned by codecs.open encodes Unicode strings to UTF-8 on write
file = codecs.open(os.path.join(fr_directory, 'strings.json'), 'w+', encoding='utf8')
file.write("\"" + key + "\": " + "\"" + french[key] + "\"" + ',' + '\n')
I have opened the file with codecs.open rather than just open, specifying that the file should automatically handle encoding into UTF-8 when you write Unicode strings. I have also removed the explicit encoding call you used.
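To try this outside your project, here is a minimal, self-contained sketch. The temporary directory and the one-entry french dictionary are stand-ins I made up for illustration; your real fr_directory and french will differ. It writes one line per key and then reads the raw bytes back to confirm that the accented character was stored as UTF-8.

```python
import codecs
import os
import tempfile

# Hypothetical stand-ins for the variables used in the question.
fr_directory = tempfile.mkdtemp()
french = {u"YOU_HAVE_COMPLETED_ENROLLMENT": u"Vous avez termin\xe9 l'inscription !"}

path = os.path.join(fr_directory, 'strings.json')
with codecs.open(path, 'w', encoding='utf8') as f:
    for key in sorted(french):
        # Every piece stays Unicode; f.write() encodes the whole line to UTF-8.
        f.write(u'"' + key + u'": "' + french[key] + u'",\n')

# Read the raw bytes back: u'\xe9' should appear as the two bytes \xc3\xa9.
with open(path, 'rb') as f:
    raw = f.read()
```

Using a with block here is my own choice so the file is closed (and flushed) even if a write fails; it is not required for the encoding to work.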
Further explanation:
The keys and values of your dictionary are almost certainly Unicode strings. A Unicode string needs to be encoded before it can be written to a file. Most operations in Python 2 assume an ASCII encoding unless told otherwise, and the file objects returned by open are among them. That's why, if you try to write a Unicode string containing non-ASCII characters to such a file, you'll see an exception:
>>> with open('/tmp/test.txt', 'w') as f:
...     f.write(u"Vous avez termin\xe9 l'inscription !")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
This error is one that you can fix by encoding the string directly, so this works:
>>> with open('/tmp/test.txt', 'w') as f:
...     f.write(u"Vous avez termin\xe9 l'inscription !".encode('utf-8'))
However, this alone does not solve your problem, because you are trying to build a more complicated string. When you concatenate a Unicode string with a UTF-8 encoded byte string, Python 2 first tries to decode the byte string as ASCII, so you also get an exception, even when not writing to a file:
>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !".encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
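The reason for this error is that Python 2 silently decodes the byte string with the ASCII codec before joining it to the Unicode string. The sketch below makes that hidden step explicit by calling decode('ascii') by hand; the variable names are mine, chosen only for illustration.

```python
encoded = u"Vous avez termin\xe9 l'inscription !".encode('utf-8')

try:
    # Roughly what Python 2 attempts implicitly during unicode + str concatenation.
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)         # same text as the error shown above
    error_position = exc.start # index of the first byte that is not ASCII

# Decoding with the codec that was actually used to encode works fine.
decoded = encoded.decode('utf-8')
```

The failing byte is 0xc3, the first of the two bytes that UTF-8 uses for the accented e, which is why the position matches the index of that character in the original text.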
You can fix this by not encoding either string:
>>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
u"YOU_HAVE_COMPLETED_ENROLLMENT: Vous avez termin\xe9 l'inscription !"
But then, when you want to write it to a file, you have to encode the whole concatenated string:
>>> with open('/tmp/test.txt', 'w') as f:
...     line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
...     f.write(line.encode('utf-8'))
But for convenience, the codecs module gives you the tools to avoid encoding by hand on every write:
>>> import codecs
>>> with codecs.open('/tmp/test.txt', 'w', encoding='utf8') as f:
...     f.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !")
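The same module works in the other direction: opening a file for reading with codecs.open and an encoding gives you Unicode strings back, so you never handle the raw bytes yourself. Here is a small round-trip sketch; the temporary path is a placeholder I picked so the example does not touch /tmp/test.txt.

```python
import codecs
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"

# Writing: the Unicode string is encoded to UTF-8 by the file object.
with codecs.open(path, 'w', encoding='utf8') as f:
    f.write(line)

# Reading: the bytes are decoded from UTF-8 back into a Unicode string.
with codecs.open(path, 'r', encoding='utf8') as f:
    round_tripped = f.read()
```

If everything is set up correctly, round_tripped compares equal to line.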