
I have a word in Polish as a string variable which I need to print to a file:

# coding: utf-8

a = 'ilośc'
with open('test.txt', 'w') as f:
    print(a, file=f)

This throws:

Traceback (most recent call last):
  File "C:/scratches/scratch_3.py", line 5, in <module>
    print(a, file=f)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u015b' in position 3: character maps to <undefined>

After looking at existing answers (using .decode("utf-8") or .encode("utf-8")) and trying various incantations, I finally managed to get the file created.

Unfortunately, what was written was b'ilośc' and not ilośc. When I tried to decode it before printing to the file, I was back to the initial error with the same traceback.
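A minimal sketch of what likely happened with the .encode() workaround, assuming the encoded bytes were passed straight to print(). print() converts its argument with str(), and str() of a bytes object is its repr, so the file receives the literal b'...' text (with non-ASCII bytes shown as \x escapes). Because that repr is pure ASCII, even cp1252 can store it, which is why the error went away. The explicit encoding='cp1252' and the temporary file path are assumptions made only to mimic the Windows default reproducibly.

```python
import os
import tempfile

word = 'ilośc'
encoded = word.encode('utf-8')  # a bytes object, not a str

# Temporary path used instead of test.txt (illustrative assumption)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# print() calls str() on its argument; for bytes that is the repr, b'...'.
# The repr is pure ASCII, so a cp1252 file stores it without any error.
with open(path, 'w', encoding='cp1252') as f:
    print(encoded, file=f)

with open(path, encoding='cp1252') as f:
    written = f.read()

print(written)  # b'ilo\xc5\x9bc'
```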

How do I write a str containing diacritics to a file so that it is written as text and not as a bytes representation?

WoJ

2 Answers

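a = 'ilośc'
# Without encoding=, open() uses the locale default (cp1252 here), which cannot encode 'ś'
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(a)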

You can also write to the file in binary mode, encoding the string yourself:

a = 'ilośc'
with open('test.txt', 'wb') as f:
    # str.encode() defaults to UTF-8; the bytes are written to the file unchanged
    f.write(a.encode('utf-8'))
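A minimal sketch comparing text mode and binary mode, assuming UTF-8 is the target encoding and using a temporary file instead of test.txt. The text-mode branch passes encoding='utf-8' explicitly, since without it open() falls back to the locale default. Both branches should leave the same bytes on disk, and reading back as UTF-8 text returns the original str.

```python
import os
import tempfile

word = 'ilośc'
# Temporary path used instead of test.txt (illustrative assumption)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Option 1: text mode with an explicit encoding; Python encodes the str on write.
with open(path, 'w', encoding='utf-8') as f:
    f.write(word)
with open(path, 'rb') as f:
    text_mode_bytes = f.read()

# Option 2: binary mode; the str is encoded manually and the bytes are written as-is.
with open(path, 'wb') as f:
    f.write(word.encode('utf-8'))
with open(path, 'rb') as f:
    binary_mode_bytes = f.read()

# Reading back as UTF-8 text yields the original str.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()

print(text_mode_bytes == binary_mode_bytes)  # True
print(binary_mode_bytes)                     # b'ilo\xc5\x9bc'
```

The file itself never contains a "str" or "bytes" object: it always stores bytes, and the encoding only decides how the text maps to those bytes.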
Pierre Barre

The traceback says that you are trying to save the 'ś' ('\u015b') character using the cp1252 encoding. When no encoding is given, open() uses locale.getpreferredencoding(False), which on your system is the Windows ANSI code page, cp1252. That encoding can't represent this Unicode character: Unicode has more than a million code points, while cp1252 is a single-byte encoding that can represent at most 256 characters.
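A short sketch reproducing the failure without touching the filesystem: it prints the locale default that open() would use (the value differs per machine) and shows that encoding the word as cp1252 fails on the same character and position as the traceback, while UTF-8 succeeds. The variable names are illustrative.

```python
import locale

# Encoding open() falls back to when no encoding= is passed
# (on Windows this is typically the ANSI code page, e.g. cp1252).
default_encoding = locale.getpreferredencoding(False)
print('default:', default_encoding)

word = 'ilośc'
bad_char = None
try:
    word.encode('cp1252')  # single-byte code page without 'ś'
except UnicodeEncodeError as exc:
    bad_char = exc.object[exc.start]
    print('cp1252 cannot encode', ascii(bad_char), 'at position', exc.start)

# UTF-8 can represent every Unicode character.
utf8_bytes = word.encode('utf-8')
print(utf8_bytes)  # b'ilo\xc5\x9bc'
```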

Use a character encoding that can represent the desired characters:

with open(filename, 'w', encoding='utf-16') as file:
    print('ilośc', file=file)
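A minimal round-trip sketch for the UTF-16 variant, using a temporary file in place of filename (an assumption for illustration). Python's 'utf-16' codec writes a byte order mark (BOM) at the start of the file, and the same encoding must be passed when reading the text back; the decoder consumes the BOM automatically.

```python
import codecs
import os
import tempfile

# Temporary path used in place of filename (illustrative assumption)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w', encoding='utf-16') as file:
    print('ilośc', file=file)

# The 'utf-16' codec prepends a BOM in the platform's native byte order.
with open(path, 'rb') as file:
    raw = file.read()
has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Reading back needs the same encoding; the BOM is stripped on decode.
with open(path, encoding='utf-16') as file:
    text = file.read()

print(has_bom)  # True
```

UTF-8 works the same way with encoding='utf-8' and needs only one byte per ASCII character, so it is the more common choice for mostly-Latin text.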
jfs