UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>

Question

I am new to Python and am hoping that someone could please explain to me what the error message means.

To be specific, I have some combined Python and SPSS code, saved in Atom, that was written by a former colleague. Since that colleague is no longer here, I now need to run the code myself. I ran the code below from SPSS 22.

    begin program.
    import spss,spssaux,imp
    abcvalid = imp.load_source('abcvalid', "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py") 
    import abcvalid
    abcvalid.fullprocess("9_26_2016","M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/","M:/Users/Yli/2016 SURVEY/Legacy15.sav")
    end program.

Then I got the following from the output.

    Traceback (most recent call last):
      File "<string>", line 5, in <module>
      File "I:/VALIDITY CHECK/Python Library/2016/abcnvalid2016.py", line 2067, in fullprocess
        dataprep(date,filepath,legacypath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2006, in dataprep
        emailslower(date,filepath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 1635, in emailslower
        DATASET ACTIVATE comment_data.""".format(date,filepath))
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spss.py", line 1494, in Submit
        cmdList = spssutil.CheckStr(cmdList)
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spssutil.py", line 166, in CheckStr
        s1 = unicode(mystr,locale.getlocale(locale.LC_CTYPE)[1])
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\encodings\cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>

I know there are similar questions on this site, but the questions and answers were too hard for me to comprehend. If someone could please help me, I'd really appreciate it!

Thank you in advance!

bers · Answer 1 · 2019-03-15T11:20:03.173

First, here is a minimal example reproducing your error on Windows:

    import subprocess

    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True) as Process:
        for Line in Process.stdout:
            print(Line)

To my understanding, the problem is as follows. (I pieced this together from information and examples I found and am not certain everything is correct, so I welcome corrections.)

The ü character is code point 252 = 0xfc in Unicode (https://unicode-table.com/en/00FC/).
Python correctly passes the ü character to the console, as you can test using this example (be sure to save the file as UTF-8):

    import subprocess

    print(ord('ü'))
    subprocess.call("cmd /c echo ü")

I am not sure why this is working in the first place. (This answer may be why: https://stackoverflow.com/a/32176732/880783)

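The console does not hand its output back as Unicode; it encodes it with a legacy OEM code page. For example, in code pages 437 and 850 (often loosely called "extended ASCII"; plain ASCII stops at 127), the ü character is at position 129 = 0x81 (sounds familiar?).
So when the console returns that byte, Python decodes it with its default encoding, which is cp1252 according to the traceback in the question, and 0x81 is not defined there. Hence the error.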
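The mismatch described above can be reproduced without a console. This is a minimal sketch, assuming the console writes cp850 and Python decodes with cp1252; neither setting is confirmed for the asker's machine.

```python
# Assumption: the console encodes its output as cp850, Python decodes as cp1252.
u_umlaut = "\u00fc"                        # the ü character
console_bytes = u_umlaut.encode("cp850")   # what the console would emit for ü
print(console_bytes)                       # a single byte, 0x81

try:
    console_bytes.decode("cp1252")         # the decoding step that fails
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the encoding that was actually used round-trips cleanly.
decoded = console_bytes.decode("cp850")
print(decoded == u_umlaut)
```

The same byte is valid in one code page and undefined in the other, so the fix is a matter of naming the right encoding, not of changing the data.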

The key is to make Python understand how the output it gets from the process is encoded. In my example (Windows console), I have tried a couple of encodings (see the list of standard encodings in the Python documentation) like this:

    import subprocess

    Encoding = 'cp850'
    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True, encoding=Encoding) as Process:
        for Line in Process.stdout:
            print(Line)

'ascii' fails with an ordinal not in range(128) error (ASCII covers only bytes 0x00 to 0x7f).
'cp1252' fails with character maps to <undefined>.
'latin_1' works, but outputs a box character on my debug console in VS Code (latin_1 maps 0x81 to the control character U+0081).
'cp850' seems to work, outputting a ü character.

So I will stick with 'cp850' for now and see how it goes.
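The four outcomes listed above can be checked directly on the single byte from the error message, with no subprocess involved. This is only a sketch; the `results` dictionary and the `U+XXXX` formatting are choices made for illustration.

```python
raw = b"\x81"  # the byte named in the error message

results = {}
for name in ("ascii", "cp1252", "latin_1", "cp850"):
    try:
        text = raw.decode(name)
        # Report the resulting code point instead of printing the character,
        # so the output does not depend on the terminal's own encoding.
        results[name] = "U+{:04X}".format(ord(text))
    except UnicodeDecodeError as exc:
        results[name] = "error: " + exc.reason

for name, outcome in results.items():
    print(name, "->", outcome)
```

Note that latin_1 never raises, because it defines all 256 byte values, so "no error" does not by itself mean the encoding is the right one.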

score 2 · Accepted Answer · answered Sep 29 '16 at 13:18

It's hard to be sure what is going on here, as there is a lot of code off stage, but the error message is telling you that there is an invalid character in the input stream. Byte 0x81 is undefined in code page 1252, which is the code page in effect. That's the Western Europe/US default code page. The program is trying to convert a presumed code-page string to Unicode, and that conversion fails.

My guess is that the input is actually not encoded with cp1252. Something is messed up in the Statistics current code page or with Unicode mode. You might need to set the SPSS Statistics locale to something different or to turn Unicode mode on or off. See SET LOCALE and SET UNICODE in the Command Syntax Reference on how to do this.
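One other possible source of a stray 0x81 byte is worth ruling out, though it is only a guess about this case. In a Python string literal, a backslash followed by up to three octal digits is an escape sequence, so `\201` is the single byte 0x81, and the second argument in the question contains `Yli\2016`. The traceback calls `unicode(...)`, which indicates Python 2, where ordinary string literals are byte strings. The sketch below uses a bytes literal to get the same behavior in Python 3, and a shortened, hypothetical version of the path.

```python
# Hypothetical, shortened path; the backslash before "2016" mirrors the question.
broken = b"M:/Users/Yli\2016 SURVEY/"
print(broken)

# "\201" was read as one octal escape (0o201 == 0x81), leaving "6 SURVEY/".
has_stray_byte = b"\x81" in broken

try:
    broken.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(has_stray_byte, decode_failed)

# A forward slash or a raw literal keeps every character literal.
fixed_slash = b"M:/Users/Yli/2016 SURVEY/"
fixed_raw = rb"M:/Users/Yli\2016 SURVEY/"
print(fixed_slash.decode("cp1252"))
print(fixed_raw.decode("cp1252"))
```

If this is what happened, replacing that backslash with a forward slash (as in the rest of the path) would remove the byte without any locale change.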

If you can say more about your locale and what this code is doing, we might be able to provide more information.

Thank you so much for the detailed explanation! I'll see if I can fix it. Thank you!!! — user6655908, Sep 29 '16 at 13:24

score 2 · Answer 3 · answered Oct 29 '20 at 15:06

For a similar problem with the same error message, I did something like this and it worked well for me.

    with open(workfile, 'r', encoding='utf-8') as f:
        read_data = f.read()
    # No f.close() is needed: the with block closes the file on exit.
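Why an explicit `encoding='utf-8'` helps can be shown with a small round trip. This is a sketch under assumptions: the sample file and its contents are made up, and cp1252 is named explicitly to stand in for the default that `open()` would pick on a Western European Windows system.

```python
import os
import tempfile

# Hypothetical sample file: U+00C1 (capital A with acute) is the two
# bytes C3 81 in UTF-8, so the file contains a 0x81 byte.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\u00c1rbol")

# Reading the UTF-8 file as cp1252 hits the undefined byte 0x81.
try:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

# Reading it with the encoding it was written in works.
with open(path, "r", encoding="utf-8") as f:
    read_data = f.read()

print(failed, read_data == "\u00c1rbol")
```

This only helps when the file really is UTF-8; a file in another encoding needs that encoding's name instead.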

andsa


score 1 · Answer 4 · answered Dec 15 '20 at 09:46

If you are reading a file in Python and get this error, specify the file's encoding. For example:

before

    import numpy as np
    import csv

    with open("terrorismData.csv", "r") as file_obj:
        file_data = csv.DictReader(file_obj, skipinitialspace=True)
        file_list = list(file_data)

after

    with open("terrorismData.csv", "r", encoding="ISO-8859-1") as file_obj:
        file_data = csv.DictReader(file_obj, skipinitialspace=True)
        file_list = list(file_data)
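A self-contained version of the "after" variant is sketched below. The file name, column names, and rows are invented for illustration; the only point is how `encoding="ISO-8859-1"` treats the bytes.

```python
import csv
import os
import tempfile

# Hypothetical CSV written as raw bytes: 0xFC is ü in ISO-8859-1,
# and 0x81 is the byte that cp1252 cannot decode.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write(b"city,count\r\nM\xfcnchen,3\r\nraw\x81,1\r\n")

# newline="" is the setting the csv module documentation recommends.
with open(path, "r", encoding="ISO-8859-1", newline="") as file_obj:
    file_data = csv.DictReader(file_obj, skipinitialspace=True)
    file_list = list(file_data)

print(len(file_list), file_list[0]["city"] == "M\u00fcnchen")
```

ISO-8859-1 assigns a character to every byte value, so it never raises this error. The trade-off is that a file in some other encoding is read without complaint but with the wrong characters, so it is worth checking a few non-ASCII values after loading.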
