I'm using the requests library to execute an HTTP GET that returns a JSON response, and I get this error when the response string contains a Unicode character:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)

When I execute the same HTTP request with Postman, the JSON output is:

{ "value": "VILLE D\u0019ANAUNIA" }

My Python code is:

data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)

Can I remove or replace all Unicode chars before executing conversion with json.loads(...)?

skaul05
orion91
  • see if this one helps https://stackoverflow.com/questions/15321138/removing-unicode-u2026-like-characters-in-a-string-in-python2-7 or https://stackoverflow.com/questions/2234228/parsing-unicode-input-using-python-json-loads – Reddysekhar Gaduputi Feb 20 '19 at 10:49
  • Possible duplicate of https://stackoverflow.com/questions/27955978/python-requests-url-with-unicode-parameters – skaul05 Feb 20 '19 at 10:51
  • What is the raw response content (`print(requests.get(uri, headers=HEADERS).content)`)? It could be caused by an encoding problem... – Serge Ballesta Feb 20 '19 at 11:21

2 Answers

It is likely to be caused by a RIGHT SINGLE QUOTATION MARK U+2019 (’). For reasons I cannot guess, the high-order byte has been dropped, leaving you with a control character, which should be escaped in a correct JSON string.

So the correct way would be to check exactly what the API returns. If it does return a '\u0019' control character, you should contact the API owner, because the problem should be fixed there.
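
To see what the API really sends before contacting its owner, one option is to scan the raw text for unescaped control characters. The following is only a minimal sketch: `find_control_chars` is a hypothetical helper, and the sample string is an assumption standing in for the real response body (for example `requests.get(uri, headers=HEADERS).text`).

```python
import json


def find_control_chars(text):
    """Return (index, repr) pairs for every raw character below U+0020."""
    return [(pos, repr(ch)) for pos, ch in enumerate(text) if ord(ch) < 0x20]


# Hypothetical sample mimicking the response in the question: the string holds
# a raw U+0019 character, not the six-character escape sequence \u0019.
sample = '{ "value": "VILLE D\x19ANAUNIA" }'

offenders = find_control_chars(sample)
print(offenders)

error_pos = None
try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    # exc.pos is the character offset that appears in the error message
    error_pos = exc.pos
    print(exc)
```

Keep in mind that this simple scan also reports tabs and line breaks, which JSON permits as whitespace between tokens, so a pretty-printed response will produce extra hits.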

As a workaround, you can try to limit the problem for your processing by filtering out non-ASCII and control characters:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127))  # filter out unwanted chars
json_data = json.loads(data)

You should get {'value': 'VILLE DANAUNIA'}

Alternatively, you can replace all unwanted characters with spaces:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)

You would get {'value': 'VILLE D ANAUNIA'}
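
Note that both filters above also drop legitimate non-ASCII characters such as accented letters. If those need to survive, a narrower variant could remove only the control characters. This is a sketch under that assumption: `strip_control_chars` and `CONTROL_CHARS` are hypothetical names, and the sample string (including its "note" field) is made up for illustration.

```python
import json
import re

# Matches C0 control characters except tab, line feed and carriage return,
# which JSON allows as whitespace between tokens.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def strip_control_chars(text, replacement=''):
    """Replace every matched control character with `replacement`."""
    return CONTROL_CHARS.sub(replacement, text)


# Hypothetical payload: a raw U+0019 plus a legitimate accented character.
sample = '{ "value": "VILLE D\x19ANAUNIA", "note": "caf\u00e9" }'

cleaned = json.loads(strip_control_chars(sample))
print(cleaned)

# If the control character really is a damaged apostrophe, substituting one
# is another option. This is a guess about the intended text, not a fact.
patched = json.loads(strip_control_chars(sample, "'"))
print(patched)
```

Whether to delete the character or substitute an apostrophe depends on what the API was supposed to send, which only its owner can confirm.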

Serge Ballesta
  • Thanks for the suggestion! For now I don't need to replace Unicode chars, but maybe it could be useful in the future! – orion91 Feb 20 '19 at 14:00

The code below works on Python 2.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)

The code below works on Python 3.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)

Output (as printed by Python 2.7; Python 3.7 prints the same dict without the u prefixes):

{u'value': u'VILLE D\x19ANAUNIA'}
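
The difference between the two versions comes from how the string literal is read rather than from the json module: in Python 2.7 a plain '...' literal keeps \u0019 as six characters, which is a valid JSON escape, while in Python 3 the literal already contains the raw control character. A small Python 3 sketch of both cases (the variable names are illustrative only):

```python
import json

# Raw literal: backslash + u0019 stay as six characters, a valid JSON escape.
escaped = r'{ "value": "VILLE D\u0019ANAUNIA" }'
# Normal Python 3 literal: \u0019 becomes the actual control character.
raw = '{ "value": "VILLE D\u0019ANAUNIA" }'

from_escaped = json.loads(escaped)  # parses in the default strict mode
print(from_escaped)

try:
    json.loads(raw)
    strict_failed = False
except json.JSONDecodeError:
    # Strict mode rejects raw control characters inside JSON strings.
    strict_failed = True

lenient = json.loads(raw, strict=False)  # control characters are tolerated
print(strict_failed, lenient == from_escaped)
```

In both successful cases the parsed value still contains the U+0019 character, so strict=False gets past the error but does not repair the text.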

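Another point is that requests can parse the JSON for you (r.json() accepts the same keyword arguments as json.loads, so strict=False can be passed there too):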

r = requests.get('https://api.github.com/events')
r.json()
balderman