
I'm using Python's simplejson library to encode a string that contains special characters:

hello testing

spécißl characters

plusses: +++++

special chars :œ∑´®†¥¨ˆøπ“ß∂ƒ©˙∆˚¬Ω≈ç√∫˜µ≤≥

However, when I encode it and transmit it to the other machine (using POST), it turns out like this:

{'message': ['{"body": "hello testing sp\\u00e9ci\\u00dfl characters\\n\\nplusses: \\n\\nspecial chars :\\u0153\\u2211\\u00b4\\u00ae\\u2020\\u00a5\\u00a8\\u02c6\\u00f8\\u03c0\\u201c\\u00df\\u2202\\u0192\\u00a9\\u02d9\\u2206\\u02da\\u00ac\\u03a9\\u2248\\u00e7\\u221a\\u222b\\u02dc\\u00b5\\u2264\\u2265"}']}

The + signs are completely stripped and the rest are in this unicode(?) format. My code for this is:

data = {'body': data_string}
data_encoded = json.dumps(data)

Any ideas? Thanks!

Edit: I've tried using json.dumps(data, ensure_ascii=False), but it raises a UnicodeError ("ordinal not in range").

Mark
  • well, the +s are missing, and I'm trying to figure out how to convert the unicode character representations back to normal. and .encode('utf-8') doesn't seem to be working. – Mark Jan 08 '11 at 23:10
  • I'm thinking there's a fundamental thing that's going wrong, so that's why I'm asking :) – Mark Jan 08 '11 at 23:11
  • Just as a hint, there's a difference between `json` and `simplejson` in the manner they deal with unicode. Check out this other question for details: http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-unicode-ones-from-json-in-python/3972139#3972139 – ducu Jan 09 '11 at 15:12

2 Answers

>>> import json

A simple example, with plus signs, the Latin-1 "sharp s" (ß), and the Cyrillic "capital zhe" (Ж).

Note: make sure that your strings are either unicode objects or ASCII-only str objects:

>>> data = {"body" : u"++\xdf\u0416", "universe": 42}
>>> data
{'body': u'++\xdf\u0416', 'universe': 42}

Create your JSON string, which turns out to be ASCII -- all non-ASCII characters are escaped:

>>> encoded = json.dumps(data)
>>> encoded
'{"body": "++\\u00df\\u0416", "universe": 42}'

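Transmit your JSON string to the other computer. If the transmission channel mangles some ASCII characters, apply whatever further escaping it requires first: in a form-encoded POST body, for instance, a literal + is decoded as a space, so the string must be URL-encoded before sending. On the remote computer, undo that escaping to recover the JSON string.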

Then convert the JSON string back to a Python object:

>>> decoded = json.loads(encoded)
>>> decoded
{u'body': u'++\xdf\u0416', u'universe': 42}
>>> decoded == data
True
>>>
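
The "further escaping" step can be sketched as follows for the form-encoded POST case. This is a minimal illustration, assuming Python 3 (where `urlencode` and `parse_qs` live in `urllib.parse`) and a form field named `message`, as in the question's output. The names `naive_body` and `safe_body` are made up for the example, and `parse_qs` merely stands in for whatever the receiving server does when it decodes the form.

```python
import json
from urllib.parse import urlencode, parse_qs

data = {"body": "++\xdf\u0416", "universe": 42}
encoded = json.dumps(data)  # ASCII-only JSON text; non-ASCII is \u-escaped

# Naive form body: each '+' is sent literally, and a form decoder
# reads a literal '+' as a space, so the plus signs are lost.
naive_body = "message=" + encoded
naive_received = parse_qs(naive_body)["message"][0]

# Percent-encoded form body: '+' is sent as %2B and survives decoding.
safe_body = urlencode({"message": encoded})
safe_received = parse_qs(safe_body)["message"][0]

print("+" in naive_received)             # False: plus signs became spaces
print(json.loads(safe_received) == data) # True: round trip is intact
```

In Python 2, the corresponding function is `urllib.urlencode`. Most HTTP client libraries apply this encoding themselves when given a dict of form fields rather than a hand-built body string.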

A note on ensure_ascii=False: this will produce a unicode string:

>>> u_encoded = json.dumps(data, ensure_ascii=False)
>>> u_encoded
u'{"body": "++\xdf\u0416", "universe": 42}'

which must be encoded into a str string (UTF-8 is suggested) before you can transmit it, and decoded again at the other end. You still need to guard against the channel mangling characters such as +, <, > and &.
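
For comparison, here is a rough Python 3 sketch of the same `ensure_ascii=False` round trip. It assumes Python 3, where `json.dumps` always returns `str` (the counterpart of Python 2's `unicode`) and the encoded result is `bytes`. The variable names are illustrative only.

```python
import json

data = {"body": "++\xdf\u0416", "universe": 42}

# With ensure_ascii=False the non-ASCII characters are kept as-is
# instead of being written as \uXXXX escapes.
text = json.dumps(data, ensure_ascii=False)

# Encode to bytes before putting the text on the wire.
payload = text.encode("utf-8")

# On the receiving side: decode the bytes, then parse the JSON.
decoded = json.loads(payload.decode("utf-8"))

print(decoded == data)  # True
```

The payload is shorter than the all-ASCII form, but both ends have to agree on UTF-8, and the channel-level escaping discussed above is still needed.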

John Machin

Are you doing it like this?

>>> s = u"""
... hello testing
... 
... spécißl characters
... 
... plusses: +++++
... 
... special chars :œ∑´®†¥¨ˆøπ“ß∂ƒ©˙∆˚¬Ω≈ç√∫˜µ≤≥
... """
>>> from json import dumps, loads
>>> loads(dumps(s))
u'\nhello testing\n\nsp\xe9ci\xdfl characters\n\nplusses: +++++\n\nspecial chars :\u0153\u2211\xb4\xae\u2020\xa5\xa8\u02c6\xf8\u03c0\u201c\xdf\u2202\u0192\xa9\u02d9\u2206\u02da\xac\u03a9\u2248\xe7\u221a\u222b\u02dc\xb5\u2264\u2265\n'
>>> print loads(dumps(s))

hello testing

spécißl characters

plusses: +++++

special chars :œ∑´®†¥¨ˆøπ“ß∂ƒ©˙∆˚¬Ω≈ç√∫˜µ≤≥

>>>
virhilo
  • not getting the same behavior here.. `logging.info(loads(dumps(data)))` => `u'hello testing sp\xe9ci\xdfl characters\n\nplusses: +++++\n\nspecial chars :\u0153\u2211\xb4\xae\u2020\xa5\xa8\u02c6\xf8\u03c0\u201c\xdf\u2202\u0192\xa9 \u02d9\u2206\u02da\xac\u03a9\u2248\xe7\u221a\u222b\u02dc\xb5\u2264\u2265'` – Mark Jan 08 '11 at 23:30
  • UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128) – Mark Jan 08 '11 at 23:32
  • Are you sure it was unicode before? `>>> x = u'łąść'.encode('utf-8')`, then `>>> x` gives `'\xc5\x82\xc4\x85\xc5\x9b\xc4\x87'`, and `>>> print x.decode('utf-8')` gives `łąść` – virhilo Jan 08 '11 at 23:44