168

I'll be receiving a JSON-encoded string from Obj-C, and I am decoding a dummy string (for now) as in the code below. My output comes out with the character 'u' prefixing each item:

[{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}...

How is JSON adding this unicode char? What's the best way to remove it?

import json
import sys

mail_accounts = []
da = {}
try:
    s = '[{"i":"imap.gmail.com","p":"aaaa"},{"i":"imap.aol.com","p":"bbbb"},{"i":"333imap.com","p":"ccccc"},{"i":"444ap.gmail.com","p":"ddddd"},{"i":"555imap.gmail.com","p":"eee"}]'
    jdata = json.loads(s)
    for d in jdata:
        for key, value in d.iteritems():
            if key not in da:
                da[key] = value
            else:
                da = {}
                da[key] = value
        mail_accounts.append(da)
except Exception, err:
    sys.stderr.write('Exception Error: %s' % str(err))

print mail_accounts
moffeltje
janeh
  • 7
    Python does have a problem here. Everything is not chill. I'm getting errors in the strings that Python creates when I try and write these strings to a file. For example when python takes "53" from JSON it turns it into u'53' and attempts to write it to a file as hex character u'\xe1' which causes Python to take a perfectly good string and puke on it: JSON: {"sa_BstDeAv": "53", "sa_BwVUpMx"... PYTHON: {u'sa_BstDeAv': u'53', u'sa_BwVUpMx'... ERROR ON WRITE: Value error('ascii' codec can't encode character u'\xe1' in position 5: ordinal not in range(128)) – David Urry Sep 22 '15 at 16:54
  • @janehouse the right answer here is the answer by jdi I really think you should change it. – Dekel Aug 17 '17 at 22:21

9 Answers

174

The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.

For example, try this:

print mail_accounts[0]["i"]

You won't see a u.
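A minimal sketch of the same point, assuming a one-element sample list (the variable names are illustrative, not from the question): the prefix belongs to the string's repr(), which is what Python 2 shows when it prints a list or dict, and not to the data itself. The code should run on Python 2 and 3; only Python 2 puts a u in the repr.

```python
import json

accounts = json.loads('[{"i": "imap.gmail.com", "p": "aaaa"}]')
host = accounts[0]["i"]

# repr() is what you see when a whole container is printed. On Python 2 it
# starts with u for unicode strings; on Python 3 it does not.
host_repr = repr(host)

# The data is the same either way: the string holds no extra prefix character.
print(host)
print(len(host))
```

Printing the string directly, or writing it somewhere with an explicit encoding, uses its contents rather than its repr.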

Ned Batchelder
  • 6
    Your answer was the most useful one I got, and I think the asker of this question would have really appreciated it: http://stackoverflow.com/questions/956867/how-to-get-string-objects-instead-of-unicode-ones-from-json-in-python – jimh Mar 19 '16 at 01:17
  • 1
    Thank you so much ! i was confused for u'' letter for so long – ketan khandagale Oct 25 '16 at 13:26
  • 1
    Except if you copy and paste it you have a vast amount of `u`s in your data. Frankly, printing out a `u` to indicate it's a Unicode string is one of the worst mistakes about Python. Utterly ridiculous. Why not print an `a` before every string if it's ASCII? An `i` if it's an integer? – Snowcrash Aug 05 '18 at 11:17
  • In Python 2, Unicode strings are a different type than byte strings, so the repr of the data includes the prefix to indicate that. It's not about what the contents happen to be, it's about the type. The u prefix is fine if you are pasting the contents back into a Python program. If not, perhaps you want to use json.dumps() instead. – Ned Batchelder Aug 05 '18 at 15:18
  • You have to use the string to search the dictionary of json. you may not however use the dot operator. – Maddocks Mar 04 '20 at 03:00
  • The 'u' is not a good thing for the end-user, to whom it will be confusing and I do not want to show it. – Ed Randall May 24 '20 at 09:56
158

Everything is cool, man. The 'u' is a good thing, it indicates that the string is of type Unicode in python 2.x.

http://docs.python.org/2/howto/unicode.html#the-unicode-type

Aman
57

The d3 print below is the one you are looking for (which is the combination of dumps and loads) :)

Having:

import json

d = """{"Aa": 1, "BB": "blabla", "cc": "False"}"""

d1 = json.loads(d)              # Produces a dictionary out of the given string
d2 = json.dumps(d)              # Produces a string out of a given dict or string
d3 = json.dumps(json.loads(d))  # 'dumps' gets the dict from 'loads' this time

print "d1:  " + str(d1)
print "d2:  " + d2
print "d3:  " + d3

Prints:

d1:  {u'Aa': 1, u'cc': u'False', u'BB': u'blabla'}
d2:  "{\"Aa\": 1, \"BB\": \"blabla\", \"cc\": \"False\"}"
d3:  {"Aa": 1, "cc": "False", "BB": "blabla"}
Mercury
  • 3
    Huh? `json.dumps` converts the dict back to a (JSON-encoded) string. That's not what the OP wanted to do. -1. – Mark Amery Jan 16 '16 at 13:36
  • 10
    But if you use it together with json.loads, it outputs the dictionary without the encoded characters, which is an answer to the question (this is the d3 print above). Read the answer well! – Mercury Jan 16 '16 at 13:53
10

Unicode is an appropriate type here. The JSONDecoder docs describe the conversion table and state that JSON string objects are decoded into Unicode objects.

https://docs.python.org/2/library/json.html#encoders-and-decoders

JSON                    Python
==================================
object                  dict
array                   list
string                  unicode
number (int)            int, long
number (real)           float
true                    True
false                   False
null                    None

"encoding determines the encoding used to interpret any str objects decoded by this instance (UTF-8 by default)."
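The table can be checked directly. The sketch below decodes a made-up document containing one value of each JSON kind and records the resulting Python type names; the keys and the `type_names` variable are illustrative only.

```python
import json

decoded = json.loads(
    '{"obj": {}, "arr": [], "s": "text", "i": 1, "r": 1.5, "t": true, "n": null}'
)

# Map each key to the name of the Python type the decoder produced.
type_names = {key: type(value).__name__ for key, value in decoded.items()}

# On Python 2 the "s" entry is "unicode"; on Python 3 it is "str".
print(sorted(type_names.items()))
```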

Mark Amery
jdi
9

The 'u' prefix shown in front of a string signifies that it is a Unicode string.

If you want to remove those 'u' chars from your object you can do this:

import json, ast
jdata = ast.literal_eval(json.dumps(jdata)) # Removing uni-code chars

Let's check it in the Python shell:

>>> import json, ast
>>> jdata = [{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}]
>>> jdata = ast.literal_eval(json.dumps(jdata))
>>> jdata
[{'i': 'imap.gmail.com', 'p': 'aaaa'}, {'i': '333imap.com', 'p': 'bbbb'}]
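One caveat worth checking before relying on this round trip: json.dumps() writes booleans and None as the JSON literals true, false, and null, which are not Python literals, so ast.literal_eval() should reject data that contains them. A small sketch with a made-up sample dict (the "ssl" key is hypothetical):

```python
import ast
import json

sample = {"i": "imap.gmail.com", "ssl": True}  # hypothetical data with a boolean

encoded = json.dumps(sample)  # the boolean is written as the JSON literal true

try:
    ast.literal_eval(encoded)
    round_trip_ok = True
except ValueError:
    # "true" is a bare name, not a Python literal, so literal_eval refuses it.
    round_trip_ok = False

print(round_trip_ok)
```

So the trick appears to be safe only for data made of strings, numbers, lists, and dicts.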
Nivesh Krishna
  • I suggest every newbie simply try out this script and voila you have yourself a script to convert ~from~ u'JSON output :) ... if one can only add stdin to the script , and json format at the end, you're ready to go! – Jordan Gee Mar 14 '20 at 01:26
8

The u prefix means that those strings are unicode rather than 8-bit strings. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit, so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
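A rough sketch of the recursive conversion described above. Note the swap: it uses .encode("utf-8") instead of the str constructor, because on Python 2 str() encodes with ASCII and raises on non-ASCII text. The helper name `to_byte_strings` and the `text_type` alias are made up for illustration. On Python 3 the result is bytes objects, which print with a b prefix.

```python
import sys

# Python 2 has a separate unicode type; on Python 3 the text type is str.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_byte_strings(obj, encoding="utf-8"):
    """Return a copy of obj with every text string encoded to a byte string."""
    if isinstance(obj, text_type):
        return obj.encode(encoding)
    if isinstance(obj, dict):
        return {to_byte_strings(k, encoding): to_byte_strings(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_byte_strings(item, encoding) for item in obj]
    return obj  # numbers, booleans and None pass through unchanged


converted = to_byte_strings([{u"i": u"imap.gmail.com", u"p": u"aaaa"}])
print(converted)
```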

Abe Karplus
4

I kept running into this problem when trying to capture JSON data in the log with the Python logging library, for debugging and troubleshooting purposes. Getting the u character is a real nuisance when you want to copy the text and paste it into your code somewhere.

As everyone will tell you, this is because it is a Unicode representation, and it could come from the fact that you’ve used json.loads() to load in the data from a string in the first place.

If you want the JSON representation in the log, without the u prefix, the trick is to use json.dumps() before logging it out. For example:

import json
import logging

# Prepare the data
json_data = json.loads('{"key": "value"}')

# Log normally and get the Unicode indicator
logging.warning('data: {}'.format(json_data))
# Output: WARNING:root:data: {u'key': u'value'}

# Dump to a string before logging and get clean output!
logging.warning('data: {}'.format(json.dumps(json_data)))
# Output: WARNING:root:data: {"key": "value"}
jonatan
  • 1
    This really should be the best answer, the 'u's absolutely do not "just get stripped out" in many contexts. Thank you so much for this! – Jessica Pennell Jun 15 '19 at 00:08
2

Try this:

mail_accounts[0]["i"].encode("ascii")

  • An answer without any explanation is nearly useless. Please try to add some information like why this would help. – Abhilash Chandran Jan 22 '20 at 09:56
  • Personally, I find lengthy answers with too much unnecessary information distracting. The above answers already explain that the value is unicode and needs to be converted to ascii so I'm not repeating all that. Just showing a simpler way to get the value. If anyone has problems using this answer just ask and I am happy to explain further! Thanks – 2nd Sight Lab Jan 22 '20 at 21:37
  • 1
    This is actually the only answer which shows concisely how to re-code each string to 'normal' without going through a (what must be ridiculously inefficient) json.loads, json.dumps cycle. – Ed Randall May 24 '20 at 09:31
  • What you are missing is that this will fail miserably with a UnicodeEncodeError if the message contains any non-ASCII characters. You can prevent that with `errors="ignore"` but more often than not, the real problem is that you don't know what you are doing, and discarding the errors is just hiding this fact under the rug. Too often, "I want ASCII" is just crypto-speak for "I don't understand languages other than English; I wish they would just go away; but in the meantime I will write software which (often, needlessly) works badly for other languages." – tripleee May 12 '21 at 07:18
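The failure mode raised in the last comment can be reproduced with a short sketch. The sample string is made up; the rest uses only the built-in encode method.

```python
name = u"caf\u00e9"  # hypothetical value containing one non-ASCII character

try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    # ASCII has no code for the accented character, so strict encoding fails.
    ascii_failed = True

# "ignore" avoids the exception by silently dropping the character.
lossy = name.encode("ascii", "ignore")

# UTF-8 can represent every character, so nothing is lost.
utf8 = name.encode("utf-8")

print(ascii_failed)
```

If the goal is to write the text to a file or socket, encoding to UTF-8 is usually the safer choice, assuming the receiving side expects UTF-8.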
-1

Just replace the u' with a single quote...

print(str(mail_accounts).replace("u'", "'"))
Mikematic