Questions tagged [python-unicode]

Python distinguishes between byte strings and unicode strings. *Decoding* transforms byte strings to unicode; *encoding* transforms unicode strings to bytes.

Remember: you decode your input to unicode, work with unicode, then encode unicode objects for output as bytes.
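A minimal sketch of that decode, work, encode pattern. It assumes Python 3 and UTF-8 input; the sample bytes and variable names are made up for illustration and do not come from any of the questions below.

```python
# Decode at the input boundary, work on text, encode at the output boundary.
# Assumption: the incoming bytes are UTF-8; a real program must know or be
# told the actual encoding of its input.
raw = b"caf\xc3\xa9"            # bytes as they might arrive from a file or socket

text = raw.decode("utf-8")      # bytes -> text (str in Python 3, unicode in Python 2)
shouted = text.upper()          # all processing happens on text, not on bytes
out = shouted.encode("utf-8")   # text -> bytes, ready to be written out

print(len(raw), len(text))      # byte count and character count differ
print(out)
```

Keeping bytes at the edges and text in the middle is what avoids most of the implicit ASCII conversions behind the UnicodeEncodeError and UnicodeDecodeError questions listed here.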

See the

960 questions
12
votes
3 answers

shlex.split still not supporting unicode?

According to the documentation, in Python 2.7.3, shlex should support UNICODE. However, when running the code below, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 184-189: ordinal not in range(128) Am I doing something…
petr
  • 2,430
  • 3
  • 18
  • 27
10
votes
3 answers

python byte string encode and decode

I am trying to convert an incoming byte string that contains non-ascii characters into a valid utf-8 string such that I can dump it as json. b = '\x80' u8 = b.encode('utf-8') j = json.dumps(u8) I expected j to be '\xc2\x80' but instead I…
kung-foo
  • 216
  • 1
  • 3
  • 7
10
votes
4 answers

Treat an emoji as one character in a regex

Here's a small example: reg = ur"((?P<initial>[+\-👍])(?P<rest>.+?))$" (In both cases the file has -*- coding: utf-8 -*-) In Python 2: re.match(reg, u"👍hello").groupdict() # => {u'initial': u'\ud83d', u'rest': u'\udc4dhello'} # unicode why must you do…
naiveai
  • 235
  • 1
  • 12
  • 31
10
votes
2 answers

Tensorflow can not restore vocabulary in evaluation process

I am new to tensorflow and neural network. I started a project which is about detecting errors in persian texts. I used the code in this address and developed the code in here. please check the code because I can not put all the code here. What I…
10
votes
5 answers

How to write Russian characters in file?

When I try to output Russian characters in the console, it gives me ??????????????? Who knows why? I tried writing to a file, and the result is the same. For example: f=open('tets.txt','w') f.write('some russian text') f.close inside file is -…
Pol
  • 20,480
  • 26
  • 66
  • 88
10
votes
1 answer

Display width of unicode strings in Python

How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()? Motivating example: Printing a table of strings to the console. Some of the strings contain…
Christian Aichinger
  • 6,278
  • 2
  • 34
  • 56
10
votes
4 answers

Comparing string and unicode in Python 2.7.5

I wonder why, when I make: a = [u'k',u'ę',u'ą'] and then type: 'k' in a I get True, while: 'ę' in a will give me False? It really gives me a headache and it seems someone made this on purpose to make people mad...
Kulawy Krul
  • 223
  • 1
  • 2
  • 5
9
votes
5 answers

base64 encoding unicode strings in python 2.7

I have a unicode string retrieved from a webservice using the requests module, which contains the bytes of a binary document (PCL, as it happens). One of these bytes has the value 248, and attempting to base64 encode it leads to the following…
Marcin
  • 44,601
  • 17
  • 110
  • 191
9
votes
3 answers

UnicodeEncodeError when fetching url

I have this issue trying to get all the text nodes in an HTML document using lxml but I get a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8995: ordinal not in range(128). However, when I try to find out the type of…
Robert Smith
  • 8,127
  • 15
  • 68
  • 113
8
votes
1 answer

TypeError: write() argument 1 must be unicode, not str

I'm trying to import a text file and save it on my desktop, but the text is in "utf-8" (there is this information in the book), so when I save without encoding the text has many strange characters, but when I try to save with explicit encoding this…
8
votes
1 answer

Pipreqs: UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1206: character maps to <undefined>

When I use pipreqs, I have this problem. I use anaconda and Russian Windows. root@DESKTOP-ETLLRI1 C:\Users\root\Desktop\resumes $ pipreqs C:\Users\root\Desktop\resumes Traceback (most recent call last): File…
krax1337
  • 131
  • 1
  • 9
8
votes
3 answers

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however after the 308th restaurant I get the following error: Traceback (most recent call last): File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py",…
dtrinh
  • 195
  • 2
  • 4
  • 16
8
votes
1 answer

Python returns length of 2 for single Unicode character string

In Python 2.7: In [2]: utf8_str = '\xf0\x9f\x91\x8d' In [3]: print(utf8_str) In [4]: unicode_str = utf8_str.decode('utf-8') In [5]: print(unicode_str) In [6]: unicode_str Out[6]: u'\U0001f44d' In [7]: len(unicode_str) Out[7]: 2 Since unicode_str…
Tom
  • 669
  • 7
  • 14
8
votes
1 answer

UnicodeDecodeError when using Python 2.x unicodecsv

I'm trying to write out a csv file with Unicode characters, so I'm using the unicodecsv package. Unfortunately, I'm still getting UnicodeDecodeErrors: # -*- coding: utf-8 -*- import codecs import unicodecsv raw_contents = 'He observes an…
Scott
  • 974
  • 2
  • 12
  • 21
8
votes
1 answer

Python print unicode list

With the following code lst = [u'\u5de5', u'\u5de5'] msg = repr(lst).decode('unicode-escape') print msg I got [u'工', u'工'] How can I remove the leading u so that the content of msg is: ['工', '工']
gongzhitaao
  • 6,073
  • 3
  • 32
  • 44