
After multiple searches I have not been able to determine how to avoid an error stating: "Unicode-objects must be encoded before hashing" when using this code:

    pwdinput = input("Now enter a password:")
    pwd = hashlib.sha1()
    pwd.update(pwdinput)
    pwd = pwd.hexdigest()

How can I get past that error? How do you encode Unicode-objects?

Nate

1 Answer

    pwdinput = input("Now enter a password:").encode('utf-8')  # or whatever encoding you wish to use

Assuming you're using Python 3, this converts the Unicode string returned by input() into a bytes object encoded in UTF-8 (or whichever encoding you choose). The hash objects in hashlib operate on bytes, which is why the string has to be encoded before it is passed to update(). Python 2 has an encode method as well, but its handling of Unicode vs. non-Unicode strings was a bit messy, whereas Python 3 makes an explicit distinction between Unicode strings (str) and immutable sequences of bytes that may or may not represent ASCII characters (bytes).
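For completeness, here is a minimal sketch of the question's snippet with the fix applied, assuming Python 3. The helper name `sha1_hex` and its `encoding` parameter are made up for illustration and are not part of hashlib.

```python
import hashlib


def sha1_hex(text, encoding="utf-8"):
    """Return the SHA-1 hex digest of a str (hypothetical helper).

    hashlib works on bytes, so the str is encoded first.
    """
    digest = hashlib.sha1()
    digest.update(text.encode(encoding))  # passing the str itself raises TypeError
    return digest.hexdigest()


# In the question, the text would come from input("Now enter a password:").
pwd = sha1_hex("example password")
```

Here `pwd` is a 40-character hexadecimal string, just as in the original code once the input has been encoded.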

http://docs.python.org/library/stdtypes.html#str.encode
http://docs.python.org/py3k/library/stdtypes.html#str.encode

JAB
    While I'm no big fan of Python 2.x's unicode handling, this particular code should work perfectly well in Python 2.7 as well, because both the `str` and `unicode` types have the encode method, and, provided a string consists only of ASCII characters, the utf-8 encoding of the string is exactly equal to the byte string of those characters. That fact is important if you want the hash for "abc" and u"abc" to come out the same. If you are okay with those two being treated differently, then any encoding is fine. – GrandOpener Nov 01 '12 at 09:51
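The point about encodings in the comment above can be checked with a short Python 3 sketch. The sample strings are arbitrary, and Latin-1 is only chosen here as a second encoding for comparison.

```python
import hashlib

# For ASCII-only text, UTF-8 yields exactly the ASCII bytes.
ascii_text = "abc"
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii") == b"abc"

# Once non-ASCII characters appear, the chosen encoding changes the bytes,
# and therefore the hash.
non_ascii = "p\u00e4ssword"  # contains the character a-umlaut
utf8_hash = hashlib.sha1(non_ascii.encode("utf-8")).hexdigest()
latin1_hash = hashlib.sha1(non_ascii.encode("latin-1")).hexdigest()
hashes_differ = utf8_hash != latin1_hash
```

So whichever encoding is picked, it needs to be the same one every time a hash is computed for later comparison.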