9

This example works fine:

import hashlib
m = hashlib.md5()
m.update(b"Nobody inspects")
r= m.digest()
print(r)

Now I want to do the same thing with a variable: var = "hash me this text, please". How can I do it following the same logic as the example?

Martijn Pieters

3 Answers

11

The hash.update() method requires bytes, always.

Encode Unicode text to bytes first; what you encode to is an application decision, but if all you want to do is fingerprint text, then UTF-8 is a great choice:

m.update(var.encode('utf8')) 
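Applied to the variable from the question, this could look like the following minimal sketch. The names raw, hex_digest and utf16_digest are only illustrative, and the last lines are an assumption-free check that the choice of encoding changes the bytes being hashed and therefore the digest:

```python
import hashlib

var = "hash me this text, please"

m = hashlib.md5()
m.update(var.encode("utf-8"))   # str -> bytes before hashing

raw = m.digest()                # 16 raw bytes
hex_digest = m.hexdigest()      # the same value as 32 hex characters

print(raw)
print(hex_digest)

# A different encoding produces different bytes, hence a different digest,
# so pick one encoding and use it consistently.
utf16_digest = hashlib.md5(var.encode("utf-16")).hexdigest()
print(hex_digest != utf16_digest)
```

Use digest() when you need the raw bytes and hexdigest() when you want a printable string.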

The exception you get when you don't encode is quite clear, however:

>>> import hashlib
>>> hashlib.md5().update('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing

If you are getting the hash of a file, open the file in binary mode instead:

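import hashlib
from functools import partial

hash = hashlib.md5()
with open(filename, 'rb') as binfile:  # filename: path of the file to hash
    # iter(callable, sentinel) calls read(2048) until it returns b'' (end of file)
    for chunk in iter(partial(binfile.read, 2048), b''):
        hash.update(chunk)
print(hash.hexdigest())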
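As a rough sketch, the chunked read can be wrapped in a small helper. The function name md5_of_file, the chunk_size parameter and the temporary demo file are assumptions made for illustration; they are not part of hashlib. The demo checks that hashing in chunks gives the same digest as hashing all the bytes at once:

```python
import hashlib
import os
import tempfile
from functools import partial


def md5_of_file(path, chunk_size=2048):
    """Return the hex MD5 digest of the file at path, read in binary chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as binfile:
        # iter(callable, sentinel): keep calling read() until it returns b""
        for chunk in iter(partial(binfile.read, chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file (hypothetical content, just for the comparison).
data = b"Nobody inspects the spammish repetition\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name
try:
    chunked = md5_of_file(tmp_path)
finally:
    os.remove(tmp_path)

one_shot = hashlib.md5(data).hexdigest()
print(chunked == one_shot)
```

Reading in chunks keeps memory use bounded no matter how large the file is.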
Martijn Pieters
  • I followed the link you gave me and read about the `digest()` method: can it handle a very long phrase? I mean, if my variable above is the content of a text file (which is already the case), and my text file contains a lot of text, will `digest()` accept such big file content? –  Jul 23 '14 at 08:28
  • 2
    @begueradj: yes, it can take anything that fits in Python. If you are reading a text file, you can call `.update()` multiple times, each time with the next chunk. Loop over the file to get lines, encode each line and pass it to `.update()`, and when the file is done call `.digest()` to get the digest. – Martijn Pieters Jul 23 '14 at 08:30
  • 2
    @begueradj: or you can open the file in *binary* mode and you will not have to encode again. – Martijn Pieters Jul 23 '14 at 08:30
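A small sketch of the line-by-line approach from the comments. Here io.StringIO stands in for a file opened in text mode, and the sample content is made up for illustration; feeding the encoded lines to update() one at a time should give the same digest as hashing the whole encoded text in one call:

```python
import hashlib
import io

content = "first line\nsecond line\nthird line\n"

# Stand-in for a text file; with a real file you would use
# open(path, encoding="utf-8") instead (path being your own file).
text_file = io.StringIO(content)

m = hashlib.md5()
for line in text_file:
    m.update(line.encode("utf-8"))  # each line must be encoded to bytes
incremental = m.hexdigest()

whole = hashlib.md5(content.encode("utf-8")).hexdigest()
print(incremental == whole)
```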
3

Try this; I hope it helps. The variable var has to be encoded to UTF-8 bytes. If you type in a string, e.g. "Donald Duck", var will be b'Donald Duck'. You can then pass it to hashlib.md5() and get the hash as a hexadecimal string with hexdigest().

#!/usr/bin/python3
import hashlib
var = input('Input string: ').encode('utf-8')
hashed_var = hashlib.md5(var).hexdigest()
print(hashed_var)
MrOrange
0

I had the same issue as the OP. I couldn't get either of the previous answers to work for me for some reason, but a combination of both helped me arrive at this solution.

I was originally hashing a string like this:

import hashlib

hasher = hashlib.sha256(b'hash this text')  # named hasher to avoid shadowing the built-in str
text_hashed = hasher.hexdigest()
print(text_hashed)

Result: d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40

My solution to hash a variable:

text = 'hash this text'
hasher = hashlib.sha256(text.encode('utf-8'))  # encode the str to bytes first
text_hashed = hasher.hexdigest()
print(text_hashed)

Result: d3dba6081b7f171ec5fa4687182b269c0b46e77a78611ad268182d8a8c245b40
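The two results match because encoding this ASCII-only string as UTF-8 yields exactly the bytes of the b'...' literal. A quick sketch to check that (the variable names literal_digest and encoded_digest are only illustrative):

```python
import hashlib

text = "hash this text"

# For pure-ASCII text, the UTF-8 encoding equals the bytes literal.
same_bytes = text.encode("utf-8") == b"hash this text"

literal_digest = hashlib.sha256(b"hash this text").hexdigest()
encoded_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()

print(same_bytes)
print(literal_digest == encoded_digest)
```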

James