369

I have this error:

Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
  m.update(line)
TypeError: Unicode-objects must be encoded before hashing

when I try to execute this code in Python 3.2.2:

import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides?  ")
wordlist = input("What is your wordlist?  (Enter the file name)  ")
try:
  hashdocument = open(hash_file, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  hash = hashdocument.readline()
  hash = hash.replace("\n", "")

try:
  wordlistfile = open(wordlist, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  pass
for line in wordlistfile:
  # Flush the buffer (this caused a massive problem when placed 
  # at the beginning of the script, because the buffer kept getting
  # overwritten, thus comparing incorrect hashes)
  m = hashlib.md5()
  line = line.replace("\n", "")
  m.update(line)
  word_hash = m.hexdigest()
  if word_hash == hash:
    print("Collision! The word corresponding to the given hash is", line)
    input()
    sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()
Augustin
JohnnyFromBF

10 Answers

367

It is probably looking for a character encoding from wordlistfile.

wordlistfile = open(wordlist,"r",encoding='utf-8')

Or, if you're working on a line-by-line basis:

line.encode('utf-8')

EDIT

Per the comment below and this answer.

My answer above assumes that the desired output is a str from the wordlist file. If you are comfortable working in bytes, then you're better off using open(wordlist, "rb"). But remember that your hash file should NOT be opened with rb if you are comparing its contents to the output of hexdigest. hashlib.md5(value).hexdigest() returns a str, and that cannot be directly compared with a bytes object: 'abc' != b'abc'. (There's a lot more to this topic, but I don't have the time at the moment.)

It should also be noted that this line:

line.replace("\n", "")

Should probably be

line.strip()

That works for both bytes and str (note that strip() removes all leading and trailing whitespace, not just the newline). If you decide to simply convert to bytes, you can instead change the line to:

line.replace(b"\n", b"")
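To make the str/bytes distinction above concrete, here is a minimal sketch. The sample lines are made up for illustration; the point is only that strip() works on both types and that hexdigest() returns a str, which never compares equal to a bytes object.

```python
import hashlib

# Illustrative sample lines (not from the original wordlist).
text_line = "hello\n"
bytes_line = b"hello\n"

# strip() with no arguments works on both types and returns the same type.
stripped_text = text_line.strip()
stripped_bytes = bytes_line.strip()
print(stripped_text)
print(stripped_bytes)

# md5 needs bytes; hexdigest() hands back a str.
digest = hashlib.md5(stripped_bytes).hexdigest()
print(type(digest).__name__)

# A str never compares equal to bytes, even with the same characters.
print(digest == digest.encode("ascii"))
```

So if the wordlist is read as bytes, the reference hash should still be read as text (or decoded) before comparing it with hexdigest().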
cwallenpoole
  • 3
    Why use `open(wordlist,"r",encoding='utf-8')` with a specific encoding? The `encoding` argument only selects the codec used for decoding; without it, Python uses a platform-dependent default encoding. – Tanky Woo Jan 19 '16 at 01:05
  • The first half of this is flat wrong, and it's shocking it got up-voted as high as it did. Specifying an `encoding` explicitly just changes how it decodes the bytes on disk to get a `str` (a text type storing arbitrary Unicode), but it would decode to `str` without that, and the problem is using `str` in the first place. The `line.encode('utf-8')` *undoes* that mistaken decoding, but the OP should just be opening the file in `'rb'` mode in the first place (with no encoding) so `line` is a `bytes` object in the first place (a few trivial changes needed to match, e.g. in `.replace("\n", '')`). – ShadowRanger Jan 13 '21 at 04:21
  • @ShadowRanger And if the OP *wants* a `str`? I added a bit to the answer, but my original reply was short, sweet, and immediately available. It also happened to be the right answer for a project I was working on when I wrote the above reply, so `¯\_(ツ)_/¯` – cwallenpoole Jan 14 '21 at 15:20
152

You have to encode the string with an encoding such as utf-8 before hashing it. Try this:

This example generates a random hex string by hashing a random 256-bit number with SHA-256:

>>> import hashlib
>>> import random
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'
Jay Patel
33
import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())
Sabyasachi
  • hashlib.sha256 always expects bytes. In Python 2, str was a byte string, so just passing string_to_hash worked fine. In Python 3, str is Unicode text and bytes is a separate type, so passing string_to_hash (a str) directly raises an error stating that Unicode objects must be encoded before hashing. – kundan Oct 29 '20 at 20:17
19

To store the password (PY3):

import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'

hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()
  • 1
    This line makes the password impossible to use: password_salt = os.urandom(32).hex(). It should be a fixed known value, though it can be kept secret on the server. Please correct me or adapt it to your code. – Yash Dec 12 '18 at 15:51
  • 1
    I agree with @Yash. You either have a single salt you use for every hash (not the best), or, if you generate a random salt for each hash, you must store it with the hash so you can use it again later for comparison. – Carson Evans Jan 09 '19 at 18:14
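As a rough sketch of what the comments describe, the random salt can be kept next to the hash and reused at verification time. The helper names hash_password and verify_password are hypothetical, and the salt-plus-password SHA-512 scheme is simply carried over from the answer above, not a recommendation of a particular password-hashing scheme.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest) as hex strings.

    Hypothetical helper: a fresh random salt is generated when none is given.
    """
    if salt is None:
        salt = os.urandom(32).hex()
    digest = hashlib.sha512((salt + password).encode('utf-8')).hexdigest()
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, expected_digest)


# Store both values at registration time...
salt, stored_digest = hash_password('12345')

# ...and reuse the stored salt when checking a login attempt.
print(verify_password('12345', salt, stored_digest))
print(verify_password('wrong', salt, stored_digest))
```

Both salt and stored_digest would need to be persisted together (for example in the same database row) for the later comparison to be possible.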
17

The error already says what you have to do. MD5 operates on bytes, so you have to encode the Unicode string into bytes, e.g. with line.encode('utf-8').
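A minimal sketch of the failure and the fix, using a made-up word rather than a line from the OP's wordlist:

```python
import hashlib

word = "password"  # illustrative value, standing in for a wordlist line

# Passing a str raises TypeError; the exact message varies by Python version.
try:
    hashlib.md5(word)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Encoding to bytes first works.
digest = hashlib.md5(word.encode('utf-8')).hexdigest()
print(digest)
```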

Cat Plus Plus
13

Please take a look first at that answer.

Now, the error message is clear: you can only use bytes, not Python strings (what used to be unicode in Python < 3), so you have to encode the strings with your preferred encoding: utf-32, utf-16, utf-8 or even one of the restricted 8-bit encodings (what some might call codepages).

The bytes in your wordlist file are being automatically decoded to Unicode by Python 3 as you read from the file. I suggest you do:

m.update(line.encode(wordlistfile.encoding))

so that the encoded data pushed to the md5 algorithm are encoded exactly like the underlying file.
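Here is a small sketch of why re-encoding with the file's own encoding matters. It assumes a throwaway Latin-1 file created just for the demonstration; the word and the temporary file are illustrative, not part of the original question.

```python
import hashlib
import os
import tempfile

raw = b"caf\xe9"  # "café" encoded as Latin-1 (assumed sample data)

# Write the raw bytes to a temporary wordlist file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(raw + b"\n")
    path = tmp.name

try:
    with open(path, "r", encoding="latin-1") as wordlistfile:
        line = wordlistfile.readline().rstrip("\n")
        # Re-encode with the encoding the file object was opened with.
        same_as_file = hashlib.md5(line.encode(wordlistfile.encoding)).hexdigest()
        # Re-encode with a different encoding for comparison.
        as_utf8 = hashlib.md5(line.encode("utf-8")).hexdigest()
finally:
    os.remove(path)

# Matches the hash of the bytes on disk.
print(same_as_file == hashlib.md5(raw).hexdigest())
# A different encoding produces different bytes, hence a different hash.
print(as_utf8 == same_as_file)
```

For pure-ASCII wordlists the choice makes no difference, since ASCII characters encode to the same bytes in UTF-8 and Latin-1.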

tzot
  • Why decode only to reencode when you could just process the file in binary mode and deal with `bytes` the whole way? – ShadowRanger Jan 13 '21 at 04:29
  • @ShadowRanger for this simple case (just reading lines and stripping the b'\n' at the end of each line) your suggestion is correct and adequate. – tzot Jan 13 '21 at 15:32
12

Encoding this line fixed it for me.

m.update(line.encode('utf-8'))
Mike Cash
10

You could open the file in binary mode:

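import hashlib

# hash_file and wordlist are the file names entered by the user, as in the question.
with open(hash_file) as file:
    control_hash = file.readline().rstrip("\n")

# Binary mode yields bytes lines, which md5 accepts directly.
with open(wordlist, "rb") as wordlistfile:
    for line in wordlistfile:
        word = line.rstrip(b'\n\r')
        if hashlib.md5(word).hexdigest() == control_hash:
            # collision: decode only for display
            print("Match:", word.decode("utf-8", errors="replace"))
            break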
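The binary-mode approach above can be sketched as a self-contained function. The name find_word and the temporary wordlist are assumptions made for the demonstration, not part of the original script.

```python
import hashlib
import os
import tempfile


def find_word(wordlist_path, control_hash):
    """Return the first word (as bytes) whose MD5 hex digest equals control_hash, or None."""
    with open(wordlist_path, "rb") as wordlistfile:
        for line in wordlistfile:
            word = line.rstrip(b"\r\n")  # handles both Unix and Windows line endings
            if hashlib.md5(word).hexdigest() == control_hash:
                return word
    return None


# Build a throwaway wordlist with mixed line endings for the demo.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"apple\r\nhello\nzebra\n")
    path = tmp.name

try:
    target = hashlib.md5(b"hello").hexdigest()
    found = find_word(path, target)
    missing = find_word(path, "0" * 32)
finally:
    os.remove(path)

print(found)
print(missing)
```

No decoding happens anywhere in the loop, so the hash is computed over exactly the bytes stored in the file.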
NorthCat
jfs
  • 3
    I am absolutely amazed I had to scroll down this far to find the first sane answer. Unless there is some reason to think the `wordlist` file is in the wrong encoding (and must therefore be decoded from the wrong encoding, then encoded with the correct encoding for hashing) this is by far the best solution, avoiding pointless decoding and reencoding in favor of just processing `bytes` (the source of the error in the OP's code). – ShadowRanger Jan 13 '21 at 04:25
3

If it's a single-line string literal, prefix it with b or B, e.g.:

variable = b"This is a variable"

or

variable2 = B"This is also a variable"
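As a quick check of what the prefix does, the sketch below compares a bytes literal with the encoded form of the same text. Note that a bytes literal can only contain ASCII characters, so this works for constants in source code; text read from a file or from input() still needs .encode().

```python
import hashlib

literal = b"This is a variable"   # bytes literal, ASCII only
text = "This is a variable"       # ordinary str

# The literal is the same as the ASCII-encoded str.
print(literal == text.encode("ascii"))

# So both routes give the same hash.
from_literal = hashlib.md5(literal).hexdigest()
from_encoded = hashlib.md5(text.encode("utf-8")).hexdigest()
print(from_literal == from_encoded)
```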
SBimochan
-5

This program is a bug-fixed and enhanced version of the MD5 cracker above. It reads a file containing a list of hashed passwords and checks each hash against the hashes of the words in an English dictionary word list. I hope it is helpful.

I downloaded the English dictionary from the following link https://github.com/dwyl/english-words

# md5cracker.py
# English Dictionary https://github.com/dwyl/english-words 

import hashlib, sys

# Raw strings keep the backslashes in Windows paths from being read as escape sequences.
hash_file = r'exercise\hashed.txt'
wordlist = r'data_sets\english_dictionary\words.txt'

try:
    hashdocument = open(hash_file,'r')
except IOError:
    print('Invalid file.')
    sys.exit()
else:
    count = 0
    for hash in hashdocument:
        hash = hash.rstrip('\n')
        print(hash)
        i = 0
        with open(wordlist,'r') as wordlistfile:
            for word in wordlistfile:
                m = hashlib.md5()
                word = word.rstrip('\n')            
                m.update(word.encode('utf-8'))
                word_hash = m.hexdigest()
                if word_hash==hash:
                    print('The word, hash combination is ' + word + ',' + hash)
                    count += 1
                    break
                i += 1
        print('Iteration is ' + str(i))
    if count == 0:
        print('The hash given does not correspond to any supplied word in the wordlist.')
    else:
        print('Total passwords identified is: ' + str(count))
sys.exit()
udz