6

I'm trying to generate the same MD5 codes from Python 3 and SQL but I get always a different result.

With this Unique Identifier f033b004-eb80-412d-9773-f4f06bb994c1

SELECT  HASHBYTES('MD5', 'f033b004-eb80-412d-9773-f4f06bb994c1')

Result: 0x9BCE8D23CAC76AF4F61C04673CDD0081

And then I do that calculation

SELECT  ABS(HASHBYTES('MD5', 'f033b004-eb80-412d-9773-f4f06bb994c1')  % 10)

Result: 5

Now with python I use the hashlib library

import hashlib

m = hashlib.md5('f033b004-eb80-412d-9773-f4f06bb994c1'.encode() )
m.digest(), m.hexdigest()

Result: b'\x9b\xce\x8d#\xca\xc7j\xf4\xf6\x1c\x04g<\xdd\x00\x81', '9bce8d23cac76af4f61c04673cdd0081'

And then I do the same calculations

    int.from_bytes( b'\x9b\xce\x8d#\xca\xc7j\xf4\xf6\x1c\x04g<\xdd\x00\x81',
                    byteorder='big', 
                    signed=False )%10

Result: 9

Does anyone knows how can I get from the Python code the same results as the SQL Server ?

jarlh
  • 35,821
  • 8
  • 33
  • 49
dapo
  • 485
  • 1
  • 5
  • 16
  • have you tried changing the byteorder? – Cato Jan 31 '19 at 14:25
  • Yes I did that, sorry for didn't write the answer but it is not like SQL Selver. It returns me 1. – dapo Jan 31 '19 at 14:36
  • On the Python side, you are converting `bytes` to an `int`, and taking the modulus of the `int`. On the SQL Server, you are taking the modulus of the output of `HASHBYTES` - a `VARBINARY`. Are you sure that the modulus of a `VARBINARY` is the same as that of the value cast to an integer? – snakecharmerb Feb 02 '19 at 12:24
  • No I am not sure. But do you know how can I check it or do it correctly in Python 3 ? – dapo Feb 04 '19 at 07:45

2 Answers2

0

This is because SQL server returns the MD5 hash in UPPERCASE and Python in lowercase. The UPPER and lower case bytes have, of course, different byte values.

Luc
  • 1,393
  • 1
  • 11
  • 19
  • I already test that, but I get the same result even if MD5 is lowercase or uppercase. If you get the same result as SQL Server with uppercase please post me the 3-4 lines of your Python code. – dapo Feb 01 '19 at 08:03
0

Just ran into this... the problem is that when SQL server converts the hash into integer to perform the modulo, it will not give you the correct integer because the hexadecimal is too big (check out the top answer in here: Conversion of long Hex string to Integer not working as expected (works in SQL))

On the other hand, python can do the conversion correctly because the max integer size is as much as memory allows