Hash function that protects against collisions, not attacks. (Produces a random UUID-size result space)

Question

I'm using SHA1 to hash long strings down so that they can be used as keys in a database.

I'm trying to produce a UUID-size string from the original string that is random enough and big enough to protect against accidental collisions, but much smaller than the original string.

I'm not using this for anything security related.

Example:

import hashlib

# Take a very long string, hash it down to a 20-byte digest behind the scenes
# and use the digest as the database primary key instead
def _get_database_key(very_long_key):
    # hashlib needs bytes on Python 3, so text is encoded first (UTF-8 assumed)
    return hashlib.sha1(very_long_key.encode("utf-8")).digest()

Is SHA1 a good algorithm to be using for this purpose? Or is there something else that is more appropriate?

You could do `hashlib.sha1(os.urandom(32)).hexdigest()` or `os.urandom(16).encode('hex')`. Are you trying to avoid checking the table for duplicate IDs? — Blender, Mar 03 '13 at 07:15
What about a collision attack? Surely you still care about that. — Eric, Mar 03 '13 at 07:40
`sha256` or `sha512` would be less likely to cause collisions; do you have a size limit? Also check out [uuid v5](http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_5_.28SHA-1_hash.29) and [rfc 4122](http://tools.ietf.org/html/rfc4122#section-4.1.3) and the [uuid python library](http://docs.python.org/2/library/uuid.html). — Ja͢ck, Mar 03 '13 at 07:45

score 5 · Accepted Answer · answered Mar 03 '13 at 07:55

Python has a uuid library, based on RFC 4122.

The version that uses SHA1 is UUIDv5, so the code would be something like this:

import uuid

uuid.uuid5(uuid.NAMESPACE_OID, 'your string here')
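
As a rough sketch of how this could replace the hashing helper from the question: the function name `get_database_key` and the constant `KEY_NAMESPACE` below are made up for illustration, and reusing `uuid.NAMESPACE_OID` as the namespace is only an assumption. Any fixed UUID would do, provided the same one is used every time.

```python
import uuid

# Hypothetical namespace constant; any fixed UUID works, but it must
# never change, or the same input string will map to a different key.
KEY_NAMESPACE = uuid.NAMESPACE_OID


def get_database_key(very_long_key):
    """Map an arbitrarily long string to a UUID-sized (16-byte) key."""
    # uuid5 hashes namespace + name with SHA-1 and keeps 128 bits of it,
    # so the same input always yields the same UUID.
    return uuid.uuid5(KEY_NAMESPACE, very_long_key)


key = get_database_key("x" * 10000)
print(key)             # canonical 36-character text form
print(len(key.bytes))  # 16
print(key.version)     # 5
```

Six of the 128 bits are fixed by the version and variant fields, so 122 bits come from the hash. That should still be ample for avoiding accidental collisions, though like SHA1 itself it is not meant to resist a deliberate collision attack.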

Ja͢ck

