I am using SHA1 to hash long strings down to a smaller size so that they can be used as keys in a database.

I am trying to produce a UUID-sized string from the original string that is random enough and large enough to protect against collisions, but much smaller than the original string.

I am not using this for anything security-related.

Example:

import hashlib

# Take a very long string, hash it down to a smaller string behind the scenes, and use
#     the hashed key as the database primary key instead.
# Note: on Python 3, hashlib.sha1() requires bytes, so encode str keys first.
def _get_database_key(very_long_key):
    return hashlib.sha1(very_long_key).digest()

Is SHA1 a good algorithm to be using for this purpose? Or is there something else that is more appropriate?

Chris Dutrow
  • You could do `hashlib.sha1(os.urandom(32)).hexdigest()` or `os.urandom(16).encode('hex')`. Are you trying to avoid checking the table for duplicate IDs? – Blender Mar 03 '13 at 07:15
  • What about a collision attack? Surely you still care about that. – Eric Mar 03 '13 at 07:40
  • `sha256` or `sha512` would be less likely to cause collisions; do you have a size limit? Also check out [uuid v5](http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_5_.28SHA-1_hash.29) and [rfc 4122](http://tools.ietf.org/html/rfc4122#section-4.1.3) and the [uuid python library](http://docs.python.org/2/library/uuid.html). – Ja͢ck Mar 03 '13 at 07:45

1 Answer

Python has a uuid library, based on RFC 4122.

The version that uses SHA1 is UUIDv5, so the code would be something like this:

import uuid

uuid.uuid5(uuid.NAMESPACE_OID, 'your string here')
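
To tie this back to the question, here is a minimal sketch of how `uuid5` could replace the raw SHA1 digest as the database key. It assumes Python 3 and a `str` input; the helper name `database_key` is made up for illustration. The result is always 16 bytes and is deterministic for a given input, whereas a raw SHA1 digest is 20 bytes.

```python
import hashlib
import uuid


def database_key(very_long_key: str) -> bytes:
    """Return a 16-byte, deterministic key derived from the input string.

    Hypothetical helper: the name and the choice of NAMESPACE_OID are
    assumptions made for this sketch, not requirements.
    """
    # uuid5 hashes the namespace bytes plus the name with SHA-1 and keeps
    # 128 bits, a few of which are overwritten by the version/variant fields.
    return uuid.uuid5(uuid.NAMESPACE_OID, very_long_key).bytes


key = database_key("your string here" * 1000)
print(len(key))                                # 16
print(database_key("a") == database_key("a"))  # True
print(len(hashlib.sha1(b"a").digest()))        # 20
```

Since a few bits are fixed by the UUID format, the key carries slightly fewer than 128 bits of hash output. If the full 160 bits matter more than a standard UUID layout, storing the raw SHA1 digest remains an option.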
Ja͢ck