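As an experiment, string hashes seem to behave differently in Python 2 and Python 3. The order of tied entries returned by Counter.most_common() is the same on every Python 2 run but changes between Python 3 runs: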

alvas@ubi:~$ python -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('foobar', 1), ('foo', 1), ('bar', 1), ('barfoo', 1)]
alvas@ubi:~$ python -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('foobar', 1), ('foo', 1), ('bar', 1), ('barfoo', 1)]
alvas@ubi:~$ python -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('foobar', 1), ('foo', 1), ('bar', 1), ('barfoo', 1)]


alvas@ubi:~$ python3 -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('barfoo', 1), ('foobar', 1), ('bar', 1), ('foo', 1)]
alvas@ubi:~$ python3 -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('foo', 1), ('barfoo', 1), ('bar', 1), ('foobar', 1)]
alvas@ubi:~$ python3 -c "from collections import Counter; x = Counter({'foo': 1, 'bar': 1, 'foobar': 1, 'barfoo': 1}); print(x.most_common())"
[('bar', 1), ('barfoo', 1), ('foobar', 1), ('foo', 1)]

And when we look at string hashes directly, the Python 3 hash of the same string seems to change on every run:

alvas@ubi:~$ python -c "print 'abc'.__hash__()"
1453079729188098211
alvas@ubi:~$ python -c "print 'abc'.__hash__()"
1453079729188098211
alvas@ubi:~$ python -c "print 'abc'.__hash__()"
1453079729188098211


alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
-4165906745021293940
alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
-4676677077013862663
alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
5261896652811750722

My question is: why and how do the hashes differ?

Which hashing algorithm does each version use? Where can I find the exact CPython code where the string hashing happens?

Is there a way to unrandomize the hashes?


EDITED

After reading PEP 398, I found that setting PYTHONHASHSEED=0 disables the hash randomization, but it's not recommended due to security issues.

alvas@ubi:~$ export PYTHONHASHSEED=0
alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
4596069200710135518
alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
4596069200710135518
alvas@ubi:~$ python3 -c "print ('abc'.__hash__())"
4596069200710135518
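
The same check can be scripted instead of typed by hand. The following is a minimal sketch, assuming Python 3 and that launching child interpreters via sys.executable is acceptable; child_hash is a hypothetical helper name, not part of any library. It runs hash('abc') in fresh processes with PYTHONHASHSEED set and collects the results, which should be identical for a fixed seed.

```python
import os
import subprocess
import sys

def child_hash(seed):
    """Run hash('abc') in a fresh interpreter with PYTHONHASHSEED set to `seed`.

    `child_hash` is an illustrative helper, not a standard-library function.
    """
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env
    )
    return int(out.decode().strip())

# Three separate processes with the same fixed seed.
fixed = [child_hash(0) for _ in range(3)]
print(fixed)

# Passing "random" asks for a fresh random seed per process,
# so these values will normally differ from one another.
randomized = [child_hash("random") for _ in range(3)]
print(randomized)
```

The seed has to be set before the interpreter starts, which is why the sketch spawns child processes rather than changing os.environ inside the running one.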
alvas
  • Hash randomisation - related question: http://stackoverflow.com/questions/14956313/dictionary-ordering-non-deterministic-in-python3 – Alex Riley Nov 19 '15 at 17:03
  • Related: http://stackoverflow.com/questions/33558709/the-similar-method-from-the-nltk-module-produces-different-results-on-different – alvas Nov 19 '15 at 17:04
  • Is there a way to unrandomize hashes? `python3 -c "import random; random.seed(0); print ('abc'.__hash__())"` doesn't work =( – alvas Nov 19 '15 at 17:08
  • @alvas: read the PEP, there is an [environment variable](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED). The source is all in https://hg.python.org/cpython/file/3.5/Python/pyhash.c – Martijn Pieters Nov 19 '15 at 17:11
  • Thanks @MartijnPieters!!! That explains a lot =) – alvas Nov 19 '15 at 17:12
  • If you need consistent results, you're probably best off not using the default `__hash__` methods at all. They're not going to be consistent across 32-bit and 64-bit builds of CPython, or across different Python implementations, and they could easily change across CPython releases. – user2357112 supports Monica Nov 19 '15 at 17:20
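
One way to follow the suggestion in the last comment is to derive the value from a cryptographic digest instead of the built-in hash. This is only a sketch under stated assumptions: stable_hash is a hypothetical helper, and the choice of SHA-256, UTF-8 encoding, and truncation to 64 bits is illustrative rather than anything the commenters prescribed.

```python
import hashlib

def stable_hash(text):
    """Return a 64-bit integer taken from the SHA-256 digest of `text`.

    `stable_hash` is an illustrative helper. Unlike the built-in hash(),
    its result does not depend on PYTHONHASHSEED, the CPython version,
    or whether the build is 32-bit or 64-bit.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes so the value fits in an unsigned 64-bit range.
    return int.from_bytes(digest[:8], "big")

print(stable_hash("abc"))
```

Such a value is meant for things like reproducible ordering or bucketing across runs. It does not change how dict or Counter hash their keys internally.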
