2

When running this code the results change as expected since a set is unordered:

my_set_1 = {'a','b','c',}
print([i for i in my_set_1])

That is, multiple runs would give different lists, e.g.

['a', 'c', 'b']
['b', 'a', 'c']
['a', 'c', 'b']
['c', 'b', 'a']

etc.

(Note: You might be getting the same result instead, if you don't have PYTHONHASHSEED=random, as suggested in the comments. Also, if you are using the console to replicate it, make sure you Rerun the console every time you run the code.)


However, when placing the above code in a for loop the results are rather surprising:

for i in range(10):
    my_set_1 = {'a','b','c',}
    print([i for i in my_set_1])
# Prints: 
# ['a', 'c', 'b']
# ['a', 'c', 'b']
# ['a', 'c', 'b']
# ....

A single run of the for loop will print the same list. Rerunning the for loop can print a different list (e.g. ['c', 'b', 'a']) but it will still be printed 10 times without changing.

Why doesn't it change?

Reblochon Masque
  • 30,767
  • 8
  • 43
  • 68
user
  • 4,434
  • 7
  • 38
  • 66
  • My tests seem to show that set does keep order between runs (same items, unmodified), it just doesn't guarantee a specific order. Can not reproduce your first statement of set yielding different ordering across list comprehension runs. – Reut Sharabani Aug 23 '15 at 09:02
  • @ReutSharabani Are you running that code in the console? If so try this code inside a module and run the module. – user Aug 23 '15 at 09:03
  • The order in set is arbitrary. – yangjie Aug 23 '15 at 09:03
  • Can you provide **one** piece of code that shows the problem? – Reut Sharabani Aug 23 '15 at 09:09
  • @ReutSharabani What do you mean? Isn't the code above sufficient to reproduce the problem? Perhaps I didn't explain it correctly; if so, let me know so that i can rephrase it. Also, did you insert this code in a module and run it? Didn't it produce the described results? – user Aug 23 '15 at 09:13
  • 1
    They don't have `PYTHONHASHSEED=random`, that's why they can't reproduce. Probably you can note it in your answer. – Yaroslav Admin Aug 23 '15 at 09:20
  • I did as suggested and ran it in the REPL as well as in a module. This did not reproduce your problem. Post a single piece of code demonstrating the problem. Currently there are two separate pieces and both act like the second. – Reut Sharabani Aug 23 '15 at 09:21
  • Python guarantees that the order be consistent **within** each run, but not between. Other's may happen to see consistency between runs, but that only by coincidence (another version or another environment). – skyking Aug 23 '15 at 09:32
  • @ReutSharabani [On Python 3.3 and greater, hash randomization is turned on by default.](https://docs.python.org/3/using/cmdline.html#cmdoption-R) Could that be the reason you cant reproduce? If so, let me know so that I can edit the question accordingly. – user Aug 23 '15 at 09:39
  • I'm on python3.4 / windows. It may be random, but the probability of the same order across 10 runs is around (1/(setlength!))^10. – Reut Sharabani Aug 23 '15 at 09:49

2 Answers2

5

@ReblochonMasque has a correct point: set is based on the hash table and if hashes calculated between runs are the same, you'll have the same order between runs. However such behaviour is vulnerable to attacks.

To prevent these attacks special variable PYTHONHASHSEED was introduced. When it is set to random each run Python will generate different hashes for the same items. That's why you get different order.

To check this, you can run your program with PYTHONHASHSEED set to the same number. Order will be the same among runs.

$ export PYTHONHASHSEED=random
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'c', 'b']
$ python t.py
['c', 'b', 'a']
$ export PYTHONHASHSEED=4
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'b', 'c']
$ python t.py
['a', 'b', 'c']

If you look at the object.__hash__(). There is a note at the bottom (exactly about your case):

Note By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

Community
  • 1
  • 1
Yaroslav Admin
  • 11,842
  • 6
  • 54
  • 69
4

You shall not expect that the order of a set will change; sets are unordered in a sense that the order is not an invariant i/e there is no guarantee that it will not change.

The implementation is in the form of a hash table (dictionary); as long as there are no collisions of keys, the order will likely not change, but there is no telling. It is also not possible to predict if or when that will occur.

Be careful with drawing conclusions from your experimentation: the results you get cannot be predicted and will depend on the state of your system at the moment you run. They also won't hold across platforms, versions of python, etc...

Reblochon Masque
  • 30,767
  • 8
  • 43
  • 68