This is a testing question, not an implementation question.

I have a program that produces JSON. I want to be able to compare outputs consistently, so I'm converting all of my dicts to OrderedDicts.

I've searched through, and I'm pretty confident that I got them all, but good programming still requires testing. I'm not sure how to actually test that my dictionaries are coming out in the same order because I told them to, rather than because they just happened to come out in the same order...

Is there a way to force Python to randomize its non-Ordered dicts?

ETA: I'm using Python 2.7. I've still got 18 months to convert this thing... it's on the list.

Brian Postow
  • Why does the order matter here? Can you just parse the JSON and compare it? – internet_user May 22 '18 at 20:32
  • Because there are some bits where the order matters in that the two sections need to be in the same order, for usage reasons. Also, because running text diff is much easier than writing a json comparison script... (None of the json comparison scripts out there seem to work right) – Brian Postow May 22 '18 at 20:38
  • If you use python 3.3 and up there’s already hash randomisation for each run of the interpreter. – jonrsharpe May 22 '18 at 20:39
  • Parse the JSON, recursively sort the resulting list/dict, then run any simple sequence diff on a depth-first walk of the trees. – abarnert May 22 '18 at 20:40
  • There are some things that can't be sorted alphabetically, because different parts need to be synchronized. – Brian Postow May 22 '18 at 20:41
  • And I am using 2.7 still... – Brian Postow May 22 '18 at 20:42
  • @jonrsharpe But that doesn't actually solve the OP's problem. There is no guarantee that the strings' hashes won't randomly happen to end up in the same order as insertion order. In fact, it makes it slightly worse—even if you test it and verify that hash order and insertion order are distinct, they may not be distinct in future runs of the same test. – abarnert May 22 '18 at 20:42
  • JSON objects can only have strings as keys, which can always be alphabetically sorted. (And remember, we're not talking about sorting things for display purposes here, just sorting things to verify that the diff is empty.) – abarnert May 22 '18 at 20:43
  • Do you ever need dicts to be sorted? Or merely their string representations? – BallpointBen May 22 '18 at 21:53
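
The parse-and-compare idea from the comments above can be sketched with the standard `json` module alone. This is a minimal sketch, not the commenter's exact procedure: it sorts object keys via `sort_keys=True` but deliberately leaves list order untouched, on the assumption that list order is the part that must stay synchronized. The helper name `canonical` and the sample strings are made up for illustration.

```python
import json


def canonical(text):
    """Parse JSON text and re-serialize it with object keys sorted.

    List order is preserved, so only key order inside objects is normalized.
    """
    return json.dumps(json.loads(text), sort_keys=True, indent=2)


# Two documents that differ only in the order of object keys.
first = '{"b": 1, "a": [{"y": 2, "x": 1}]}'
second = '{"a": [{"x": 1, "y": 2}], "b": 1}'

# Same content once key order is normalized.
same = canonical(first) == canonical(second)

# Reordering a list is still reported as a difference.
reordered = canonical('[1, 2]') == canonical('[2, 1]')
```

The canonical strings can then be fed to any plain text diff, which keeps the "text diff is easier" workflow while ignoring key order.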

1 Answer

I'm not sure you really do need to test this, but if you do…

In CPython 2.7, there's really no point in testing this. The elements will be in arbitrary order—which means they could arbitrarily end up in the same order as insertion order, and there's no way to force them not to be.

In CPython 3.3-3.5, it's even worse. Because string hashes are randomized per process by default, the elements of a string-keyed dict will not only be in arbitrary order, but in a different arbitrary order each time you run the tests. Which means your test may look like it's working because it happens to have a (3! - 1) / 3! = 5/6 chance of working (for a dict with three keys), and then you'll check in a flaky and useless test.
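
For what it's worth, the per-run reordering comes from hash randomization, which is controlled by the `PYTHONHASHSEED` environment variable (and is also available as an opt-in on 2.7.3+ via the `-R` flag). Here is a minimal sketch that runs a child interpreter under a few fixed seeds and reports the iteration order it sees. The helper name `key_order_for_seed` and the sample keys are invented for illustration, and a `set` is used because a plain dict on 3.7+ always iterates in insertion order; on 2.7 a plain dict is affected in the same way.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of string keys.
CHILD = "print(','.join(set(['alpha', 'beta', 'gamma', 'delta', 'epsilon'])))"


def key_order_for_seed(seed):
    """Run a fresh interpreter with PYTHONHASHSEED=seed and return its key order."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode("ascii").strip().split(",")


if __name__ == "__main__":
    # A fixed seed is reproducible; different seeds may or may not reorder.
    for seed in range(1, 6):
        print(seed, key_order_for_seed(seed))
```

This only makes the order vary between runs. It still cannot guarantee that any particular run differs from insertion order, which is the point made above.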

In CPython 3.6 and later, however, a plain dict preserves insertion order (an implementation detail in 3.6, guaranteed by the language from 3.7), so key order alone cannot tell it apart from an OrderedDict. Even deleting a key and re-inserting it behaves the same in both: the key moves to the end. What you can reliably test for is comparison, because equality between two OrderedDicts is order-sensitive and equality between plain dicts is not. So:

>>> import collections
>>> d1, d2 = {}, collections.OrderedDict()
>>> for i in range(10):
...     d1[i] = d2[i] = i
...
>>> del d1[2]
>>> del d2[2]
>>> d1[2] = d2[2] = 1000
>>> d1
{0: 0, 1: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 2: 1000}
>>> list(d1) == list(d2)
True
>>> a = collections.OrderedDict([(1, 1), (2, 2)])
>>> b = collections.OrderedDict([(2, 2), (1, 1)])
>>> a == b, dict(a) == dict(b)
(False, True)

However, this also means that on CPython 3.6 a dict you forgot to convert will still come out in insertion order, courtesy of an implementation detail, which is pretty much the thing you were trying to test that you weren't relying on. An order-based test cannot catch that case.
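
A check that does not depend on iteration order at all is to assert on types: walk the structure before it is serialized and report every mapping that is not an OrderedDict. The following is a minimal sketch under the assumption that the data consists only of dicts, lists, tuples, and scalars; the function name `find_plain_dicts` and the path notation are made up for illustration. It should run unchanged on 2.7 and 3.x.

```python
from collections import OrderedDict


def find_plain_dicts(obj, path="$"):
    """Return the paths of every dict inside obj that is not an OrderedDict."""
    found = []
    if isinstance(obj, dict):
        # OrderedDict subclasses dict, so test for it explicitly.
        if not isinstance(obj, OrderedDict):
            found.append(path)
        for key, value in obj.items():
            found.extend(find_plain_dicts(value, "%s[%r]" % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            found.extend(find_plain_dicts(value, "%s[%d]" % (path, index)))
    return found


# Hypothetical sample data: one fully converted tree, one with a stray dict.
converted = OrderedDict([("a", [OrderedDict([("b", 1)])])])
missed = OrderedDict([("a", [{"b": 1}])])
```

A test can then assert that `find_plain_dicts(output)` is empty, and a failure message that includes the returned paths points straight at the dict that was missed.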

abarnert