2

Yesterday at work I stumbled upon a piece of code that was roughly like this:

uasort($array, function($a, $b) {
    return isset($a['sort_code']) && isset($b['sort_code']) && $a['sort_code'] > $b['sort_code'];
});

this flipped a couple of switches in my head seeing as this can only return a boolean true or false instead of the 0 1 or -1 that is clearly stated in php.net. Best case scenario PHP will interpret false as 0 and true as 1. So one (the colleague that wrote this) could argue that it actually works.

I made the argument that it cannot possibly work because it will never ever ever return a -1. There is a state missing in this return-either-true-or-false from the sort callback and the way I think this is you will get either a wrong result or the process is not as performant.

I actually got down to writing a couple of tests and though I am no expert in sorting I actually got to the point where you could actually do away with the -1 state and still get a correct sorting with the same amount of calculations

effectively you could replace:

if ($a === $b) {
    return 0;
}

return $a < $b ? -1 : 1; 

with

return $a > $b;

Now, as I said, I am by no means an expert in sorting and my tests were less than thorough. However my results with small and bigger arrays showed that this is true. The array always got sorted correctly with the amount of calculations being the same using either method

So it really doesnt seem to be that big of a matter that there is no third state (-1)

I still stand by my opinion that you should follow the docs, especially PHP, to the letter. When php says I need either 0, a negative or positive number then that is exactly what one should do. You shouldnt leave stuff like that to chance and watch how things work by accident.

However I am intrigued by all this and would like some insight from somebody who knows more

thoughts? anyone?

iainn
  • 15,089
  • 9
  • 27
  • 37
tomtom
  • 21
  • 3
  • You could always use the spaceship operator - https://stackoverflow.com/questions/30365346/what-is-the-spaceship-operator-in-php-7 – Nigel Ren Jan 25 '20 at 10:03
  • 1
    I believe I already stumbled upon some cases when not following the specs provided wrong results, but I can't find it right now. In any case, by not following the manual, you put yourself at risk of your code no longer sorting correctly in future versions, if the PHP team decides to adjust the sorting algorithm behind the scenes. – Jeto Jan 25 '20 at 10:30
  • [Same question for Javascript](https://stackoverflow.com/questions/52671125/why-does-sorting-a-js-array-of-numbers-with-work/52671431). In short: it all depends on the algorithm(s) behind it. I would follow the specification for the reason stated above. – Jeto Jan 25 '20 at 10:36

3 Answers3

2

You do not have to actually return all -1, 0 and 1 for function to work. Returning 0 would not switch elements in place. Also you can return something like -1345 to act as -1.

Returning only 0 or 1 means you want just to "sink" some elements towards end of array, like have all NULL values at end or start of array and other sorting is not important:

$input = ['c', 'a', null, 'b', null];

usort($input, function ($a, $b) {
    return !is_null($b);
});

// 'b', 'a', 'c', null, null
var_dump($input);

In your case it's sorting to elements without sort_code to end and also sort_code in order

Justinas
  • 34,232
  • 3
  • 56
  • 78
  • I (kinda) see your point. However: a scenario where one would have an array containing values of different types is rather unlikely even in the worst monoliths written decades ago. And even then I would consider it just an awful practice to just write return !is_null(...) just to tidy things up. In the worst case you should still return only ints from that callback. To send the nulls all the way down: if (null === $b) { return 1; } – tomtom Jan 25 '20 at 14:59
  • Or you can utilize lose types in PHP and type conversion to write less code that still works – Justinas Jan 25 '20 at 18:27
1

A comparison sort requires only a total preorder, which is basically a "less than or equal to" relation. In other words, it doesn't matter whether a < b or a = b; you don't need that information to sort, all you need to know is that a <= b. That's why most (if not all) sorting algorithms don't distinguish between values -1 and 0, they only test if the comparison function returns 1 or not 1.

However, to work correctly, the function needs to be transitive (otherwise the relation would not be a total preorder). This is NOT the case of the function given in the original post.

Here is an example:

$array = [
    ['sort_code' => 2],
    [],
    ['sort_code' => 1]
];

uasort($array, function($a, $b) {
    return isset($a['sort_code']) && isset($b['sort_code']) && $a['sort_code'] > $b['sort_code'];
});

var_export($array);

which gives:

array (
  0 => 
  array (
    'sort_code' => 2,
  ),
  1 => 
  array (
  ),
  2 => 
  array (
    'sort_code' => 1,
  ),
)

sort_code 2 comes before 1, which was not intended. That's because, according to the comparison function:

$array[0] <= $array[1]
$array[1] <= $array[2]
$array[0] >  $array[2]

If the relation was transitive, the first two lines would imply:

$array[0] <= $array[2]

which contradicts with what the function says.

Olivier
  • 3,368
  • 1
  • 2
  • 13
-2

The logic in your uasort() has logic that caterers to making fewer overall comparisons and therefore (assuming the sorting rules deliver the expected result) executes with higher efficiency.

PHP's uasort() uses Quicksort as far as I can tell, but the docs say you should not rely on this fact (perhaps in case it changes in the future). An SO reference.

Here is a demo that shows that the sort call only makes enough comparisons to potentially break all ties. Employing a three-way comparison ends up being more expensive because more comparisons are required but would likely deliver a different sort due to the sorting instructions.

$array = [
    ['id' => 1, 'sort_code' => 4],
    ['id' => 2, 'sort_code' => 2],
    ['id' => 3],
    ['id' => 4, 'sort_code' => 8],
    ['id' => 5],
    ['id' => 6, 'sort_code' => 6]
];

uasort($array, function($a, $b) {
    echo "\n({$a['id']} -vs- {$b['id']}) " , ($a['sort_code'] ?? 'undeclared') , " -vs- " , ($b['sort_code'] ?? 'undeclared') , " eval: ";
    echo $eval = (int)(isset($a['sort_code']) && isset($b['sort_code']) && $a['sort_code'] > $b['sort_code']);
    return $eval;
});
echo "\n---\n";
var_export($array);

echo "\n======\n";

uasort($array, function($a, $b) {
    echo "\n({$a['id']} -vs- {$b['id']}) " , ($a['sort_code'] ?? 'undeclared') , " -vs- " , ($b['sort_code'] ?? 'undeclared') , " eval: ";
    echo $eval = (int)(($a['sort_code'] ?? 0) <=> ($b['sort_code'] ?? 0));
    return $eval;
});
echo "\n---\n";
var_export($array);

Output:

(1 -vs- 2) 4 -vs- 2 eval: 1
(1 -vs- 3) 4 -vs- undeclared eval: 0
(3 -vs- 4) undeclared -vs- 8 eval: 0
(4 -vs- 5) 8 -vs- undeclared eval: 0
(5 -vs- 6) undeclared -vs- 6 eval: 0
---
array (
  1 => 
  array (
    'id' => 2,
    'sort_code' => 2,
  ),
  0 => 
  array (
    'id' => 1,
    'sort_code' => 4,
  ),
  2 => 
  array (
    'id' => 3,
  ),
  3 => 
  array (
    'id' => 4,
    'sort_code' => 8,
  ),
  4 => 
  array (
    'id' => 5,
  ),
  5 => 
  array (
    'id' => 6,
    'sort_code' => 6,
  ),
)
======

(2 -vs- 1) 2 -vs- 4 eval: -1
(1 -vs- 3) 4 -vs- undeclared eval: 1
(2 -vs- 3) 2 -vs- undeclared eval: 1
(1 -vs- 4) 4 -vs- 8 eval: -1
(4 -vs- 5) 8 -vs- undeclared eval: 1
(1 -vs- 5) 4 -vs- undeclared eval: 1
(2 -vs- 5) 2 -vs- undeclared eval: 1
(3 -vs- 5) undeclared -vs- undeclared eval: 0
(4 -vs- 6) 8 -vs- 6 eval: 1
(1 -vs- 6) 4 -vs- 6 eval: -1
---
array (
  2 => 
  array (
    'id' => 3,
  ),
  4 => 
  array (
    'id' => 5,
  ),
  1 => 
  array (
    'id' => 2,
    'sort_code' => 2,
  ),
  0 => 
  array (
    'id' => 1,
    'sort_code' => 4,
  ),
  5 => 
  array (
    'id' => 6,
    'sort_code' => 6,
  ),
  3 => 
  array (
    'id' => 4,
    'sort_code' => 8,
  ),
)
mickmackusa
  • 33,121
  • 11
  • 58
  • 86