
I understand how to get the results I want from Perl's sort() function; this is more a question about the inner workings of sort().

Where do the "$a" and "$b" variables come from? I read through the documentation for sort and it seems unclear. What are "$a" and "$b" and what makes them special?

ex:

my @sorted_list = sort {$a cmp $b} @unsorted_list;

How does sort know what to do with "$a" and "$b" and why don't you get the "Global symbol requires explicit package name" error for "$a" or "$b"?

tjwrona1992
  • $a and $b are elements in red boxes. http://en.wikipedia.org/wiki/Merge_sort#mediaviewer/File:Merge-sort-example-300px.gif – mpapec Sep 30 '14 at 18:48

3 Answers


$a and $b are exempt global variables: Perl allows them to be used anywhere without being declared, even in strict mode. They are set by the sort function. Using any other undeclared global in a sort block (in strict mode) will trigger an error.

The sort function accepts various forms of input, one being a code block, which is the form you are referring to.

{$a cmp $b} is a code block. It is parsed and passed as a "chunk of code" to the sort function. When sort receives a code block, it assigns each pair of items being compared to the package globals $a and $b before running the block. All you have to do is refer to them to control the ordering. If you pass no block, sort falls back to its default comparison, which compares the items as strings. Either way, the sorting algorithm itself is internal to Perl (I think it is merge sort).
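As a minimal sketch of the difference between passing a block and passing none (the list contents and variable names here are made up for illustration):

```perl
use strict;
use warnings;

my @words   = qw(pear apple fig);
my @numbers = (10, 9, 100, 1);

# No block: items are compared as strings, so "10" sorts before "9".
my @default = sort @numbers;

# Block form: sort sets $a and $b to the pair being compared.
my @by_string = sort { $a cmp $b } @words;
my @numeric   = sort { $a <=> $b } @numbers;
my @reverse   = sort { $b <=> $a } @numbers;

print "@default\n";     # 1 10 100 9
print "@by_string\n";   # apple fig pear
print "@numeric\n";     # 1 9 10 100
print "@reverse\n";     # 100 10 9 1
```

Swapping $a and $b in the block is enough to reverse the order, which shows that the block only decides how a pair compares, not how the list is traversed.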

http://perldoc.perl.org/functions/sort.html

$a and $b are not lexicals; they are package globals (or just globals).

In package main you can write:

sort {$main::a cmp $main::b} @list;

Or in another package, you could write:

package foo;

sort {$foo::a cmp $foo::b} @list;

You shouldn't actually prefix them like this; I am only demonstrating that $a and $b are ordinary globals in your current package, not some magic $a inside the sort function. The one special thing is that Perl lets you use them undeclared even in strict mode.
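One way to check this yourself is to compare references inside the block. This is only a sketch; the package name Foo and the variable $same_variable are made up for the example:

```perl
use strict;
use warnings;

package Foo;

my $same_variable = 1;
my @sorted = sort {
    # In package Foo, the unqualified $a and $Foo::a should be the same
    # variable, so references to them should compare equal.
    $same_variable &&= (\$a == \$Foo::a && \$b == \$Foo::b);
    $a cmp $b;
} qw(pear apple fig);

print "sorted: @sorted\n";
print "same variable: ", ($same_variable ? "yes" : "no"), "\n";
```

If $a were private to sort rather than a package variable, the reference comparison would fail.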

You can't just use any variables (in strict mode). Try:

sort {$A cmp $B} @list;

Global symbol "$A" requires explicit package name at sort.pl

You cannot use a lexical (my $a) that is in scope at the sort block.

my $a;
sort {$a cmp $b} @list;

Can't use "my $a" in sort comparison at sort.pl line 13.

$a and $b are special anywhere in Perl. Their exemption from strict mode applies everywhere, not just inside sort, although sort was the reason for the exemption.
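A small sketch of that exemption with no sort involved. The string eval is only there so the compile error for the hypothetical variable $c can be captured rather than aborting the script:

```perl
use strict;
use warnings;

# No "my" or "our" here, yet this compiles under strict.
$a = 'first';
$b = 'second';
print "$a $b\n";

# Any other undeclared name is rejected at compile time.
my $ok = eval q{ $c = 'third'; 1 };
print "error: $@" unless $ok;
```

The captured message should begin with the same "Global symbol ... requires explicit package name" text quoted above.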

codenheim

What exactly are “$a” and “$b” in [the compare code of] Perl's “sort()” function?

Two values from the list of values to sort. The block must return a value indicating how one should be positioned relative to the other in the final result.
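For illustration, here is a sketch of what the block is expected to return: a negative number, zero, or a positive number, which is what cmp and <=> produce. The word list is made up:

```perl
use strict;
use warnings;

# The comparison operators return -1, 0, or 1.
my @results = (2 <=> 10, 2 <=> 2, 'b' cmp 'a');

# A block can combine comparisons: shorter words first, ties broken alphabetically.
my @words  = qw(pear fig apple kiwi);
my @sorted = sort { length($a) <=> length($b) or $a cmp $b } @words;

print "@results\n";   # -1 0 1
print "@sorted\n";    # fig kiwi pear apple
```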

What are "$a" and "$b" and what makes them special?

Package variables, and there's nothing special about them except that use strict 'vars'; does not consider using them to be an error.

Where do the "$a" and "$b" variables come from?

They are populated by sort.

How does sort know what to do with "$a" and "$b"

It doesn't do anything with them except populate them as required to perform its function.

why don't you get the "Global symbol requires explicit package name" error for "$a" or "$b"?

That would make it rather hard to use them!

What happens if you define a local variable, my $a or my $b, and then try and use sort within a scope where those [lexical] variables are visible?

If your compare function is in scope of a my $a and/or my $b, it will use those variables instead of the package variables sort populates.

Perl realizes this is an easy mistake to make, so it checks for it.

$ perl -c -e'sort { my ($a,$b); $a cmp $b } @a;'
Can't use "my $a" in sort comparison at -e line 1.
ikegami
  • So what you are saying is "$a" and "$b" are specifically designated for the use of the `sort()` function and no other function? What happens if you define a local variable, `my $a` or `my $b`, and then try and use sort within a scope where those local variables are visible? – tjwrona1992 Sep 30 '14 at 19:15
  • @tjwrona1992 give it a go and see what happens! – i alarmed alien Sep 30 '14 at 19:16
  • @tjwrona1992 - See the bottom of my answer. You cannot use a lexical (but why not try it yourself and see?) – codenheim Sep 30 '14 at 19:20
  • @tjwrona1992, Information got accidentally deleted from my answer. I've re-added it as an answer to your comment. – ikegami Sep 30 '14 at 19:24
  • @ikegami - I wrote a quick test script and got this error "Can't use "my $a" in sort comparison". It seems as though it doesn't like when you define "$a" or "$b" locally. – tjwrona1992 Sep 30 '14 at 19:27
  • @tjwrona1992, Oh cool, it doesn't allow you to *declare* `$a` and `$b` *as a lexical* in a *sort code block*. – ikegami Sep 30 '14 at 19:29
  • Messing with `sort()` even more I tried another test. If you remove `use strict` it seems to allow you to use any global variable in the `sort()` code block such as `my @sorted_list = sort {$c cmp $d} @list`. Without `use strict` or `use warnings`, this code will run without error but will not sort the list. In the end @sorted_list will equal @list. – tjwrona1992 Sep 30 '14 at 19:36
  • More accurately: If you remove `use strict;`, `strict` will no longer forbid you from using undeclared package variables. // Of course `sort {$c cmp $d} @list` won't work. If you don't assign anything to `$c` and `$d`, your compare function will always return zero. – ikegami Sep 30 '14 at 19:37
  • @tjwrona1992 - Right, it's because you can pass any code block, with any number of variables, but sort will only set $a and $b by name. – codenheim Sep 30 '14 at 19:41

The other answers are good, so I won't repeat, but one part of your question wasn't really answered.

Where do the "$a" and "$b" variables come from?

As codenheim wrote, sort is basically reaching into your package (whether it's an explicitly declared package or main) and setting the variables there. The built-in does this internally, but pure-Perl code can get the same effect simply by temporarily turning off strict refs and spelling out the fully qualified variable name as a string. With that done, the variable can be accessed via ${$varname}.

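my $pkgname = caller();            # package of the code that called us
my $varname = "${pkgname}::a";     # fully qualified name, e.g. "main::a"
no strict 'refs';                  # allow a string to act as a symbolic reference
${$varname} = $value;              # $value is a placeholder for the element to pass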

Note: I discovered this technique in the source for List::Util.
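To make the technique concrete, here is a rough sketch of a pure-Perl function that hands values to the caller's $a and $b the same way. The name my_reduce is hypothetical, and this is not the actual List::Util source. For brevity it overwrites the caller's $a and $b without restoring them, which careful code would normally avoid by localizing them first.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a list with a block that reads $a and $b.
sub my_reduce (&@) {
    my ($code, @list) = @_;
    return unless @list;

    my $pkgname = caller();    # package where the block was written
    no strict 'refs';          # needed for the symbolic references below

    my $acc = shift @list;
    for my $item (@list) {
        ${"${pkgname}::a"} = $acc;     # set the caller's $a
        ${"${pkgname}::b"} = $item;    # set the caller's $b
        $acc = $code->();              # the block sees them as plain $a and $b
    }
    return $acc;
}

my $sum = my_reduce { $a + $b } 1 .. 5;
my $max = my_reduce { $a > $b ? $a : $b } 3, 9, 4;

print "sum: $sum\n";   # sum: 15
print "max: $max\n";   # max: 9
```

The block is compiled in the caller's package, so its unqualified $a and $b resolve to the same package variables that the function sets by name.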

piojo