Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 8 (May 1996)

In my last column, I introduced the notion of a Perl ``reference'', like a pointer in C. I talked about references to scalars, arrays, and associative arrays. In this column, I'm going to talk about references to subroutines, and using references to mess with the Perl run-time symbol table to create aliases.

A reference to a subroutine can be created with the ``make-a-reference-to'' operator, a backslash (one of the probably 17 meanings for backslash):

        sub wilma {
                print "Hello, @_!";
        }
        $ref_to_wilma = \&wilma;

Here, the subroutine &wilma is defined, and then a reference to that subroutine is created and stored into $ref_to_wilma. It is not necessary to define a subroutine before taking a reference to it, however.

This $ref_to_wilma can be used wherever the invocation of wilma is also used, although we have to ``dereference'' it in the same way we dereference other references. The syntactic rules are similar -- replace the name ``wilma'' in an invocation of &wilma with {$ref_to_wilma} (or $ref_to_wilma, because it is a scalar variable), as in

        &wilma("fred"); # say hello to fred
        &{ $ref_to_wilma }("fred"); # same thing
        &$ref_to_wilma("fred"); # also same thing
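As a quick check of those call forms, here is a small sketch that takes the reference before the definition of &wilma appears in the file, and then collects the result of each style of call. Two things are my own additions rather than part of the column's code: &wilma returns its greeting instead of printing it, so the results can be compared, and the last call uses the arrow form that later Perl 5 releases also accept.

```perl
# Take the reference first; the definition of &wilma is further down.
my $ref_to_wilma = \&wilma;

my @greetings = (
        &wilma("fred"),               # ordinary invocation
        &{ $ref_to_wilma }("fred"),   # dereference with braces
        &$ref_to_wilma("fred"),       # braces dropped for a simple scalar
        $ref_to_wilma->("fred"),      # arrow form (not used in the column)
);
print "$_\n" foreach @greetings;

# Returns the greeting rather than printing it (a change for this sketch).
sub wilma {
        return "Hello, @_!";
}
```

All four calls should produce the same string, since they all end up in the same subroutine.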

Now, these refs can be used to select different operations on the same data. Consider a series of subroutines to perform the basic four math operations on their arguments, returning the result:

        sub add { $_[0]+$_[1]; }
        sub subtract { $_[0]-$_[1]; }
        sub multiply { $_[0]*$_[1]; }
        sub divide { $_[0]/$_[1]; }

Now, let's allow the user to enter one of the operators, followed by the two operands (prefix notation), and then select one of the four subroutines, using conventional code (no references):

        print "enter operator op1 op2\n";
        $_ = <STDIN>;
        ## break the result on whitespace:
        ($op,$op1,$op2) = split;
        if ($op eq "+") {
                $res = &add($op1,$op2);
        } elsif ($op eq "-") {
                $res = &subtract($op1,$op2);
        } elsif ($op eq "*") {
                $res = &multiply($op1,$op2);
        } else { # divide, we hope
                $res = &divide($op1,$op2);
        }
        print "result is $res\n";

Boy, think of how much harder this would be if I had 15 operators. The regularity of the pattern of code makes me think that I can factor that out somehow, and in fact, I can, using references.

        ## initialize op table
        %op_table = (
                "+" => \&add,
                "-" => \&subtract,
                "*" => \&multiply,
                "/" => \&divide,
        );
        print "enter operator op1 op2\n";
        $_ = <STDIN>;
        ## break the result on whitespace:
        ($op,$op1,$op2) = split;
        ## get reference:
        $sub_ref = $op_table{$op};
        ## and now evaluate
        $res = &{$sub_ref}($op1,$op2);
        print "result is $res\n";

First, $op is used as a key into the %op_table associative array, selecting one of the four subroutine references into $sub_ref. That reference is then de-referenced, passing the two operands. This is possible only because all four subroutines take the same style of arguments. Had there been some irregularity, we would have been in trouble.

However, we can shorten the lookup-execute steps even further, as in

        $res = &{$op_table{$op}}($op1,$op2);

which simply does the lookup and dereference all in one fell swoop. Slick, eh?
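One thing the table version loses is the ``divide, we hope'' fallback: an operator with no entry in %op_table yields an undefined value, and trying to call through that fails at run time. A minimal sketch of one way to guard against it follows. The wrapper name calc is hypothetical, and returning undef for an unknown operator is just one possible policy.

```perl
sub add      { $_[0] + $_[1] }
sub subtract { $_[0] - $_[1] }
sub multiply { $_[0] * $_[1] }
sub divide   { $_[0] / $_[1] }

my %op_table = (
        "+" => \&add,
        "-" => \&subtract,
        "*" => \&multiply,
        "/" => \&divide,
);

# calc is a made-up wrapper for this sketch, not part of the column.
sub calc {
        my($op, $op1, $op2) = @_;
        my $sub_ref = $op_table{$op};
        # No entry means no subroutine to call, so bail out early
        # instead of dereferencing an undefined value.
        return undef unless defined $sub_ref;
        return &$sub_ref($op1, $op2);
}

print "result is ", calc("*", 6, 7), "\n";    # result is 42
print "no such operator\n" unless defined calc("%", 6, 7);
```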

Like anonymous lists and anonymous associative arrays, I can create an anonymous subroutine. For example, back to something like the &wilma subroutine,

        $greet_ref = sub {
                print "hello, @_!\n";
        };

What we now have in $greet_ref is a reference to a subroutine, but the subroutine has no name. This subroutine is invoked by dereferencing the subroutine reference, in the same way as other subroutine references:

        &$greet_ref("barney"); # hello, barney!

One advantage of the anonymous subroutine is that it can be used in places where coming up with names might seem a little silly. For example, the names of &add, &subtract, &multiply, and &divide were rather arbitrary in the previous examples. As I add operators, I'd have to keep naming the subroutines, even though the name was used in only one other place -- the %op_table. So, using anonymous subroutines, I can eliminate the names entirely:

        ## initialize op table
        %op_table = (
                "+" => sub { $_[0]+$_[1] },
                "-" => sub { $_[0]-$_[1] },
                "*" => sub { $_[0]*$_[1] },
                "/" => sub { $_[0]/$_[1] },
        );
                
and in fact, this %op_table functions identically to the previous one, except that I didn't have to hurt myself coming up with names for the four subroutines. This is really a help for maintenance -- to add exponentiation (using **) for example, all I have to do is add an entry to the %op_table:

                "**" => sub { $_[0]**$_[1] },

rather than first coming up with a named subroutine, and then adding a reference to that subroutine in the %op_table.
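Putting the pieces together, here is a sketch of the anonymous-subroutine table with the exponentiation entry in place. To keep it self-contained, it takes its input from a short list of made-up strings rather than from <STDIN>; the list and the @results array are my own scaffolding.

```perl
my %op_table = (
        "+"  => sub { $_[0] + $_[1] },
        "-"  => sub { $_[0] - $_[1] },
        "*"  => sub { $_[0] * $_[1] },
        "/"  => sub { $_[0] / $_[1] },
        "**" => sub { $_[0] ** $_[1] },   # the one-line addition
);

my @results;
# Sample input lines, standing in for what a user would type.
foreach my $line ("+ 2 3", "** 2 10", "/ 9 3") {
        my($op, $op1, $op2) = split ' ', $line;
        # Look up the subroutine and call it in one step.
        push @results, &{$op_table{$op}}($op1, $op2);
}
print "@results\n";    # 5 1024 3
```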

Subroutine references are also handy to pass into subroutines. For example, suppose I wrote a routine to throw away blank lines until it got something useful, and then return the useful thing. As its first arg, it could take a subroutine that defines how to ``get the next thing''. Sometimes, this might be ``read from a filehandle'', and other times it might be ``shift from an array''. Here's how it would look:

        $next = &non_blank(
                sub { <STDIN>; }
        ); # read from stdin
        $next = &non_blank(
                sub { shift @cache; }
        ); # grab from list @cache

Within the &non_blank subroutine, the first parameter is then a reference to a subroutine that will ``fetch the next value''. Here's one possible implementation of that subroutine:

        sub non_blank {
                my($scanner) = @_;
                my($return);
                {
                        $return = &{$scanner}();
                        redo until $return =~ /\S/;
                }
                $return;
        }

Here, the subroutine referenced by $scanner is invoked repeatedly until its return value (stuffed into $return) has a non-blank value in it. When it is invoked with the subroutine containing <STDIN>, a line at a time is read. When it is invoked with shift @cache, we get a line from @cache each time instead.

Unfortunately, while testing this, I discovered a problem. Sometimes, nothing left in the stream being scanned by &non_blank contains a non-blank character, and this subroutine then runs indefinitely. Ouch! So, a simple patch in the logic, as well as a modification to the definition, fixes it. I'm going to return undef if no further element fits the needs, as in

        sub non_blank {
                my($scanner) = @_;
                my($return);
                {
                        $return = &{$scanner}();
                        last unless defined $return;
                        redo until $return =~ /\S/;
                }
                $return;
        }

There. That handles it. Now, if my program required scanning <STDIN>, @cache, or even calling another subroutine to fetch the next non-blank line, it doesn't matter. And this same bug-fix is handled once, rather than having to patch all the similar-looking code in the program.

By the way, I got carried away with punctuation there -- let's simplify the line that invokes the scanner, for the record, to:

        $return = &$scanner();
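To see the repaired &non_blank at work without typing at a terminal, here is a small sketch that feeds it from an array. The contents of @cache and the @found array are made up for illustration; the subroutine itself is the patched version with the simpler dereference.

```perl
sub non_blank {
        my($scanner) = @_;
        my($return);
        {
                $return = &$scanner();
                last unless defined $return;    # source is exhausted
                redo until $return =~ /\S/;     # skip blank items
        }
        $return;
}

# Sample data: a mix of blank and non-blank entries.
my @cache = ("", "   ", "first\n", "\n", "second\n", " ");

my @found;
# Keep asking for the next non-blank item until undef comes back.
while (defined(my $item = non_blank(sub { shift @cache }))) {
        push @found, $item;
}
print "found: $_" foreach @found;
```

The final call consumes the trailing blank entry, gets undef from the empty array, and returns undef, which ends the loop.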

Enough on subroutines for the moment. Let's turn to another use for references -- as a way of modifying the symbol table of Perl. Why would you want to do this? One reason is to create an alias for another symbol:

        *fred = *barney;

Here, we've said that the ``fred'' symbol is to be aliased to the ``barney'' symbol. We call this a ``glob'' reference, because it is hacking the global symbol table.

Once this is done, every use of barney in a variable can be replaced by fred:

        $barney = 3;
        $fred = 3; # same thing
        @barney = (1,2,4);
        print "@fred"; # prints "1 2 4"
        %fred = ("a" => 1);
        print $barney{"a"}; # prints 1

Even subroutines get aliased in this way, as well as filehandles, directory handles, and format names.
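A short sketch makes it easier to convince yourself that the two names really share storage rather than holding copies. One assumption to flag: glob aliasing works on package variables, not on variables declared with my, so the sketch declares everything with our. The body of &barney is made up.

```perl
# Package variables; a glob assignment has no effect on "my" variables.
our($fred, @fred, %fred, $barney, @barney, %barney);

sub barney { "yabba dabba doo" }    # made-up body for this sketch

*fred = *barney;        # alias every "fred" thing to the "barney" thing

$barney = 3;
@barney = (1, 2, 4);
push @fred, 8;          # really pushes onto @barney
$fred{"a"} = 1;         # really stores into %barney

print "$fred @barney $barney{a} ", &fred(), "\n";
# prints "3 1 2 4 8 1 yabba dabba doo"
```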

We can be more selective by giving the glob assignment a specific reference type:

        *fred = \&wilma;

Now here, $fred is still $barney, @fred is still @barney, %fred is still %barney, but &fred is &wilma. You can often find this inside a block with a local glob:

        *fred = *barney;
        {
                local(*fred) = \&wilma;
                &fred(3,4,5); # &wilma(3,4,5)
        }
        &fred(6,7,8); # &barney(6,7,8)

The localized glob assignment is effective only within the block. When we exit the block, the previous glob assignment appears again, as if by magic.
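Here is a sketch of that block that records which subroutine actually ran. The bodies of &barney and &wilma and the @calls array are made up so the effect is visible; I've also written the local without parentheses, which is a scalar assignment with the same effect here.

```perl
# Made-up bodies that report who was called and with what.
sub barney { "barney got @_" }
sub wilma  { "wilma got @_" }

*fred = *barney;

my @calls;
{
        local *fred = \&wilma;          # temporary: &fred is &wilma
        push @calls, &fred(3,4,5);
}
push @calls, &fred(6,7,8);              # restored: &fred is &barney again

print "$_\n" foreach @calls;
# prints "wilma got 3 4 5" and then "barney got 6 7 8"
```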

We can re-write &non_blank above to use a local glob alias rather than an explicit dereferencing:

        sub non_blank {
                local(*scanner) = @_;
                my($return);
                {
                        $return = &scanner();
                        last unless defined $return;
                        redo until $return =~ /\S/;
                }
                $return;
        }

Notice that we can now invoke &scanner, rather than the clumsier &$scanner.

We can also use glob references to tidy up the &brack_it subroutine from the last column. Rather than explicitly dereferencing the value $list_ref in:

        sub brack_it {
                my($list_ref) = @_;
                foreach (@$list_ref) {
                        print "[$_]"; # print element in brackets
                }
                print "\n";
        }

we can replace it with a glob assignment:

        sub brack_it {
                local(*list) = @_; # list ref we hope
                foreach (@list) {
                        print "[$_]"; # print element in brackets
                }
                print "\n";
        }
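One consequence worth seeing in action: local(*list) temporarily takes over the whole ``list'' symbol, and whatever @list held before comes back when the subroutine returns. The sketch below is a variant of &brack_it that builds and returns its string instead of printing, so the result can be checked. The outer @list and the @names array are made up.

```perl
our @list = ("outer");          # a package @list that already exists

sub brack_it {
        local(*list) = @_;      # list ref we hope
        my $out = "";
        foreach (@list) {
                $out .= "[$_]"; # element in brackets
        }
        $out;                   # returned rather than printed (my change)
}

my @names = ("fred", "barney");
print brack_it(\@names), "\n";  # [fred][barney]
print "@list\n";                # outer
```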

Another use of glob assignments is to make a sort subroutine a little more generic. For example, the classic ``sort by value'' for a particular associative array is written as:

        sub by_numeric_value {
                $hash{$a} <=> $hash{$b}
        }

which works fine as a sort subroutine provided the data is in the %hash associative array, like so:

        sub sort_hash_by_value {
                sort by_numeric_value keys %hash;
        }
        @them = &sort_hash_by_value;

Here, @them holds the keys of %hash, sorted by their corresponding numeric values. We can now make this routine more generic:

        sub sort_by_value {
                local(*hash) = @_; # ref to hash
                sort by_numeric_value keys %hash;
        }
        @them_hash = &sort_by_value(\%hash);

So far, this does the same thing as the previous one, but I've passed in a reference to %hash as the first argument. This then gets aliased to (gasp) itself, and the subroutine functions as before. Where it gets fun is when I can pass other associative arrays:

        @them_there = &sort_by_value(\%there);

which now does exactly the same thing on the %there associative array! In this case, the sort subroutine &by_numeric_value thinks it is accessing %hash, when in fact because of the alias, it is accessing %there. Very cool.
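For the skeptical, here is the generic version run against some sample data. The contents of %there are invented, and the our declaration is my addition to make it plain that %hash is a package variable, which it must be for the alias to take effect.

```perl
our %hash;      # package variable that the sort subroutine looks at

sub by_numeric_value {
        $hash{$a} <=> $hash{$b}
}

sub sort_by_value {
        local(*hash) = @_;      # ref to hash
        sort by_numeric_value keys %hash;
}

# Made-up data: names and numeric scores.
my %there = ("fred" => 205, "barney" => 195, "dino" => 30);

my @them_there = &sort_by_value(\%there);
print "@them_there\n";          # dino barney fred
```

The alias lasts only for the duration of the call, so %hash itself is left untouched afterwards.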

Once again, I hope this excursion into the features of Perl (especially the more powerful features of references) has been useful for you. Enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

