Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Unix Review Column 7 (March 1996)

One of the most powerful features introduced in the latest release of Perl (version 5.0) is the ``reference''. A reference is like a pointer in C -- it provides a means by which a data structure can be referenced ``indirectly''. This indirection can be used in two ways: you can have a section of code act on different data structures at different times, or you can have a data structure be indirectly contained within another data structure, giving you the appearance of nested data structures (lists of lists, lists of associative arrays, and so on).

A reference can be made to scalars, arrays (lists), associative arrays (hashes), or subroutines (``codes''). These references fit into nearly any place a scalar value fits, so they may be held in scalar variables, as values of lists or hashes, and may be passed to and from subroutines. (About the only place where a scalar fits and a reference doesn't is as a key of a hash: keys are always stored as plain strings, so a reference used as a key can no longer be dereferenced.)
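To illustrate the hash-key exception, here is a minimal sketch. The names @fred, $list_ref, and %seen are placeholders chosen for this example, and the exact address printed will differ from run to run.

```perl
use strict;
use warnings;

my @fred     = (1, 2, 3);
my $list_ref = \@fred;

# A hash key is always a string, so only the reference's
# string form (something like "ARRAY(0x...)") is stored.
my %seen;
$seen{$list_ref} = "hello";

my ($key) = keys %seen;
print "key is $key\n";

# ref() returns the empty string for anything that is not a reference.
print "the key is just a string\n"     unless ref $key;
print "the original is still a ", ref($list_ref), " reference\n";
```

The original $list_ref still works as a reference; only the copy stored as a key has been flattened into text.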

A reference to a named variable is created with the backslash operator.

        $ref_to_a = \$a;

The value in $ref_to_a is now a reference to the variable $a. We can use this reference to change the value of $a indirectly, using the dereferencing syntax for a scalar:

        ${ $ref_to_a } = 35;

To understand this syntax, imagine replacing the name of a scalar variable (like b in $b) with a block that returns a reference to a scalar variable (in this case, {$ref_to_a}). So, the statement above does the same as:

        $a = 35;

and the following two statements are also equivalent:

        $a++;
        ${ $ref_to_a }++;

I can actually short-cut this a bit, and save myself some typing. If the only thing inside the block is a simple scalar variable, I can omit the curly-braces, as in:

        $$ref_to_a++; # increment $a
        $$ref_to_a = 35; # set $a = 35

Now, so far, this looks like it's just a difficult way to type $a, but let's examine the following code:

        $the_ref = \$a;
        $$the_ref = 35; # set $a to 35
        $the_ref = \$b;
        $$the_ref = 35; # set $b to 35

Note that the same piece of code ($$the_ref = 35) changed $a the first time, and $b the second time. In fact, we can automate this a bit using a foreach loop:

        foreach $the_ref (\$a, \$b) {
                $$the_ref = 35;
        }

So perhaps you are starting to see that I can re-use code and apply it to different variables, thanks to the additional level of indirection here. Good!
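As a self-contained sketch of the same idea, the loop below applies one statement to two different variables. The names $x and $y are stand-ins chosen for this example (under ``use strict'' it is simpler to avoid $a and $b, which sort treats specially).

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# The loop body never names $x or $y; it only sees a reference.
foreach my $the_ref (\$x, \$y) {
    $$the_ref *= 10;    # multiply whichever variable $the_ref points at
}

print "$x $y\n";        # prints "10 20"
```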

In the same way that I can create a reference to a scalar, I can also create a reference to a list variable:

        $list_ref = \@fred;

In this case, the construct {$list_ref} (or equivalently, $list_ref) can be used wherever I would have used ``fred'' as a list variable:

        @$list_ref = (3,4,5); # @fred = (3,4,5)
        $$list_ref[2] = 6; # $fred[2] = 6
        print $#$list_ref; # print $#fred;
        push(@$list_ref,8,9); # push(@fred, 8, 9)

And once again, I can use references to perform indirection:

        foreach $list_ref (\@fred, \@barney) {
                @$list_ref = (); # clear out the array
        }

which sets both @fred and @barney to the empty list (admittedly the hard way!).
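One point worth spelling out about the element syntax above: $$list_ref[2] is shorthand for ${$list_ref}[2], that is, element 2 of the array that $list_ref refers to. A minimal sketch that can be run as-is (the values are arbitrary):

```perl
use strict;
use warnings;

my @fred     = (3, 4, 5);
my $list_ref = \@fred;

$$list_ref[2]   = 6;             # short form: sets $fred[2]
${$list_ref}[1] = 40;            # the same kind of access, written out in full

push(@$list_ref, 8, 9);          # push(@fred, 8, 9)
my $last_index = $#$list_ref;    # same as $#fred

print "@fred\n";                      # prints "3 40 6 8 9"
print "last index: $last_index\n";    # prints "last index: 4"
```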

Now, so far, I've been creating references to named list variables, but it is also possible to create a list that does not map into an existing list variable. This is called an ``anonymous list'' (or ``anon list''). An anonymous list can be created with the anon list constructor, which builds the list and returns a reference to it:

        $list_ref = [3, 4, 6, 8, 9];

The value of $list_ref here can be used in all the same ways that I used the reference to a named list. The only difference is that I am talking about a variable with no real name -- only a reference. For example, I can extend the array:

        push(@$list_ref, 10, 12);

and find out its size:

        $len = @$list_ref; # $len = 7

or increment its first element:

        $$list_ref[0]++; # first element is now 4

One use for list references and anonymous list references is to pass a list to a subroutine without having to copy all of the data as arguments to the subroutine.

        @a = 1..1000; # create 1000 element list
        $ref_to_a = \@a; # reference to @a
        &brack_it($ref_to_a); # pass reference to subroutine
        sub brack_it {
                my($list_ref) = @_; # name the first parameter
                foreach (@$list_ref) {
                        print "[$_]"; # print element in brackets
                }
                print "\n";
        }

Note that the subroutine is expecting a list reference as its first argument, and is then de-referencing that argument to get to the actual list. In this case, only one scalar is being passed from the main code to the subroutine. Without references, we'd have to make a copy of all 1000 elements to hand to the subroutine.

We can actually skip a step or two here:

        &brack_it(\@a); # call sub on @a

We don't need to store the intermediate reference in a scalar variable -- just pass it as the first argument.

We can also pass an anon-list-ref as the first argument:

        &brack_it(
                [10, 20, 30]
        );

Here, an anonymous list of the three values 10, 20, and 30 is created, and a reference to that list is passed to the subroutine.

What happens to the anonymous list when the subroutine returns? An anonymous list (like all anonymous things) is ``reference counted'', which means that Perl keeps track of the number of references to the particular chunk of data. As the data is being passed to the subroutine, there is first one, and then two references to the data (the argument list, and then the local copy called ``$list_ref''). When the subroutine returns, both references disappear, leaving no references to the data. When this happens, Perl automatically disposes of the memory that was holding the list, much as it reclaims a subroutine's local variables once the subroutine returns and they are no longer valid.

References can also be copied:

        $a = [ 20, 30, 40 ]; # $a is a listref
        $b = $a;

Now both $b and $a point at the same data. If I append elements to @$a, I can access the same data with @$b. I can even remove $a,

        undef $a;

and $b is still pointing at the original data! However, when I finally remove $b,

        undef $b;

the storage holding the anonymous list is finally reclaimed, as there are no longer any references to the data.
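The sharing described above can be checked with a short sketch. The names $first and $second are placeholders for $a and $b, chosen for this example only.

```perl
use strict;
use warnings;

my $first  = [ 20, 30, 40 ];    # a listref
my $second = $first;            # copies the reference, not the list

push(@$first, 50);              # change the data through one reference...
print "@$second\n";             # ...and see it through the other: "20 30 40 50"

undef $first;                   # one reference gone, one remaining
print "@$second\n";             # still prints "20 30 40 50"
```

Only when $second is also undefined (or goes out of scope) does the count drop to zero.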

I can also create references to associative arrays (or hashes):

        $hash_ref = \%score;
        ${ $hash_ref }{"fred"} = 205; # $score{"fred"} = 205
        $$hash_ref{"barney"} = 195; # $score{"barney"} = 195
        @$hash_ref{"wilma","betty"} = (170,180);
                # @score{"wilma","betty"} = (170,180);
        @the_keys = keys %$hash_ref; # keys %score

Again, if I use {$hash_ref} (or even just $hash_ref) every place where I would use the name of the associative array (score for %score), I get the proper syntax to perform an indirect access for %score.

I can create a general-purpose subroutine to print the keys and corresponding values (sorted by keys), and pass it a hash-ref:

        sub show_hash {
                my($hash_ref) = @_;
                foreach (sort keys %$hash_ref) {
                        print "$_ => $$hash_ref{$_}\n";
                }
        }

        &show_hash(\%score);

The advantage here again is that I'm passing only a single scalar (a hash reference) to the subroutine, rather than passing the entire hash (flattened out into a list of keys and values).
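To make the difference in argument count concrete, here is a small sketch. The helper count_args is hypothetical, written only for this illustration; it reports how many scalars arrived in @_.

```perl
use strict;
use warnings;

# Hypothetical helper: how many arguments did the caller hand over?
sub count_args {
    return scalar @_;
}

my %score = ("fred", 205, "barney", 195, "wilma", 170, "betty", 180);

my $flattened = count_args(%score);     # every key and every value
my $by_ref    = count_args(\%score);    # just the one reference

print "flattened: $flattened, by reference: $by_ref\n";
# prints "flattened: 8, by reference: 1"
```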

Lists and hashes can also be passed back from subroutines using a similar syntax. For example, here's a subroutine that returns back a reference to a list:

        sub return_it {
                my(@list) = 1..100;
                \@list;
        }

        $list_ref = &return_it();
        print $$list_ref[4]; # prints 5, or $list[4]

Each time this subroutine is called, a new list of 1 through 100 is created. The subroutine then returns a reference to that list. Even though the list ordinarily would have been destroyed at the exit of the subroutine (it's a local to the subroutine), Perl notes that there is still an outstanding reference to the list (the return value), and turns it into an anonymous list. This reference is then copied into $list_ref, which can then be used like any list reference. As long as at least one reference for the list is valid, the list hangs around.

Each invocation of &return_it() will create a distinct @list, and therefore return a reference to a different data structure. Thus, we can turn a named variable within a subroutine into an unnamed variable outside the subroutine rather trivially. Also note that we are not returning the values of the list, but merely a simple scalar that references the values, thus eliminating the overhead of passing all those values back on the Perl ``stack''.
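A quick way to convince yourself that each call yields a separate list is to change one and look at the other. This is a sketch using the same subroutine; $first and $second are names invented for the example.

```perl
use strict;
use warnings;

sub return_it {
    my(@list) = 1..100;
    return \@list;              # a fresh @list on every call
}

my $first  = return_it();
my $second = return_it();

$$first[4] = "changed";         # alter only the first list

print "$$first[4] $$second[4]\n";    # prints "changed 5"
print "two different lists\n" if $first != $second;
```

Comparing two references with != compares the addresses they point at, so the last line prints only when the lists really are distinct.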

Just as I can create an anonymous list, I can create an anonymous ``hash'':

        $hash_ref = {
                "fred", 205,
                "barney", 195,
                "dino", 30,
        };

        print ${$hash_ref}{"fred"}; # prints 205
        print $$hash_ref{"dino"}; # prints 30

Again, this is a data structure that acts as if I had created a variable (an associative array) and then taken a reference to it, but the variable does not have a real name. This data is accessible only indirectly through $hash_ref. I could pass such a reference to &show_hash (given above) like so:

        &show_hash( {
                "fred", 205,
                "barney", 195,
                "dino", 30,
        } );

In this case, I'm not even bothering to store the hash reference into a variable -- I'm passing the reference immediately to the subroutine. When the subroutine returns, no one is holding a reference to the data, so the data disappears.
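The two anonymous constructors also combine to give the nested structures mentioned at the start of this column. The following is only a sketch of a list of hashes; the variable $players and the keys "name" and "score" are made up for the example.

```perl
use strict;
use warnings;

# An anonymous list whose elements are references to anonymous hashes.
my $players = [
    { "name", "fred",   "score", 205 },
    { "name", "barney", "score", 195 },
];

my $total = 0;
foreach my $player (@$players) {    # each $player is a hash reference
    print "$$player{name} scored $$player{score}\n";
    $total += $$player{"score"};
}
print "total: $total\n";            # prints "total: 400"
```

Each level is reached with the same dereferencing rules shown earlier: @$players for the outer list, $$player{...} for an element of an inner hash.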

Well, I hope this excursion into references has proved helpful to you. In a future column, I shall endeavor to illustrate references to subroutines, and even altering the Perl ``symbol table'' at runtime using references.

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
