Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Linux Magazine Column 08 (Jan 2000)
[suggested title: Getting better references through Perl]
To kick off the new year, I'm going to cover one of the Perl basics: references. References permit complex data types to be cleanly represented in Perl, and provide ways to pass and return large amounts of data into and out of subroutines.
Let's look at references first, by example. Suppose I wanted a ``reverse chomp'' operator that would add a newline to every element of an array. I could write the code as follows:
for $element (@array) { $element .= "\n"; }
And while this would certainly work, it locks me into a specific variable named @array. If I wanted to make a general subroutine, I'd be out of luck without references (unless I wanted to do something evil and non-scalable like alter @_ directly). A reference permits the selected variable to be changed at will, using an additional level of indirection. Consider the following code:
$this_reference = \@named_array;
Here, the \ operator ``takes a reference to'' the named @named_array variable. The value is called an ``array reference'', or arrayref for short (occasionally incorrectly called listref). The reference fits nearly anywhere a scalar value fits, and so we've shoved it into $this_reference. This arrayref ``points at'' @named_array.
To use the reference we must dereference it. Let's set @named_array to the values of 1 through 10, but using the reference:
# @named_array = (1..10);
@{$this_reference} = (1..10);
The syntax for dereferencing is to write the operation as we would without the reference, but then replace the name of the variable with a block of code (enclosed in braces) returning a reference to the variable. So, we've now got a piece of code that affects @named_array, at least this time. However, with a different array reference stored in $this_reference, the same code affects a different variable:
$this_reference = \@another_array;
@{$this_reference} = (1..10);
Now we've set @another_array to those 10 values. We can even use the reference syntax to access individual elements:
# $another_array[2] = "three";
${$this_reference}[2] = "three";
Again, replace the name with a block returning the thing holding the reference, and we get the dereferencing form.
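Here is a small, self-contained sketch that runs the steps so far end to end. The array names match the text; the `use strict`, `use warnings`, and `my` declarations are additions for safety and are not part of the original snippets.

```perl
use strict;
use warnings;

my @named_array;
my @another_array;

# Take a reference to @named_array and assign through it.
my $this_reference = \@named_array;
@{$this_reference} = (1..10);        # same as: @named_array = (1..10);

# Point the same scalar at a different array; the same code now
# affects @another_array instead.
$this_reference = \@another_array;
@{$this_reference} = (1..10);        # same as: @another_array = (1..10);
${$this_reference}[2] = "three";     # same as: $another_array[2] = "three";

print "@named_array\n";      # 1 2 3 4 5 6 7 8 9 10
print "@another_array\n";    # 1 2 three 4 5 6 7 8 9 10
```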
So, we can start to see how to make our ``unchomp'' work. We'll write the code so that it uses a reference, and pass that reference as a parameter:
sub unchomp {
  my $ref = shift;
  for $element (@{$ref}) {
    $element .= "\n";
  }
}
And then call it with a reference to the array we want unchomped:
unchomp(\@named_array);
unchomp(\@another_array);
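The reference passes as a single parameter, which is then shifted into $ref and dereferenced to supply the list for the foreach loop. Because the loop variable is aliased to each element, appending to it changes the original array. Bingo.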
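A runnable version of unchomp, with a couple of hypothetical sample arrays, shows the caller's arrays being modified in place. The only change from the text's version is declaring the loop variable with `my` so the sketch runs under `use strict`.

```perl
use strict;
use warnings;

# Append a newline to every element of the array that $ref points at.
# The loop variable aliases each element, so .= changes the original.
sub unchomp {
    my $ref = shift;
    for my $element (@{$ref}) {
        $element .= "\n";
    }
}

# Hypothetical sample data for illustration.
my @named_array   = ("fred", "barney");
my @another_array = ("wilma");

unchomp(\@named_array);
unchomp(\@another_array);

print @named_array, @another_array;   # fred, barney, wilma, one per line
```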
Since the reference fits into a scalar variable, can we have a list element be a reference? Certainly:
for $aref (\@named_array, \@another_array) { unchomp($aref); }
In fact, we can even store this list into another array:
@do_these = (\@named_array, \@another_array);
for $aref (@do_these) { unchomp($aref); }
But what have we done? We now have an array, each element of which is an arrayref, which can in turn be dereferenced to access the individual elements. So, what does it look like to access each layer?
@do_these              # two elements, each an arrayref
@{$do_these[0]}        # @named_array
${$do_these[0]}[3]     # $named_array[3]
@{$do_these[1]}[4,5]   # @another_array[4,5]
$#{$do_these[0]}       # $#named_array
Some people call this structure a ``list of lists'', but that's pretty loose, since really it's an array of arrayrefs. Perl doesn't have ``lists of lists''.
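To make the table of layers concrete, the following sketch fills two arrays with hypothetical values and evaluates each expression from the table.

```perl
use strict;
use warnings;

# Hypothetical contents chosen so each access is easy to check.
my @named_array   = (10, 20, 30, 40);
my @another_array = qw(a b c d e f);

my @do_these = (\@named_array, \@another_array);

my @whole   = @{$do_these[0]};         # @named_array
my $element = ${$do_these[0]}[3];      # $named_array[3]
my @slice   = @{$do_these[1]}[4,5];    # @another_array[4,5]
my $last    = $#{$do_these[0]};        # $#named_array

print "@whole | $element | @slice | $last\n";   # 10 20 30 40 | 40 | e f | 3
```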
Now, let's simplify the syntax a bit. The rules above always work for dereferencing (replace the name with a block), but can start looking pretty ugly for common things. First, if the expression inside the block is only a simple scalar variable, we can lose the curly braces. Thus, we can change @{$aref} to @$aref, but we have to leave @{$do_these[0]} alone.
There's another optimization available for accessing array elements through a reference. In place of:
${WHATEVER}[WHEREVER]
we can always write:
WHATEVER->[WHEREVER]
The -> operator followed by square brackets means to treat the previous value as an arrayref, dereference it, and then select the requested element. Thus, we can rewrite
${$aref}[2]
to just:
$aref->[2]
and:
${$do_these[0]}[3]
to simply:
$do_these[0]->[3]
There's one more optimization available for that last one. If the arrow ends up between subscripts, we can drop the arrow safely:
$do_these[0][3]
Which looks vaguely C-like. Cool. We cannot remove the arrow between $aref and [2] in the previous example, though, because that would be looking at an element of @aref, not at all what we want.
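The sketch below checks that the shorthand spellings really are interchangeable. The second element of @do_these here is a hypothetical anonymous array, used only so the two-level access has something to find.

```perl
use strict;
use warnings;

my @named_array = (0..9);
my $aref = \@named_array;
my @do_these = ($aref, [100..109]);

# Four spellings of the same element.
my @forms = (
    ${$aref}[2],      # full block form
    $$aref[2],        # braces dropped around a simple scalar
    $aref->[2],       # arrow form
    $named_array[2],  # no reference at all
);

# The arrow between two subscripts is optional.
my $x = ${$do_these[1]}[3];
my $y = $do_these[1]->[3];
my $z = $do_these[1][3];

print "@forms / $x $y $z\n";   # 2 2 2 2 / 103 103 103
```

Note that writing `$aref[2]` here would refer to an element of an array named @aref, which `use strict` rejects because no such array is declared.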
Did we need the named arrays here to set up @do_these? Nope. We can also use an ``anonymous array constructor'':
$do_these[0] = [1..10];
Here, the value 1..10 is computed in a list context, then placed into an array structure. A reference to this array is returned as the value of the square brackets, and placed into $do_these[0]. Except for the fact that we don't have a named array any more, the rest of the code would run identically. We could even initialize the entire array as:
@do_these = ([1..10], [11..20]);
And we get two different 10-element arrays, held as arrayrefs in @do_these. Note that the placement of square brackets and parens here is essential: swapping them, as in @do_these = [(1..10), (11..20)], would instead leave @do_these with a single element, a reference to one 20-element array.
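The following sketch compares the intended initialization with one where the square brackets and parens are swapped, by counting elements at each level.

```perl
use strict;
use warnings;

# Two anonymous arrays, each referenced by one element of @do_these.
my @do_these = ([1..10], [11..20]);

# Swapped brackets/parens: the inner parens just flatten into ONE
# anonymous array of 20 elements, so @swapped has a single element.
my @swapped = [(1..10), (11..20)];

print scalar(@do_these), " ", scalar(@{$do_these[0]}), "\n";   # 2 10
print scalar(@swapped),  " ", scalar(@{$swapped[0]}),  "\n";   # 1 20
```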
Adding elements to an array has always been a ``self-extend'' operation in Perl. Assigning to an element that doesn't yet exist causes the array to be extended automatically:
@a = ();
$a[3] = "barney";
$a[7] = "dino";
And we end up with:
@a = (undef, undef, undef, "barney", undef, undef, undef, "dino");
Notice that the intermediate elements are automatically undef.
Similarly, any variable used as if it were an arrayref, but not yet containing anything (or just undef), is automatically stuffed with an arrayref to an empty anonymous array. This process is called ``autovivification'', and makes populating so-called ``multidimensional'' arrays trivial:
@a = ();
$a[3]->[2] = "hello";
# same as:
# $a[3] = [];
# $a[3]->[2] = "hello";
This even works on multiple levels:
@a = ();
$a[2]->[4]->[5]->[3] = "foo";
# or: $a[2][4][5][3] = "foo";
Very nice.
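The sketch below checks both behaviors: self-extension fills the gaps with undef, and a deep assignment autovivifies every intermediate level. The second array is named @deep (a hypothetical name) only to keep it separate from @a.

```perl
use strict;
use warnings;

# Assigning past the end extends the array; the gaps are undef.
my @a = ();
$a[3] = "barney";
$a[7] = "dino";
my $count = scalar(@a);               # 8 elements (indices 0..7)
my $gaps  = grep { !defined } @a;     # 6 undef slots

# Using an undef element as an arrayref autovivifies each level.
my @deep = ();
$deep[2][4][5][3] = "foo";

print "$count $gaps ", ref($deep[2]), " $deep[2][4][5][3]\n";   # 8 6 ARRAY foo
```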
Arrays aren't the only things that can be referenced. Hashes are another popular target:
%last_name = (
  "fred" => "flintstone",
  "wilma" => "flintstone",
  "barney" => "rubble",
);
$hashref = \%last_name;
@firsts = keys %{$hashref};
That last line can be written as keys %$hashref as well, using the same abbreviations given earlier. Accessing an element can also be abbreviated:
# looking at $last_name{"fred"}:
${$hashref}{"fred"}
# removing optional {}'s:
$$hashref{"fred"}
# or switching to arrow form:
$hashref->{"fred"}
We can put an arrayref as a hash value:
$score{"fred"} = [180, 150, 165];
$score{"barney"} = [172, 190, 158];
and then access that with everything we've seen:
@fred_scores = @{$score{"fred"}};
${$score{"fred"}}[2] = 168;    # fix 165 to 168
$score{"fred"}->[2] = 168;     # same thing
$score{"fred"}[2] = 168;       # same thing
Note that we can drop an arrow between either kind of subscript.
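Putting the hash examples together, this sketch confirms that the three element-access spellings agree and that the arrowless form updates a score stored inside a hash of arrayrefs.

```perl
use strict;
use warnings;

my %last_name = (
    "fred"   => "flintstone",
    "wilma"  => "flintstone",
    "barney" => "rubble",
);
my $hashref = \%last_name;

# Three equivalent ways to reach $last_name{"fred"}.
my @same = (${$hashref}{"fred"}, $$hashref{"fred"}, $hashref->{"fred"});

# A hash whose values are arrayrefs.
my %score;
$score{"fred"}   = [180, 150, 165];
$score{"barney"} = [172, 190, 158];
$score{"fred"}[2] = 168;              # fix 165 to 168

my @firsts = sort keys %$hashref;
print "@firsts | @same | @{$score{fred}}\n";
# barney fred wilma | flintstone flintstone flintstone | 180 150 168
```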
Like arrayrefs, hashrefs can also appear from nowhere through autovivification:
%bytes = ();
# ...
$bytes{$src}{$dest} += $count;
This creates a hash of hashrefs, with each hashref being added only when a new $src shows up, and each second-level hash element being added for a new $dest for that $src.
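As an illustration, the sketch below feeds a few made-up traffic records through the same accumulation line. The @records list and its hostnames are hypothetical; the text elides where $src, $dest, and $count come from.

```perl
use strict;
use warnings;

# Hypothetical traffic records: [source, destination, byte count].
my @records = (
    ["hostA", "hostB", 100],
    ["hostA", "hostB", 50],
    ["hostA", "hostC", 25],
    ["hostD", "hostB", 10],
);

my %bytes = ();
for my $rec (@records) {
    my ($src, $dest, $count) = @$rec;
    # $bytes{$src} is autovivified as a hashref the first time $src appears;
    # += on a not-yet-existing element starts from undef (treated as 0).
    $bytes{$src}{$dest} += $count;
}

for my $src (sort keys %bytes) {
    for my $dest (sort keys %{$bytes{$src}}) {
        print "$src -> $dest: $bytes{$src}{$dest}\n";
    }
}
```

This prints totals of 150 for hostA to hostB, 25 for hostA to hostC, and 10 for hostD to hostB.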
Hashrefs can also come from anonymous hash constructors:
$hashref = {
  "fred" => "flintstone",
  "barney" => "rubble",
  "betty" => "rubble",
};
The value inside the braces is evaluated like the right side of a hash assignment (list context, alternating key/value pairs). A hash is built, and a reference to that hash is returned. So, to build a reference to the scores above, we could do this:
$game = {
  "fred" => [180, 150, 165],
  "barney" => [172, 190, 158],
};
Which could have been part of the league scores:
$week[0] = $game;
$week[1] = {
  "fred" => [201, 188, 65],
  "barney" => [189, 252, 99],
};
# or more directly:
@week = ({
  "fred" => [180, 150, 165],
  "barney" => [172, 190, 158],
},{
  "fred" => [201, 188, 65],
  "barney" => [189, 252, 99],
});
Now we get the score for game $i of week $j for fred with $week[$j-1]{"fred"}[$i-1], subtracting 1 because Perl counts starting at 0, not 1.
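The lookup can be wrapped in a small helper that takes 1-based week and game numbers. The name score_for is hypothetical; it simply applies the subscript expression from the text.

```perl
use strict;
use warnings;

my @week = (
    { "fred" => [180, 150, 165], "barney" => [172, 190, 158] },
    { "fred" => [201, 188, 65],  "barney" => [189, 252, 99]  },
);

# Hypothetical helper: score for game $i of week $j, both counted from 1.
sub score_for {
    my ($player, $j, $i) = @_;
    return $week[$j-1]{$player}[$i-1];
}

print score_for("fred", 2, 1), "\n";     # 201
print score_for("barney", 1, 3), "\n";   # 158
```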
Less frequently used, but still just as cool, are scalar references (scalarrefs):
$that = \$scalar_var;
$$that = 17;    # $scalar_var = 17
Scalarrefs autovivify as well, although that's not very impressive:
$that = undef;
$$that = 3;    # anonymous var becomes 3
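A short check of both scalarref cases: writing through a reference to a named scalar, and autovivifying an anonymous scalar from an undef variable. The variable $other is a hypothetical stand-in for the text's reused $that.

```perl
use strict;
use warnings;

my $scalar_var;
my $that = \$scalar_var;
$$that = 17;                 # $scalar_var is now 17

my $other;                   # starts out undef
$$other = 3;                 # autovivifies a reference to an anonymous scalar

print "$scalar_var ", ref($other), " $$other\n";   # 17 SCALAR 3
```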
Anonymous data structures can also occur when a variable goes out of scope:
my $x;
{
  my $prince = "van gogh";
  $x = \$prince;
}
Here, $x is pointing to what is now an anonymous string, the artist formerly known as $prince. This frequently happens when returning a data structure reference from a subroutine:
sub marine {
  my %things;
  # ...
  return \%things;
}
The return value here will be a hashref pointing to the now-anonymous hash. New invocations of the subroutine create new instances of %things. Memory for the previous return values is reclaimed only when the last reference to each is removed.
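This sketch shows that the referenced data survives its scope, and that each call to marine hands back a distinct hash. The contents of %things (a single calls key set from the argument) are a hypothetical filler for the text's "# ...".

```perl
use strict;
use warnings;

my $x;
{
    my $prince = "van gogh";
    $x = \$prince;            # $prince's name goes away; the string lives on
}

sub marine {
    my %things = (calls => shift);    # hypothetical contents for illustration
    return \%things;                  # a fresh %things on every call
}

my $first  = marine(1);
my $second = marine(2);
print "$$x $first->{calls} $second->{calls}\n";   # van gogh 1 2
```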
So, that should give you a start into references. For further information, check the documentation that comes with Perl, especially perlref, perllol, and perldsc, as well as chapter 4 of my book Programming Perl, Second Edition from O'Reilly and Associates (co-authored by Larry Wall and Tom Christiansen). In a future column here, I'll look at how references may also be made to subroutines and filehandles. Until then, enjoy!