Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 08 (Jan 2000)

[suggested title: Getting better references through Perl]

To kick off the new year, I'm going back to one of the Perl basics: references. References permit complex data types to be cleanly represented in Perl, and provide ways to pass large amounts of data into subroutines and return it from them.

Let's look at references by example first. Suppose I wanted a ``reverse chomp'' operator that adds a newline to every element of an array. For one particular array, I could write the code as follows:

  for $element (@array) {
    $element .= "\n";
  }

And while this would certainly work, it locks me into a specific variable named @array. If I wanted to make a general subroutine, I'd be out of luck without references (unless I wanted to do something evil and non-scalable, like altering @_ directly). A reference permits the selected variable to be changed at will, using an additional level of indirection. Consider the following code:

  $this_reference = \@named_array;

Here, the \ operator ``takes a reference to'' the named @named_array variable. The value is called an ``array reference'', or arrayref for short (occasionally incorrectly called listref). The reference fits nearly anywhere a scalar value fits, and so we've shoved it into $this_reference. This arrayref ``points at'' @named_array.
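As a quick check of what such a value is, here's a minimal sketch (assuming Perl 5 with strict and warnings enabled; the array contents are made up). The built-in ref function reports what kind of thing a reference points at, and two references taken to the same array compare equal, because they point at the same place.

```perl
use strict;
use warnings;

my @named_array    = (1, 2, 3);
my $this_reference = \@named_array;   # take a reference to the named array

# ref() reports the kind of thing the reference points at
print ref($this_reference), "\n";     # prints "ARRAY"

# A second reference to the same array points at the same place
my $same = \@named_array;
print "same array\n" if $same == $this_reference;
```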

To use the reference we must dereference it. Let's set @named_array to the values of 1 through 10, but using the reference:

  # @named_array       = (1..10);
    @{$this_reference} = (1..10);

The syntax for dereferencing is to write the operation as we would without the reference, but then replace the name of the variable with a block of code (enclosed in braces) returning a reference to the variable. So, we've now got a piece of code that affects @named_array, at least this time. However, with a different array reference stored in $this_reference, the same code affects a different variable:

  $this_reference = \@another_array;
  @{$this_reference} = (1..10);

Now we've set @another_array to those 10 values. We can even use the reference syntax to access individual elements:

  # $another_array    [2] = "three";
    ${$this_reference}[2] = "three";

Again, replace the name of the variable with a block returning the reference, and we get the dereferencing form.
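Putting those pieces together, here's a small runnable sketch (assuming Perl 5 with strict and warnings, so the arrays are declared with my rather than used as globals). The same two dereferencing statements affect whichever array $this_reference currently points at.

```perl
use strict;
use warnings;

my (@named_array, @another_array);
my $this_reference;

# Point at the first array and fill it through the reference
$this_reference = \@named_array;
@{$this_reference} = (1..10);

# Re-point the same scalar at a second array: the same code now hits it
$this_reference = \@another_array;
@{$this_reference} = (1..10);
${$this_reference}[2] = "three";   # same as $another_array[2] = "three"

print "@named_array\n";     # 1 2 3 4 5 6 7 8 9 10
print "@another_array\n";   # 1 2 three 4 5 6 7 8 9 10
```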

So, we can start to see how to make our ``unchomp'' work. We'll write the code so that it uses a reference, and pass that reference as a parameter:

  sub unchomp {
    my $ref = shift;
    for $element (@{$ref}) {
      $element .= "\n";
    }
  }

And then call it with a reference to the array we want unchomped:

  unchomp(\@named_array);
  unchomp(\@another_array);

The reference passes as a single parameter, which is then shifted into $ref, and dereferenced into the foreach loop. Bingo.
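Here's the whole thing as a self-contained sketch (assuming strict and warnings; the sample names are just illustrative). The loop variable is aliased to each real element of the referenced array, which is why appending to it changes the caller's data.

```perl
use strict;
use warnings;

# "Reverse chomp": append a newline to every element of the array
# whose reference is passed as the first argument.
sub unchomp {
    my $ref = shift;
    for my $element (@{$ref}) {
        $element .= "\n";   # $element is an alias for the real array element
    }
}

my @named_array   = qw(fred barney);
my @another_array = qw(wilma betty);

unchomp(\@named_array);
unchomp(\@another_array);

print @named_array, @another_array;   # one name per line
```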

Since the reference fits into a scalar variable, can we have a list element be a reference? Certainly:

  for $aref (\@named_array, \@another_array) {
    unchomp($aref);
  }

In fact, we can even store this list into another array:

  @do_these = (\@named_array, \@another_array);
  for $aref (@do_these) {
    unchomp($aref);
  }

But what have we done? We now have an array, each element of which is an arrayref, which can in turn be dereferenced to access the individual elements. So, what does it look like to access each layer?

  @do_these # two elements, each an arrayref
  @{$do_these[0]} # @named_array
  ${$do_these[0]}[3] # $named_array[3]
  @{$do_these[1]}[4,5] # @another_array[4,5]
  $#{$do_these[0]} # $#named_array

Some people call this structure a ``list of lists'', but that's pretty loose, since really it's an array of arrayrefs. Perl doesn't have ``lists of lists''.
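To see those access forms actually run, here's a minimal sketch (assuming strict and warnings; the array contents are made up so each result is easy to spot).

```perl
use strict;
use warnings;

my @named_array   = ('a'..'f');
my @another_array = (1..6);
my @do_these      = (\@named_array, \@another_array);  # array of arrayrefs

my @first   = @{$do_these[0]};        # a copy of @named_array
my $elem    = ${$do_these[0]}[3];     # $named_array[3]
my @slice   = @{$do_these[1]}[4,5];   # @another_array[4,5]
my $last_ix = $#{$do_these[0]};       # $#named_array

print "@first | $elem | @slice | $last_ix\n";   # a b c d e f | d | 5 6 | 5
```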

Now, let's simplify the syntax a bit. The rules above always work for dereferencing (replace the name with a block), but can start looking pretty ugly for common things. First, if the expression inside the block is only a simple scalar variable, we can lose the curly braces. Thus, we can change @{$aref} to @$aref, but we have to leave @{$do_these[0]} alone.

There's another shortcut available for accessing array elements through a reference. In place of:

  ${WHATEVER}[WHEREVER]

we can always write:

  WHATEVER->[WHEREVER]

The -> operator followed by square brackets means to treat the previous value as an arrayref, dereference it, and then select the requested element. Thus, we can rewrite

  ${$aref}[2]

to just:

  $aref->[2]

and:

  ${$do_these[0]}[3]

to simply:

  $do_these[0]->[3]

There's one more shortcut available for that last one. If the arrow ends up between subscripts, we can safely drop it:

  $do_these[0][3]

Which looks vaguely C-like. Cool. We cannot remove the arrow between $aref and [2] on the previous example though, because that would be looking at an element of @aref, not at all what we want.
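A small sketch to confirm that these spellings really are interchangeable (assuming strict and warnings; the numbers are arbitrary).

```perl
use strict;
use warnings;

my @do_these = ([10, 20, 30, 40], [50, 60, 70, 80]);
my $aref = $do_these[1];

# Three spellings of the same element of the array behind $aref
my @forms = (${$aref}[2], $$aref[2], $aref->[2]);

# Three spellings of the same element reached through @do_these
my @nested = (${$do_these[0]}[3], $do_these[0]->[3], $do_these[0][3]);

print "@forms\n";    # 70 70 70
print "@nested\n";   # 40 40 40
```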

Did we need the named arrays here, to set up @do_these? Nope. We can also use an ``anonymous array constructor'':

  $do_these[0] = [1..10];

Here, the value 1..10 is computed in a list context, then placed into an array structure. A reference to this array is returned as the value of the square brackets, and placed into $do_these[0]. Except for the fact that we don't have a named array any more, the rest of the code would run identically. We could even initialize the entire array as:

  @do_these = ([1..10], [11..20]);

And we get two different 10-element arrays, held as arrayrefs in @do_these. Note that the placement of square brackets and parens here is essential: swapping them, as in [(1..10), (11..20)], would instead give us a single arrayref to one flattened 20-element array.
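The difference is easy to check with a quick sketch (assuming strict and warnings) that counts elements at each level.

```perl
use strict;
use warnings;

# Two anonymous arrays, each referenced from one element of @do_these
my @do_these = ([1..10], [11..20]);

# Brackets and parens swapped: one anonymous array holding all 20 values
my @swapped = [(1..10), (11..20)];

print scalar(@do_these), " ", scalar(@{$do_these[1]}), "\n";  # 2 10
print scalar(@swapped),  " ", scalar(@{$swapped[0]}),  "\n";  # 1 20
```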

Adding elements to an array has always been a ``self-extend'' operation in Perl. Assigning to elements that don't yet exist causes the array to be autoextended:

  @a = ();
  $a[3] = "barney";
  $a[7] = "dino";

And we end up with:

  @a = (undef, undef, undef,
        "barney", undef, undef,
        undef, "dino");

Notice that the intermediate elements are automatically undef. Similarly, any variable used as if it were an arrayref, but not yet containing anything (or containing just undef), is automatically stuffed with a reference to an empty anonymous array. This process is called ``autovivification'', and it makes populating so-called ``multidimensional'' arrays trivial:

  @a = ();
  $a[3]->[2] = "hello";
  # same as:
  # $a[3] = [];
  # $a[3]->[2] = "hello";

This even works on multiple levels:

  @a = ();
  $a[2]->[4]->[5]->[3] = "foo";
  # or $a[2][4][5][3] = "foo";

Very nice.
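Both behaviors can be observed directly. In this sketch (assuming strict and warnings, with my-declared arrays), the first part shows autoextension and the second shows each missing level of a nested assignment springing into existence as an arrayref.

```perl
use strict;
use warnings;

# Autoextension: assigning past the end grows the array
my @a = ();
$a[3] = "barney";
$a[7] = "dino";
print scalar(@a), "\n";                      # 8
print defined $a[5] ? "set\n" : "undef\n";   # undef

# Autovivification: every missing level becomes an anonymous array
my @b = ();
$b[2][4][5][3] = "foo";
print ref($b[2]), " ", ref($b[2][4]), " ", ref($b[2][4][5]), "\n";  # ARRAY ARRAY ARRAY
print scalar(@b), " ", scalar(@{$b[2]}), "\n";                      # 3 5
```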

Arrays aren't the only things that can be referenced. Hashes are another popular target:

  %last_name = (
    "fred" => "flintstone",
    "wilma" => "flintstone",
    "barney" => "rubble",
  );
  $hashref = \%last_name;
  @firsts = keys %{$hashref};

That last line can be written as keys %$hashref as well, using the same abbreviations given earlier. Accessing an element can also be abbreviated:

  # looking at $last_name{"fred"}:
  ${$hashref}{"fred"}
  # removing optional {}'s:
  $$hashref{"fred"}
  # or switching to arrow form:
  $hashref->{"fred"}
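Here's the same hash as a runnable sketch (assuming strict and warnings). The keys are sorted only because hash order is unpredictable, which keeps the printed output stable.

```perl
use strict;
use warnings;

my %last_name = (
    "fred"   => "flintstone",
    "wilma"  => "flintstone",
    "barney" => "rubble",
);
my $hashref = \%last_name;

my @firsts = sort keys %$hashref;   # same as keys %{$hashref}, then sorted

# Three spellings of $last_name{"fred"}
my @same = (${$hashref}{"fred"}, $$hashref{"fred"}, $hashref->{"fred"});

print "@firsts\n";   # barney fred wilma
print "@same\n";     # flintstone flintstone flintstone
```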

We can put an arrayref as a hash value:

  $score{"fred"} = [180, 150, 165];
  $score{"barney"} = [172, 190, 158];

and then access that with everything we've seen:

  @fred_scores = @{$score{"fred"}};
  ${$score{"fred"}}[2] = 168; # fix 165 to 168
  $score{"fred"}->[2] = 168; # same thing
  $score{"fred"}[2] = 168; # same thing

Note that we can drop an arrow between either kind of subscript.
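A short sketch (assuming strict and warnings) showing that all three update forms hit the same anonymous array, while the earlier copy into @fred_scores is unaffected.

```perl
use strict;
use warnings;

my %score;
$score{"fred"}   = [180, 150, 165];
$score{"barney"} = [172, 190, 158];

my @fred_scores = @{$score{"fred"}};   # a copy: (180, 150, 165)

${$score{"fred"}}[2] = 168;   # fix 165 to 168
$score{"fred"}->[2]  = 168;   # same thing
$score{"fred"}[2]    = 168;   # same thing, arrow dropped between subscripts

print "@fred_scores\n";      # 180 150 165 (the copy is unaffected)
print "@{$score{fred}}\n";   # 180 150 168
```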

Like arrayrefs, hashrefs can also appear from nowhere through autovivification:

  %bytes = ();
  # ...
  $bytes{$src}{$dest} += $count;

This creates a hash of hashrefs: each hashref is added only when a new $src shows up, and each second-level hash element is added for a new $dest for that $src.
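To make that concrete, here's a sketch that fills %bytes from a few hypothetical traffic records (the host names and byte counts are invented for illustration) and then walks both levels. Note that += on a not-yet-existing element quietly treats it as zero, even under warnings.

```perl
use strict;
use warnings;

# Hypothetical traffic records: source host, destination host, byte count
my @records = (
    ["alpha", "beta",  100],
    ["alpha", "gamma", 250],
    ["beta",  "alpha",  40],
    ["alpha", "beta",   60],
);

my %bytes = ();
for my $rec (@records) {
    my ($src, $dest, $count) = @$rec;
    $bytes{$src}{$dest} += $count;   # inner hashref autovivifies on first use
}

for my $src (sort keys %bytes) {
    for my $dest (sort keys %{$bytes{$src}}) {
        print "$src -> $dest: $bytes{$src}{$dest}\n";
    }
}
# alpha -> beta: 160
# alpha -> gamma: 250
# beta -> alpha: 40
```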

Hashrefs can also come from anonymous hash constructors:

  $hashref = {
    "fred" => "flintstone",
    "barney" => "rubble",
    "betty" => "rubble",
  };

The value inside the braces is evaluated like the right side of a hash assignment (list context, alternating key/value pairs). A hash is built, and a reference to that hash is returned. So, to build a hashref holding the scores above, we could do this:

  $game = {
    "fred" => [180, 150, 165],
    "barney" => [172, 190, 158],
  };

Which could have been part of the league scores:

  $week[0] = $game;
  $week[1] = {
    "fred" => [201, 188, 65],
    "barney" => [189, 252, 99],
  };
  # or more directly:
  @week = ({
    "fred" => [180, 150, 165],
    "barney" => [172, 190, 158],
  },{
    "fred" => [201, 188, 65],
    "barney" => [189, 252, 99],
  });

Now we get the score for game $i of week $j for fred with $week[$j-1]{"fred"}[$i-1], subtracting 1 because Perl counts starting at 0, not 1.
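Here's a sketch of that lookup (assuming strict and warnings). The helper name score_for is hypothetical, just a place to hide the off-by-one arithmetic, and the final loop totals one player's scores across every week.

```perl
use strict;
use warnings;

my @week = ({
    "fred"   => [180, 150, 165],
    "barney" => [172, 190, 158],
},{
    "fred"   => [201, 188, 65],
    "barney" => [189, 252, 99],
});

# Hypothetical helper: score for game $i of week $j, both counted from 1
sub score_for {
    my ($who, $j, $i) = @_;
    return $week[$j-1]{$who}[$i-1];   # subtract 1: Perl indexes from 0
}

print score_for("fred", 2, 1), "\n";   # 201

# Total all of fred's scores across every week
my $total = 0;
for my $w (@week) {
    $total += $_ for @{$w->{"fred"}};
}
print "$total\n";   # 949
```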

Less frequently used, but still just as cool, are scalar references (scalarrefs):

  $that = \$scalar_var;
  $$that = 17; # $scalar_var = 17

Scalarrefs autovivify as well, although that's not very impressive:

  $that = undef;
  $$that = 3; # anonymous var becomes 3
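Both cases fit in a small sketch (assuming strict and warnings): assigning through a scalarref, and letting an undef scalar autovivify into a reference to a fresh anonymous scalar.

```perl
use strict;
use warnings;

my $scalar_var = 0;
my $that = \$scalar_var;
$$that = 17;                 # assigns through the reference
print "$scalar_var\n";       # 17
print ref($that), "\n";      # SCALAR

my $other;                   # starts out undef
$$other = 3;                 # autovivifies an anonymous scalar
print ref($other), " $$other\n";   # SCALAR 3
```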

Anonymous data structures can also occur when a variable goes out of scope:

  my $x;
  {
    my $prince = "van gogh";
    $x = \$prince;
  }

Here, $x is pointing to what is now an anonymous string, the artist formerly known as $prince. This frequently happens when returning a data structure reference from a subroutine:

  sub marine {
    my %things;
    # ...
    return \%things;
  }

The return value here will be a hashref, pointing at what has now become an anonymous hash. Each new invocation of the subroutine creates a new instance of %things. Memory for a previous return value is reclaimed only when the last reference to it goes away.
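A quick sketch to show that each call really hands back a distinct hash (assuming strict and warnings; the calls key is hypothetical filler, there only to tell the two results apart).

```perl
use strict;
use warnings;

# Each call builds a fresh lexical %things and returns a reference to it
sub marine {
    my %things;
    $things{calls} = shift;   # hypothetical contents, just to tell them apart
    return \%things;
}

my $first  = marine(1);
my $second = marine(2);

print "$first->{calls} $second->{calls}\n";            # 1 2
print(($first == $second) ? "same\n" : "different\n"); # different
```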

So, that should give you a start into references. For further information, check the documentation that comes with Perl, especially perlref, perllol, and perldsc, as well as chapter 4 of my book Programming Perl, Second Edition from O'Reilly and Associates (co-authored by Larry Wall and Tom Christiansen). In a future column here, I'll look at how references may also be made to subroutines and filehandles. Until then, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
