Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Linux Magazine Column 58 (Apr 2004)

[suggested title: ``List Manipulation'']

Although many of my past columns have dealt with entire programs, I find that people still send me email about the basics. So, this month, I thought I'd address an issue that people seem to keep asking about: basic list manipulation.

One very common task with lists is selection: finding items in a list that meet a particular condition. For example, let's find all the odd elements in a list @input:

  my @output;
  foreach (@input) {
    if ($_ % 2) {
      push @output, $_;
    }
  }

Of course, we can shorten and clean this up a bit by using a ``backwards if'':

  my @output;
  foreach (@input) {
    push @output, $_ if $_ % 2;
  }

Or we can shorten the loop up a different way using a ``backwards foreach'':

  my @output;
  $_ % 2 and push @output, $_ foreach @input;

But alas, we can't nest the ``backwards if'' and the ``backwards foreach''. It's been argued that allowing such nesting would invite much abuse, and it is thus not permitted. Even the use of and here for conditional execution is arguably obfuscated enough.

However, we're also using a hammer where we should be using a screwdriver. The grep operator does a fine job of selecting elements from a list:

  my @output = grep $_ % 2, @input;

Each element of @input is placed temporarily in $_, and then the $_ % 2 expression is evaluated for a true/false value. When the expression is true, the corresponding element of the list is included in the output. Thus, we get the odd-valued elements in a manner even shorter than before.
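
As a quick check, here is a minimal sketch showing that the explicit loop and the grep select the same elements. The values in @input are made up for illustration and are not from the original column:

```perl
use strict;
use warnings;

# Sample data, chosen only for illustration.
my @input = (3, 8, 15, 22, 7, 10);

# Explicit loop with a "backwards if".
my @loop_output;
foreach (@input) {
  push @loop_output, $_ if $_ % 2;
}

# The same selection with grep.
my @grep_output = grep $_ % 2, @input;

print "loop: @loop_output\n";   # loop: 3 15 7
print "grep: @grep_output\n";   # grep: 3 15 7
```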

What if we wanted just the odd-positioned elements? It's a bit trickier, but still not very hard, if you do it in two steps. First, let's construct a list of all the odd-position indices:

  my @odd_indices = grep $_ % 2, 0..$#input;

Here, we construct the list of all the indices using 0..$#input, and then throw away the even numbers as before. Next, we need a slice of the array with just those indicated elements:

  my @output = @input[@odd_indices];

And that's it! Of course, we can even eliminate the intermediate variable, at the expense of a bit of complexity in:

  my @output = @input[grep $_ % 2, 0..$#input];

Now, let's consider the opposite problem. We want the indices of the elements that are odd in the array. Again, it's a matter of understanding the right indirections. Start with 0..$#input, and then see which of those results in an odd array value:

  my @indices_of_odd = grep $input[$_] % 2, 0..$#input;

Of course, from here, it'd be a simple step to actually look at those elements:

  my @output = @input[@indices_of_odd];

But the point of this snippet is to fetch the indices, not the final @output, which we derived much more easily above.
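
To make the distinction between the two problems concrete, here is a small sketch with sample data (the values are assumptions picked so that the two answers differ):

```perl
use strict;
use warnings;

# Index:      0   1   2   3   4   5
my @input = (10, 11, 12, 13, 15, 16);

# Elements sitting at odd positions (indices 1, 3, 5).
my @odd_positioned = @input[grep $_ % 2, 0..$#input];

# Indices whose elements hold odd values.
my @odd_valued_at = grep $input[$_] % 2, 0..$#input;

print "odd-positioned elements: @odd_positioned\n";  # 11 13 16
print "indices of odd values:   @odd_valued_at\n";   # 1 3 4
```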

The expression for the grep can get pretty complex. Usually if the expression is something more complicated than a single operator, I drop down into the block-form of the grep:

  my @output = grep { $_ % 2 } @input;

Note that there's no comma between the closing brace and the list. If you add one there, Perl thinks you are trying to create an anonymous hash for the expression, which would always be true and thus rather pointless. Occasionally, Perl guesses wrong anyway about whether that's an anonymous hash or a block, so you have to help it along. The simplest way I've seen is a leading plus for a hash or a semicolon just inside the opening brace for a block:

  grep + { anon hash } ...
  grep { ; code block } ...

The block can be arbitrarily complex, including having local variables and arbitrary control structures. Like a subroutine, the last expression evaluated in the block is the one that matters. But unlike a subroutine, you're not permitted to use a return from the block, so choose your logic carefully.
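
As an illustration, here is a sketch of a multi-statement block with its own local variables. The word list and the selection rule (at least two vowels and an even length) are invented for the example:

```perl
use strict;
use warnings;

my @words = qw(apple kiwi banana fig cherry);

my @selected = grep {
  my $vowels      = tr/aeiou//;          # count vowels in $_ without changing it
  my $even_length = length($_) % 2 == 0;
  $vowels >= 2 && $even_length;          # last expression decides the outcome
} @words;

print "@selected\n";   # kiwi banana
```

The two assignments are evaluated only for their side effect of setting up the variables; only the final expression determines whether the word is kept.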

For example, to implement the Unix rm command with the -i selective delete switch, it's merely a bit of code:

  unlink $_ or warn "Cannot delete $_: $!; continuing..."
    foreach grep {
      print "$_? ";
      <STDIN> =~ /^y/i;
    } glob "*";

Let's look at this one from the end to the beginning, because we've used a bunch of right-to-left operators. First, the glob returns a list of all of the names in the current directory that don't begin with a dot.

Next, the grep evaluates the print with $_ set to each name, and then waits for a response on STDIN. If this response begins with the letter y (case ignored), the last expression evaluated in the block is true, and thus that particular item is selected for the output. Otherwise, the item is simply discarded.

Next, the foreach takes each item of the list returned by grep, and evaluates the logical or expression with the value in $_. If the unlink succeeds, the warn is skipped. Otherwise, the message is printed, and we move on.

The code is arguably convoluted with grep. You might find it easier to read in a forward form:

  foreach (glob "*") {
    print "$_? ";
    next unless <STDIN> =~ /^y/i;
    next if unlink $_;
    warn "Cannot delete $_: $!; continuing...";
  }

However, one difference between this code and the previous snippet is that this code will delete the files as you go along, instead of selecting all the files first, then deleting them in a batch.
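
The ordering difference can be seen without touching any files. In this sketch, @names and the canned %answer table are hypothetical stand-ins for the glob and for STDIN, and the "ask" and "delete" steps are merely recorded in a log:

```perl
use strict;
use warnings;

# Hypothetical stand-ins: no files are deleted and no input is read.
my @names  = qw(a.txt b.txt c.txt);
my %answer = ('a.txt' => 'y', 'b.txt' => 'n', 'c.txt' => 'Yes');

# grep form: every question is asked before anything is deleted.
my @batch_log;
push @batch_log, "delete $_"
  foreach grep {
    push @batch_log, "ask $_";
    $answer{$_} =~ /^y/i;
  } @names;

# Forward form: each deletion happens right after its question.
my @forward_log;
foreach (@names) {
  push @forward_log, "ask $_";
  next unless $answer{$_} =~ /^y/i;
  push @forward_log, "delete $_";
}

print join(", ", @batch_log), "\n";
# ask a.txt, ask b.txt, ask c.txt, delete a.txt, delete c.txt
print join(", ", @forward_log), "\n";
# ask a.txt, delete a.txt, ask b.txt, ask c.txt, delete c.txt
```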

Another common operation performed on lists is transforming them. For example, suppose the task was to ensure that all the items in a list were definitely odd by multiplying them by two and adding one:

  my @output;
  foreach (@input) {
    push @output, $_ * 2 + 1;
  }

But again, this is such a common operation that there's a nice shortcut in Perl: the map operator:

  my @output = map $_ * 2 + 1, @input;

And like its grep cousin, there's also a block form:

  my @output = map { $_ * 2 + 1 } @input;

But unlike grep, the map operator's last (or only) expression is evaluated in a list context. If the result is multiple elements (or empty), this will mean the output list is longer (or shorter) than the input list:

  my @numbers_and_odds = map { $_, $_ * 2 + 1 } @input;

For each input value, we're getting back two elements in a list. The output list will be twice as big as the input list. One slightly-more-practical example is to get a list of filenames and their sizes from the current directory, cached into a hash:

  my @names_and_sizes = map { $_ , -s $_ } glob "*";
  my %sizeof = @names_and_sizes;

The intermediate array has alternating names and sizes, which is in the right shape to be assigned to the hash. Of course, we didn't need the intermediate variable there either:

  my %sizeof = map { $_ , -s $_ } glob "*";
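
The same key-and-value pattern works on any list, not just filenames. Here is a sketch that avoids the filesystem by mapping some sample words (an assumption for the example) to their lengths:

```perl
use strict;
use warnings;

my @words = qw(perl map grep);

# Each input element yields two output elements: a key and its value.
my %length_of = map { $_, length $_ } @words;

print "$_: $length_of{$_}\n" for sort keys %length_of;
# grep: 4
# map: 3
# perl: 4
```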

The number of elements returned doesn't have to be constant, either. For example, the expression:

  my @fields = split " ", $_;

returns the whitespace-delimited elements of the string in $_. In fact, it appears so often that this is the default:

  my @fields = split;

If we did this with $_ set to each of the lines of a file, and then concatenated the results, we'd have a single list of all the words of the file. But this is also what map will do for us directly:

  my @all_words = map split, <INPUT>;

Each line is placed in $_, the split breaks the line into words, and the results are concatenated into one final output!

What if we wanted a (so-called) two-dimensional array of lines and words? No problem: just put an anonymous array constructor around the output of each split:

  my @two_d_words = map [split], <INPUT>;

Now $two_d_words[3] is an arrayref of the words on the fourth line (counting ``0, 1, 2, 3''), and $two_d_words[3][2] is the third word on the fourth line, if any, or undef otherwise.
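
Both shapes can be tried on in-memory data. In this sketch, @lines is an assumed stand-in for the lines that would come from the INPUT filehandle, and the arguments to split are spelled out rather than defaulted:

```perl
use strict;
use warnings;

# Stand-in for lines read from a filehandle.
my @lines = ("the quick brown\n", "fox\n", "jumps over the lazy dog\n");

# Flat list: each line contributes however many words it holds.
my @all_words = map { split ' ', $_ } @lines;

# One array reference per line.
my @two_d_words = map [ split ' ', $_ ], @lines;

print scalar(@all_words), " words\n";        # 9 words
print "$two_d_words[2][1]\n";                # over
print defined $two_d_words[1][2] ? "defined" : "undef", "\n";   # undef
```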

Another common list operation is list-reduction: converting the list into a single scalar. A few common reductions are built in to Perl, such as joining the elements into a single string with a common glue:

  my $result = join ", ", @input;

But what if we wanted the items summed instead of being concatenated? We could hand-write the code like so:

  my $result = 0;
  $result += $_ for @input;

But it might be easier to see both the concatenation and the summation as examples of a generic reduce operator:

  sub join { my $glue = shift; reduce { "$a$glue$b" } @_ }
  sub sum { reduce { $a + $b } @_ }

This reduce operator works by placing the first and second elements of the list into the placeholders of $a and $b, and then taking the scalar result as $a and the next element as $b, until the entire list has been processed.
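
To see the mechanics, here is a pure-Perl sketch of such an operator. The name my_reduce is hypothetical, and the sketch assumes it is defined and called in the same package, since it sets that package's $a and $b directly:

```perl
use strict;
use warnings;

# my_reduce: a hypothetical pure-Perl illustration of the reduce idea.
sub my_reduce (&@) {
  my $code = shift;
  local ($a, $b);
  $a = shift;               # first element seeds the running result
  foreach my $next (@_) {
    $b = $next;
    $a = $code->();         # the block's result becomes the new $a
  }
  return $a;                # undef if the list was empty
}

my @input = (1, 2, 3, 4);
my $sum    = my_reduce { $a + $b } @input;
my $joined = my_reduce { "$a, $b" } @input;

print "$sum\n";      # 10
print "$joined\n";   # 1, 2, 3, 4
```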

Although this hypothetical reduce operator isn't built in to Perl the way that grep and map are provided, there's a very fast C-coded version in the CPAN in the List::Util module. And as of Perl 5.8, List::Util is provided as part of the core distribution. Thus, we merely need to install a recent version of Perl, or the CPAN module, and we can say:

  use List::Util qw(reduce);
  my $sum = reduce { $a + $b } @input;

to get the sum of the values. (The List::Util module provides a sum function directly, but it's nice to know how simple the definition is.) We can also do more esoteric things, like compute the product of the values:

  my $factorial = reduce { $a * $b } 1..$n;
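
Put together as a runnable sketch, with sample values that are assumptions for the example, and a maximum thrown in as one more reduction in the same style:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @input = (4, 9, 2, 7);
my $n     = 5;

my $sum       = reduce { $a + $b } @input;
my $max       = reduce { $a > $b ? $a : $b } @input;
my $factorial = reduce { $a * $b } 1..$n;

print "sum=$sum max=$max factorial=$factorial\n";
# sum=22 max=9 factorial=120
```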

Or follow a list of values as a series of hash keys to obtain a final element:

  my $location = reduce { \($$a->{$b}) } \\%hash, @keys;

Note the extra level of indirection: the starting value is a reference to the hash reference, and the block dereferences $$a before indexing, so that each step hands back a reference to the next slot down. You could use this to drill down to an arbitrary hash location. For example, we could evaluate the equivalent of:

  $info{Flintstone}{Fred}{Age} = 25;

as:

  my @keys = qw(Flintstone Fred Age);
  my $location = reduce { \($$a->{$b}) } \\%info, @keys;
  $$location = 25;

Thanks to Perl's autovivification, the sub-hash elements are populated automatically, resulting in a nice reference for us to dereference when we're done with the reduce.
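
Here is a runnable sketch of the drill-down, starting from an empty %info. It relies on the form in which the running value is always a reference to a scalar slot: the seed is a reference to the hash reference, and the block dereferences $$a before indexing.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my %info;
my @keys = qw(Flintstone Fred Age);

# At every step, $$a is (or autovivifies into) the hash being indexed,
# and the block returns a reference to the next slot down.
my $location = reduce { \($$a->{$b}) } \\%info, @keys;
$$location = 25;

print "$info{Flintstone}{Fred}{Age}\n";    # 25
print join(",", keys %{ $info{Flintstone} }), "\n";   # Fred
```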

So, there you have it: a few simple tricks with Perl's lists, using a few built-in and easily accessible operators. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.