Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 97 (Sep 2007)

[suggested title: ``Always wear your utility belt (part 2)'']

Last month, I introduced the Scalar::Util super hero of the Scalar/List-Util dynamic duo, describing how a somewhat-overlooked standard library can simplify some of your common tasks. In this month's column, I'll examine List::Util for the help it can provide to your list tasks. I'll also look at List::MoreUtils for some additional common list operations, if you don't mind a quick CPAN install. (And you'll need to install List::Util from the CPAN anyway if you're running something prior to Perl 5.8.)

Like Scalar::Util, the List::Util module doesn't export any subroutines by default. That means that you'll need to ask for each of these routines explicitly with use.

First, let's look at (the appropriately titled) first. Let's say you have a list of items, and you want to find the first one that is longer than ten characters. Simply pull out first, like this:

  use List::Util qw(first);
  my $big_enough = first { length > 10 } @the_list;

The first routine walks through the list similar to grep or map, placing each item into $_. The block is then evaluated, looking for a true or false value. If true, the corresponding value of $_ is returned immediately. If every evaluation of the block returns false, then first returns undef.

Note that this is similar to:

  my ($big_enough) = grep { length $_ > 10 } @the_list;

However, the first routine avoids testing the remainder of the list once we have found our item of choice. For short lists, we might not care, but for long lists, this can save us some time if we expect a true value somewhat early in the list.
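To make the savings concrete, here is a small sketch that counts how often each block runs. The sample data and the counter variables are made up for illustration; they are not part of List::Util.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data: the first long-enough item sits early in the list.
my @the_list = ('short', 'a considerably longer string', ('x') x 1000);

my ($first_calls, $grep_calls) = (0, 0);

# first stops as soon as the block returns true
my $via_first = first { $first_calls++; length($_) > 10 } @the_list;

# grep visits every element, even though we keep only one result
my ($via_grep) = grep { $grep_calls++; length($_) > 10 } @the_list;

print "first: $first_calls calls, grep: $grep_calls calls\n";
# prints: first: 2 calls, grep: 1002 calls
```

Both approaches find the same item; only the amount of work differs.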

We do lose a tiny bit of information with first as well. If undef is a significant return value, we can't tell the undef as one of the list members from the undef returned at the end of the list. For example, if we wanted the ``first undef'' from a list:

  my $first_undef = first { not defined $_ } @items;

we couldn't tell if this was returning a ``found'' undef, or a ``not found'' signal (also undef). In the grep equivalent, we can see whether there are zero or non-zero elements assigned:

  if (my ($first_undef) = grep { not defined $_ } @items) {
    # really found an undef
  } else {
    # no undef found
  }

Admittedly, I can't recall where I've ever cared that much. But it's an interesting thing to think about when designing return values from functions. But enough on first. Let's move on.

The next easy utility to describe from List::Util is shuffle. Yes, many programs need a randomly ordered list of values, and here we have it as a simple word:

  use List::Util qw(shuffle);
  my @deck = shuffle
    map { ("C$_", "D$_", "H$_", "S$_") }
      2..10, qw(A K Q J);

Now our deck of cards is shuffled, and rather fairly and quickly. Like sorting, shuffling is one of those things that looks rather easy to implement, but turns out to have tricky parts to get right. And in the normal List::Util installation, this is implemented at the C level (using XS), so it's quite fast.
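For the curious, one well-known way to shuffle fairly is the Fisher-Yates algorithm. The sketch below is a pure-Perl illustration under that assumption; the name fisher_yates is hypothetical, and this is not necessarily how List::Util implements shuffle internally.

```perl
use strict;
use warnings;

# fisher_yates is a hypothetical name, not part of List::Util.
sub fisher_yates {
    my @items = @_;                    # work on a copy of the arguments
    for my $i (reverse 1 .. $#items) {
        my $j = int rand($i + 1);      # 0 <= $j <= $i, each equally likely
        @items[$i, $j] = @items[$j, $i];   # swap the two positions
    }
    return @items;
}

my @shuffled = fisher_yates(1 .. 10);
print "@shuffled\n";
```

The tricky part is the range of $j: picking from the whole list on every pass, instead of from 0 through $i, looks harmless but makes some orderings more likely than others.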

One of my favorite ``obscure but cool once you understand it'' functions in list-processing languages is reduce, and although Perl doesn't have it as a built-in, we can at least get to it with List::Util.

Similar to sort, reduce takes a block argument that references $a and $b. This is best illustrated by example:

  use List::Util qw(reduce);
  my $total = reduce { $a + $b } 1, 2, 4, 8, 16;

For the first evaluation of the block, $a and $b take on the first and second elements of the list: 1 and 2 in this case. The block is evaluated (returning 3), and this value is placed back into $a, and the next value is placed in $b (4). Once again, the block is evaluated (7), and the result placed in $a, and a new $b comes from the list. When there are no more items in the list, the result is returned instead. The effect is as if we had written:

  my $total = ((((1 + 2) + 4) + 8) + 16);

but scaled for however many elements are in the list. Nice!
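The walk-through above can be sketched in plain Perl. Here my_reduce is a hypothetical stand-in that takes an ordinary two-argument code ref rather than using $a and $b, so it only approximates what List::Util does.

```perl
use strict;
use warnings;

# my_reduce is a hypothetical stand-in for List::Util::reduce.
sub my_reduce {
    my ($code, @list) = @_;
    return unless @list;                 # like reduce, an empty list gives undef
    my $acc = shift @list;               # the first element seeds the accumulator
    $acc = $code->($acc, $_) for @list;  # fold in each remaining element
    return $acc;
}

my $total = my_reduce(sub { $_[0] + $_[1] }, 1, 2, 4, 8, 16);
print "$total\n";   # 31
```

A single-element list never calls the code at all; the lone element simply comes back as the result.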

We can use it to compute a factorial for $n:

  my $factorial_n = reduce { $a * $b } 1..$n;

Or recognize a series of binary digits as a number:

  my $number = reduce { 2 * $a + $b } 1, 1, 0, 0, 1; # 0b11001

We could even rewrite join in terms of reduce:

  sub my_join {
    my $glue = shift;
    return reduce { $a . $glue . $b } @_;
  }
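As a quick check, the sketch below compares my_join against the built-in on some made-up sample data. It also shows one difference worth knowing: reduce returns undef for an empty list, while join returns an empty string.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

sub my_join {
    my $glue = shift;
    return reduce { $a . $glue . $b } @_;
}

my @parts      = qw(usr local bin);     # hypothetical sample data
my $via_reduce = my_join('/', @parts);  # usr/local/bin
my $via_join   = join('/', @parts);     # usr/local/bin

my $empty_reduce = my_join('/');        # undef: nothing to reduce
my $empty_join   = join('/');           # '' (empty string)
```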

By adding some smarts into the block, we can find the numeric maximum of a list of values:

  my $numeric_max = reduce { $a > $b ? $a : $b } @inputs;

This works because we select the winner of any given pair of values, and if we keep carrying that winner forward, eventually the winningest winner comes out the end.

For a string maximum (``z'' preferred to ``a''), just change the type of the comparison:

  my $string_max = reduce { $a gt $b ? $a : $b } @inputs;

And for minimums, we can change the order of the comparison, or swap the selection of $a and $b.

For convenience, List::Util provides max, maxstr, min, minstr, and sum directly.
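A short illustration of the convenience routines, using made-up sample lists:

```perl
use strict;
use warnings;
use List::Util qw(max maxstr min minstr sum);

my @numbers = (3, 10, 7);
my @words   = qw(pear apple zebra);

my $largest    = max @numbers;     # 10
my $smallest   = min @numbers;     # 3
my $total      = sum @numbers;     # 20
my $last_word  = maxstr @words;    # zebra
my $first_word = minstr @words;    # apple
```

Like reduce, each of these returns undef when given an empty list.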

I learned Smalltalk long before I learned Perl, and got quite fond of the inject:into: method for collections. The reduce routine maps rather nicely, if I think of Smalltalk's:

  aCollection inject: firstValue into: [:a :b | "something with a and b"]

as Perl's:

  reduce { "something with $a and $b" } $firstValue, @aCollection;

In other words, another way of looking at reduce is that it transforms that first element into the final result by invoking the block in a specific way on all of the remaining elements of the list. So, you could put a list of elements inside an array ref with:

  my $array_ref = reduce { push @$a, $b; $a } [], @some_list;

Or create a hash with:

  my $hash_ref = reduce { $a->{$b} = 1; $a } {}, @some_list;

Note that on each iteration, $a is used, and also returned to become the new $a or the final result. This is reminiscent of the many uses of inject:into: in the Smalltalk images I've seen.
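The same seed-and-carry pattern handles tallies. In this sketch (sample words and variable names are made up), the seed is an empty hash ref that each step updates and hands back:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @words = qw(apple pear apple fig pear apple);   # hypothetical sample data

# Seed the accumulator with an empty hash ref, then count each word into it.
my $seed   = {};
my $counts = reduce { $a->{$b}++; $a } $seed, @words;

print join(', ', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
# prints: apple=3, fig=1, pear=2
```

Because the block returns $a itself, the result is the very same hash ref that went in as the seed.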

That wraps up List::Util, but I've still got a few inches of room here, so let's take a quick look at the CPAN module List::MoreUtils. Although it isn't part of the core, it's referenced in List::Util, because the module provides a few handy shortcuts implemented (again) in C code for speed. Like List::Util, all imports must be specifically requested.

The any routine returns true if any of the items in the list meets the given criterion, using a $_ proxy similar to grep or map:

  use List::MoreUtils qw(any);
  my $has_some_defined = any { defined $_ } @some_list;

This is done efficiently, returning a true value as soon as the block returns a true value, and iterating to the end of the list only if none of the elements meet the condition.

Similarly, all returns true only if every element meets the condition, returning false as soon as one element fails rather than iterating through the entire list:

  use List::MoreUtils qw(all);
  my $has_no_undef = all { defined $_ } @some_list;

Note that you could easily define any in terms of all and vice-versa, just by negating both the condition and the result value. (These items are far more efficient than their same-named ``equivalents'' in Quantum::Superpositions.)
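The negation trick can be sketched in pure Perl. The names my_all and my_any are hypothetical stand-ins written for this illustration; they are not the List::MoreUtils implementations.

```perl
use strict;
use warnings;

# my_all and my_any are hypothetical pure-Perl stand-ins.
sub my_all (&@) {
    my $code = shift;
    for (@_) {
        return 0 unless $code->();   # stop at the first failure
    }
    return 1;
}

# "some item passes" is the same as "not every item fails"
sub my_any (&@) {
    my $code = shift;
    return !my_all { !$code->() } @_;
}

my @some_list = (undef, 'x', undef);
my $has_some_defined = my_any { defined $_ } @some_list;   # true
my $has_no_undef     = my_all { defined $_ } @some_list;   # false
```

Both the condition and the result are negated in my_any, and it still stops early, because my_all gives up at the first element for which the negated condition fails.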

If you negate only the result values (or just the condition, depending on how you look at it), you get two other routines defined by List::MoreUtils, none and notall:

  use List::MoreUtils qw(none notall);
  my $has_no_defined = none { defined $_ } @some_list;
  my $has_some_undef = notall { defined $_ } @some_list;

Like if vs unless or while vs until, having complementary routines gives you the flexibility to spell out what you're actually looking for, rather than requiring Perl (and the maintenance programmer) to figure out what you mean with a bunch of not operations.

If you're just counting true and false values, true and false are at your service:

  use List::MoreUtils qw(true false);
  my $bigger_than_10_count = true { $_ > 10 } @some_list;
  my $not_bigger_than_10_count = false { $_ > 10 } @some_list;

Again, these are complementary, so use the one that reads better for your task.
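If List::MoreUtils isn't installed, the built-in grep gives the same counts when used in scalar context. A sketch with made-up numbers:

```perl
use strict;
use warnings;

my @some_list = (3, 12, 7, 40, 10);   # hypothetical sample data

# grep in scalar context returns the number of matching elements
my $bigger_than_10_count     = grep { $_ > 10 } @some_list;      # 2
my $not_bigger_than_10_count = grep { !($_ > 10) } @some_list;   # 3
```

The named routines mostly buy readability: the intent is spelled out, and the negation moves out of the block.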

The first_index and last_index routines return where an item appears. For example, suppose I want to know the position of the first item that is bigger than 10:

  use List::MoreUtils qw(first_index);
  my $where = first_index { $_ > 10 } 1, 2, 4, 8, 16, 32;

The result here is 4, indicating that 16 is the first item greater than 10. The index value is 0-based. If the item is not found, -1 is returned, like Perl's built-in index search for strings. last_index works like rindex, working from the upper end of the list rather than the lower end.

A more general version of this is indexes (not indices as you might think), which returns all of the index values instead of just the first or last:

  use List::MoreUtils qw(indexes);
  my @where = indexes { $_ > 10 } 1, 2, 4, 8, 16, 32;

The result is 4, 5, showing that elements 4 and 5 of the input list match the condition.
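To see what these routines compute, here is a rough equivalent using only grep and List::Util's first over the index range. The variable names are made up, and this is an illustration rather than the module's own code.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @values = (1, 2, 4, 8, 16, 32);

# All matching positions: filter the index range instead of the values
my @where = grep { $values[$_] > 10 } 0 .. $#values;              # (4, 5)

# First matching position, falling back to -1 when nothing matches
my $first_where = first { $values[$_] > 10 } 0 .. $#values;
$first_where = -1 unless defined $first_where;                     # 4

# Last matching position: same search, walking the indexes backwards
my $last_where = first { $values[$_] > 10 } reverse 0 .. $#values;
$last_where = -1 unless defined $last_where;                       # 5
```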

The apply routine is like the built-in map, but automatically localizes the $_ value so we can safely change it within the block:

  use List::MoreUtils qw(apply);
  my @no_leading_blanks = apply { s/^\s+// } @input;

If we tried to do this with map:

  my @no_leading_blanks = map { s/^\s+// } @input;

then we'd see two problems. First, the result of a substitution is not the new string, but the success value, so the outputs would simply be a series of true and false values. Second, the $_ value is aliased to the inputs, so @input would have been changed. Oops. The equivalent to the apply with map would be something like:

  my @output = map { local $_ = $_; [apply action here]; $_ } @input;

And yes, the many times I've written map blocks that look just like that, I could have replaced them with apply.
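The copy-then-modify idea can be sketched in pure Perl. The name my_apply is a hypothetical stand-in for this illustration, covering only list-context use; it is not the List::MoreUtils implementation.

```perl
use strict;
use warnings;

# my_apply is a hypothetical pure-Perl stand-in for apply.
sub my_apply (&@) {
    my $code   = shift;
    my @copies = @_;          # copy first, so the caller's data is untouched
    $code->() for @copies;    # the block edits each copy through $_
    return @copies;
}

my @input = ('  foo', 'bar', "\tbaz");   # hypothetical sample data
my @no_leading_blanks = my_apply { s/^\s+// } @input;

print join('|', @no_leading_blanks), "\n";   # prints: foo|bar|baz
```

After the call, @input still holds its leading whitespace, and the block's own return value (the success of the substitution) is simply ignored.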

And List::MoreUtils contains a few more routines as well, but I've now run out of space. I hope you find this little trip into the ``utility belts'' of Perl fun and handy. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
