Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 39 (Aug 2001)

[suggested title: ``Developing a Perl routine'']

I was cruising the Perl newsgroups the other day, and found the following request. It appears to be a homework problem from a university assignment, so I won't embarass the original poster by including their name. (Normally, I try to give credit to the source of inspiration for one of my columns, so if you want your name in lights, please email your ideas to me!)

Here's the task: start with the 3-letter English abbreviations for the seven days of the week, in their natural order. Write a subroutine that takes two of these weekday abbreviations, and returns a single comma-separated string with all the days in between, wrapping around if necessary. For example, given an input of Tue and Thu, the return value is Tue,Wed,Thu. However, an input of Thu and Tue should wrap all the way around to Thu,Fri,Sat,Sun,Mon,Tue. Be sure to reject (via die) any erroneous inputs.

This doesn't sound like that difficult a task, but there's some interesting subtleties as I was starting to solve it in my head, so I'm writing this column effectively in real time, as I would consider each piece, to illustrate effective practices at developing Perl routines.

First, I need a subroutine name. This is sometimes harder than it looks. I want a name that's short enough that I'll reuse it, but long enough to be descriptive and unique. Let's start with day_string_range. So, our initial subroutine looks like:

  sub day_string_range {
    ... code goes here ...
  }

Good so far. Hopefully, that wasn't too suprising. Next, I need to grab the start and end values, so let's first check that they are there, and if so, grab them:

  sub day_string_range {
    die "too few args: @_" if @_ < 2;
    die "too many args: @_" if @_ > 2;
    my ($start,$end) = @_;
    ...
  }

Note here the use of @_ in a scalar context (with the two comparison operators) to check the number of elements. And then we create two local variables to hold the arguments if we make it past the test.

But $start is now something like Thu. How do we turn that into a computable value so that we can increment it? Well, we'll need a map to map it back and forth to integers. Let's use an array (for now) to hold the names in the proper order:

  my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);

But where do we put this? If we put it inside the subroutine, it'll get initialized every time, at a slight speed penalty. However, if we put it outside the subroutine, it needs to get executed before the initialization occurs. Fortunately, we can create ``static local'' variables using a common Perl idiom:

  BEGIN {
    my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
    sub day_string_range {
      die "too few args: @_" if @_ < 2;
      die "too many args: @_" if @_ > 2;
      my ($start,$end) = @_;
      ...
    }
  }

The BEGIN block causes the code to be executed at compile time, initializing the value of @day_names before any other ``normal'' code is executed. And the variable is local to the block, so it won't be seen by any other part of the program: just the subroutine inside the BEGIN block.

Now, using this array, how do we turn a name into a number? An array isn't very good for searching, except as a linear search. A linear search might look like:

  my $number_for_start;
  for (0..$#day_names) {
    if ($day_names[$_] eq $start) {
      $number_for_start = $_;
      last;
    }
  }
  die "$start is not a day name"
    unless defined $number_for_start;

And this would probably suffice if we called this routine only a few times in the program. But let's step up the efficiency a bit (and simplify the logic) by using a hash. First, we'll convert the array trivially into a hash with keys of the original array names and values equal to the position within the array:

    my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
    my %mapping; @mapping{@day_names} = 0..$#day_names;

Now we have $day_names[3] as Wed, and $mapping{Wed} as 3, so we can go from one to the other. For symmetry, we could have made these both hashes, but the differences in the resulting code would be minor.

So, now how do we get from $start to $number_for_start? Much simpler:

  die "No such name: $start" unless exists $mapping{$start};
  my $number_for_start = $mapping{$start};

That's a very pure way to do it. We can be slightly dirtier and optimize knowing that there are no undef values in the hash:

  defined (my $number_for_start = $mapping{$start})
    or die "no such name: $start";

And this is all fine and well for $start, but we need to perform the same operation for $end. I could cut-n-paste that code twice, making the subroutine so far as:

  BEGIN {
    my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
    my %mapping; @mapping{@day_names} = 0..$#day_names;
    sub day_string_range {
      die "too few args: @_" if @_ < 2;
      die "too many args: @_" if @_ > 2;
      my ($start,$end) = @_;
      defined (my $number_for_start = $mapping{$start})
        or die "no such name: $start";
      defined (my $number_for_end = $mapping{$end})
        or die "no such name: $end";
      ...
    }
  }

But my ``maintenance alarm'' goes off when I type such code. I've got the same code twice in the program, but merely operating on different variables. If for some reason I were maintaining this code (to add functionality or additional error checking, for example), and I missed the fact that the two pieces of code must stay in parallel, I'd probably spend a lot of time debugging. Or worse yet, the code would go into production to show errors in live data.

I can solve this with a bit of indirection: if I see it properly as a ``mapping'' from set of values to another, the wonderful map operator pops into mind:

      my ($number_for_start, $number_for_end) = map {
        defined (my $ret = $mapping{$_})
          or die "no such name: $_";
        $ret;
      } $start, $end;

Here, each value is placed into $_, and then we run the code in the block. The last expression evaluated (in this case, $ret) provides the elements of the output list.

But as I'm staring at this, I realize that once I've got the value of $number_for_start, I'll never need the original $start value again. So, another approach (again shooting for more simplification) is to use the ``in-place-ness'' of the foreach loop:

      foreach ($start, $end) {
        exists $mapping{$_}
          or die "no such name: $_";
        $_ = $mapping{$_};
      }

Yes, that's more like it. For each of the start and end values, if there exists a mapping for it, replace the value with its mapped equivalent, otherwise die.

At this point, we've got two small validated integers in the range of 0 to 6. It's time to start building the return value. We'll build it up as a list, then join the list with commas to get a single string.

      my @return;
      while (1) {
        push @return, $day_names[$start];
        last if $start == $end;
        ($start += 1) %= 7;
      }

Starting with the empty list, we'll push the day name for $start onto the end of the list. If this is the end, we also quit. Otherwise, we'll increment the value of $start, but in a ``modulus 7'' way, wrapping around from 6 back to 0. I really wanted to write ++$start %= 7, but that's sadly not permitted. This loop has stuff before the exit test, and stuff after the exit test, which is easiest to write as an ``infinite'' loop with an exit test in the middle.

And now for the final return value: a simple join on the list thus created:

      return join ",", @return;

And to put that all together:

  BEGIN {
    my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
    my %mapping; @mapping{@day_names} = 0..$#day_names;
    sub day_string_range {
      die "too few args: @_" if @_ < 2;
      die "too many args: @_" if @_ > 2;
      my ($start,$end) = @_;
      foreach ($start, $end) {
        exists $mapping{$_}
          or die "no such name: $_";
        $_ = $mapping{$_};
      }
      my @return;
      while (1) {
        push @return, $day_names[$start];
        last if $start == $end;
        ($start += 1) %= 7;
      }
      return join ",", @return;
    }
  }

Wow. Simple, but there's a lot of steps to get it done right. Sure, you can probably play ``Perl golf'' and ``minimize the [key]strokes'' to write this routine in about half the number of lines. But I think we've got enough here for the maintenance programmer to follow along or modify nicely, and it does the job with reasonable efficiency. Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.