Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 39 (Aug 2001)
[suggested title: ``Developing a Perl routine'']
I was cruising the Perl newsgroups the other day, and found the following request. It appears to be a homework problem from a university assignment, so I won't embarrass the original poster by including their name. (Normally, I try to give credit to the source of inspiration for one of my columns, so if you want your name in lights, please email your ideas to me!)
Here's the task: start with the 3-letter English abbreviations for the seven days of the week, in their natural order. Write a subroutine that takes two of these weekday abbreviations and returns a single comma-separated string with all the days from the first through the second, inclusive, wrapping around if necessary. For example, given an input of Tue and Thu, the return value is Tue,Wed,Thu. However, an input of Thu and Tue should wrap all the way around to Thu,Fri,Sat,Sun,Mon,Tue.
Be sure to reject (via die) any erroneous inputs.
This doesn't sound like a difficult task, but I noticed some interesting subtleties as I started solving it in my head. So I'm writing this column effectively in real time, taking each piece in the order I would consider it, to illustrate effective practices for developing Perl routines.
First, I need a subroutine name. This is sometimes harder than it looks. I want a name that's short enough that I'll reuse it, but long enough to be descriptive and unique. Let's start with day_string_range. So, our initial subroutine looks like:
    sub day_string_range {
      ... code goes here ...
    }
Good so far. Hopefully, that wasn't too surprising. Next, I need to grab the start and end values, so let's first check that exactly two were passed, and if so, grab them:
    sub day_string_range {
      die "too few args: @_" if @_ < 2;
      die "too many args: @_" if @_ > 2;
      my ($start,$end) = @_;
      ...
    }
Note here the use of @_ in a scalar context (with the two comparison operators) to get the number of elements. If we make it past those tests, we then create two lexical (my) variables to hold the arguments.
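A quick way to see these argument checks in action is to put them in a throwaway subroutine and call it with good and bad argument lists inside eval. This is a minimal sketch: the name two_args and its "start-end" return value are hypothetical, made up only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that performs the same argument checks as above.
sub two_args {
    die "too few args: @_" if @_ < 2;    # @_ in scalar context: element count
    die "too many args: @_" if @_ > 2;
    my ($start, $end) = @_;               # copy the two arguments
    return "$start-$end";
}

my $ok   = two_args('Tue', 'Thu');                     # returns "Tue-Thu"
my $few  = eval { two_args('Tue'); 1 };                # dies, so eval yields undef
my $many = eval { two_args('Tue', 'Wed', 'Thu'); 1 };  # dies as well

print "ok=$ok\n";
print "one argument rejected\n"    unless $few;
print "three arguments rejected\n" unless $many;
```

The die messages have no trailing newline, so Perl appends the script name and line number to them. That is handy while debugging a routine like this.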
But $start is now something like Thu. How do we turn that into a computable value so that we can increment it? Well, we'll need a mapping back and forth between names and integers. Let's use an array (for now) to hold the names in the proper order:
my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
But where do we put this? If we put it inside the subroutine, it'll get initialized on every call, at a slight speed penalty. However, if we put it outside the subroutine, the assignment has to run before the subroutine is first called, and that won't happen if a call appears earlier in the program than the assignment. Fortunately, we can create ``static local'' variables using a common Perl idiom:
    BEGIN {
      my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);

      sub day_string_range {
        die "too few args: @_" if @_ < 2;
        die "too many args: @_" if @_ > 2;
        my ($start,$end) = @_;
        ...
      }
    }
The BEGIN block causes the code to be executed at compile time, initializing the value of @day_names before any other ``normal'' code is executed. And the variable is lexically scoped to the block, so it won't be seen by any other part of the program: just the subroutine inside the BEGIN block.
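To convince yourself that the BEGIN block really runs first, here is a small sketch that calls a subroutine before its enclosing BEGIN block appears in the file. The counter and the name next_id are hypothetical. The pattern is the same ``static local'' idiom used for @day_names.

```perl
use strict;
use warnings;

# Both calls appear *before* the BEGIN block in the file.
my $first  = next_id();    # 100: the BEGIN block already ran the assignment
my $second = next_id();    # 101: the lexical persists between calls

BEGIN {
    my $counter = 100;                  # assigned at compile time
    sub next_id { return $counter++ }   # the only code that can see $counter
}

print "$first $second\n";
```

If you delete the word BEGIN, the block becomes an ordinary bare block that runs only when execution reaches it. The two early calls would then see $counter before its assignment had run, so the count would not start at 100.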
Now, using this array, how do we turn a name into a number? An array isn't very good for searching: about all we can do is a linear search, which might look like:
    my $number_for_start;
    for (0..$#day_names) {
      if ($day_names[$_] eq $start) {
        $number_for_start = $_;
        last;
      }
    }
    die "$start is not a day name" unless defined $number_for_start;
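The same linear search can be written more compactly with first from the core List::Util module. That function returns the first list element for which the block is true, or undef if there is none. It is still a linear scan under the hood. The wrapper name day_number_linear is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);

# Hypothetical wrapper: linear search for the index of a day name.
sub day_number_linear {
    my ($name) = @_;
    my $n = first { $day_names[$_] eq $name } 0 .. $#day_names;
    # Sun has index 0, which is false, so test definedness rather than truth.
    die "$name is not a day name" unless defined $n;
    return $n;
}

print day_number_linear('Thu'), "\n";
```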
And this would probably suffice if we called this routine only a few times in the program. But let's step up the efficiency a bit (and simplify the logic) by using a hash. First, we'll convert the array trivially into a hash whose keys are the array elements and whose values are their positions within the array:
    my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
    my %mapping;
    @mapping{@day_names} = 0..$#day_names;
Now we have $day_names[3] as Wed, and $mapping{Wed} as 3, so we can go from one to the other. For symmetry, we could have made these both hashes, but the differences in the resulting code would be minor.
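Here is the hash-slice construction on its own, with a quick round-trip check that every name maps to a number and back again. It is a standalone sketch, and the check loop is there only for illustration.

```perl
use strict;
use warnings;

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping;
@mapping{@day_names} = 0 .. $#day_names;   # hash slice: Sun => 0, ..., Sat => 6

# Round trip: name -> number via %mapping, number -> name via @day_names.
for my $name (@day_names) {
    my $number = $mapping{$name};
    die "round trip failed for $name" unless $day_names[$number] eq $name;
}

print "Wed is $mapping{Wed}; day 3 is $day_names[3]\n";   # Wed is 3; day 3 is Wed
```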
So, how do we now get from $start to $number_for_start? With the hash, it's much simpler:
    die "No such name: $start" unless exists $mapping{$start};
    my $number_for_start = $mapping{$start};
That's a very pure way to do it. We can be slightly dirtier and save the second hash lookup, knowing that there are no undef values in the hash:
defined (my $number_for_start = $mapping{$start}) or die "no such name: $start";
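Why test with defined rather than plain truth? Sun maps to 0, which Perl treats as false. The sketch below contrasts a truth-based lookup with the defined-based one. Both subroutine names are hypothetical.

```perl
use strict;
use warnings;

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping;
@mapping{@day_names} = 0 .. $#day_names;

# Hypothetical, deliberately buggy: tests truth, so Sun (0) is rejected.
sub lookup_truthy {
    my ($name) = @_;
    (my $n = $mapping{$name}) or die "no such name: $name\n";
    return $n;
}

# Hypothetical: the defined-based lookup from the text.
sub lookup_defined {
    my ($name) = @_;
    defined(my $n = $mapping{$name}) or die "no such name: $name\n";
    return $n;
}

my $bad  = eval { lookup_truthy('Sun') };    # undef: died on the 0
my $good = eval { lookup_defined('Sun') };   # 0
my $miss = eval { lookup_defined('Xyz') };   # undef: not a day name

print defined $bad ? "truthy lookup: $bad\n" : "truthy lookup wrongly rejected Sun\n";
print "defined lookup: $good\n";
```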
And this is all fine and well for $start, but we need to perform the same operation for $end. I could cut-n-paste that line, making the subroutine so far look like:
    BEGIN {
      my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
      my %mapping;
      @mapping{@day_names} = 0..$#day_names;

      sub day_string_range {
        die "too few args: @_" if @_ < 2;
        die "too many args: @_" if @_ > 2;
        my ($start,$end) = @_;
        defined (my $number_for_start = $mapping{$start})
          or die "no such name: $start";
        defined (my $number_for_end = $mapping{$end})
          or die "no such name: $end";
        ...
      }
    }
But my ``maintenance alarm'' goes off when I type such code. I've got the same code twice in the program, merely operating on different variables. If for some reason I were maintaining this code (to add functionality or additional error checking, for example) and missed the fact that the two pieces of code must stay in parallel, I'd probably spend a lot of time debugging. Or worse yet, the code would go into production and show errors in live data.
I can solve this with a bit of indirection: if I see it properly as a ``mapping'' from one set of values to another, the wonderful map operator pops into mind:
    my ($number_for_start, $number_for_end) = map {
      defined (my $ret = $mapping{$_}) or die "no such name: $_";
      $ret;
    } $start, $end;
Here, each value is placed into $_, and then we run the code in the block. The last expression evaluated (in this case, $ret) provides the corresponding element of the output list.
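Here is the map version as a complete, runnable fragment. It also shows what happens when one of the names is bad: the die inside the block aborts the whole map, so no results come back at all. The wrapper name day_numbers is hypothetical.

```perl
use strict;
use warnings;

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping;
@mapping{@day_names} = 0 .. $#day_names;

# Hypothetical wrapper: translate any number of day names to numbers.
sub day_numbers {
    return map {
        defined(my $ret = $mapping{$_}) or die "no such name: $_\n";
        $ret;                      # last expression: this element's result
    } @_;
}

my ($number_for_start, $number_for_end) = day_numbers('Thu', 'Tue');
print "$number_for_start $number_for_end\n";    # 4 2

my @bad = eval { day_numbers('Tue', 'Tus') };   # dies on 'Tus'
print "rejected: $@";
```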
But as I'm staring at this, I realize that once I've got the value of $number_for_start, I'll never need the original $start value again. So, another approach (again shooting for more simplification) is to use the ``in-place-ness'' of the foreach loop, where $_ is an alias for each variable being iterated over:
    foreach ($start, $end) {
      exists $mapping{$_} or die "no such name: $_";
      $_ = $mapping{$_};
    }
Yes, that's more like it. For each of the start and end values, if a mapping exists for it, replace the value with its mapped equivalent; otherwise, die.
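The trick here is that foreach makes $_ an alias for each listed variable, so assigning to $_ changes $start and $end themselves. The sketch below shows that on its own. The same aliasing is why the routine copies @_ into $start and $end first. The elements of @_ are aliases for the caller's arguments, so rewriting them in place could clobber the caller's variables, or die outright if the caller passed literal strings.

```perl
use strict;
use warnings;

my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
my %mapping;
@mapping{@day_names} = 0 .. $#day_names;

my ($start, $end) = ('Thu', 'Tue');
foreach ($start, $end) {          # $_ aliases $start, then $end
    exists $mapping{$_} or die "no such name: $_";
    $_ = $mapping{$_};            # writes through the alias
}
print "$start $end\n";            # 4 2
```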
At this point, we've got two small validated integers in the range of 0 to 6. It's time to start building the return value. We'll build it up as a list, then join the list with commas to get a single string.
    my @return;
    while (1) {
      push @return, $day_names[$start];
      last if $start == $end;
      ($start += 1) %= 7;
    }
Starting with the empty list, we push the day name for $start onto the end of the list. If that was the end day, we quit. Otherwise, we increment $start in a ``modulus 7'' way, wrapping around from 6 back to 0. I really wanted to write ++$start %= 7, but that's sadly not permitted. This loop has stuff before the exit test and stuff after the exit test, which is easiest to write as an ``infinite'' loop with an exit test in the middle.
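The middle-exit loop is easy to test in isolation on plain numbers. Here it is pulled out into a hypothetical helper, wrap_indices, with the modulus passed in rather than hard-wired to 7.

```perl
use strict;
use warnings;

# Hypothetical helper: indices from $start through $end, wrapping modulo $size.
sub wrap_indices {
    my ($start, $end, $size) = @_;
    my @out;
    while (1) {
        push @out, $start;          # stuff before the exit test
        last if $start == $end;     # exit test in the middle
        ($start += 1) %= $size;     # stuff after: modular increment
    }
    return @out;
}

print join(",", wrap_indices(4, 2, 7)), "\n";   # 4,5,6,0,1,2
```

Because the loop stops only when $start equals $end, it relies on $end being a valid index from 0 to $size - 1. An out-of-range value would loop forever, which is one more reason the routine validates both names before reaching this point.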
And now for the final return value: a simple join on the list thus created:
return join ",", @return;
And to put that all together:
    BEGIN {
      my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
      my %mapping;
      @mapping{@day_names} = 0..$#day_names;

      sub day_string_range {
        die "too few args: @_" if @_ < 2;
        die "too many args: @_" if @_ > 2;
        my ($start,$end) = @_;
        foreach ($start, $end) {
          exists $mapping{$_} or die "no such name: $_";
          $_ = $mapping{$_};
        }
        my @return;
        while (1) {
          push @return, $day_names[$start];
          last if $start == $end;
          ($start += 1) %= 7;
        }
        return join ",", @return;
      }
    }
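To try the finished routine, drop it into a script along with a few calls. This driver is only a sketch, and the particular test inputs are my own choices. The bad inputs are caught with eval so the script keeps going after each die.

```perl
use strict;
use warnings;

BEGIN {
  my @day_names = qw(Sun Mon Tue Wed Thu Fri Sat);
  my %mapping;
  @mapping{@day_names} = 0..$#day_names;

  sub day_string_range {
    die "too few args: @_" if @_ < 2;
    die "too many args: @_" if @_ > 2;
    my ($start,$end) = @_;
    foreach ($start, $end) {
      exists $mapping{$_} or die "no such name: $_";
      $_ = $mapping{$_};
    }
    my @return;
    while (1) {
      push @return, $day_names[$start];
      last if $start == $end;
      ($start += 1) %= 7;
    }
    return join ",", @return;
  }
}

# Good inputs, including a wrap-around and a single-day range.
print day_string_range('Tue', 'Thu'), "\n";   # Tue,Wed,Thu
print day_string_range('Thu', 'Tue'), "\n";   # Thu,Fri,Sat,Sun,Mon,Tue
print day_string_range('Sat', 'Sat'), "\n";   # Sat

# Bad inputs: each call should die, which eval turns into a message in $@.
for my $args (['Tue'], ['Tue', 'Wed', 'Thu'], ['Tue', 'Thurs']) {
    if (eval { day_string_range(@$args); 1 }) {
        print "unexpectedly accepted: @$args\n";
    } else {
        print "rejected: $@";
    }
}
```

Hash lookups are case-sensitive, so an input like tue is rejected along with misspellings such as Thurs.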
Wow. Simple, but there are a lot of steps to get it done right. Sure, you can probably play ``Perl golf'' and ``minimize the [key]strokes'' to write this routine in about half as many lines. But I think we've got enough here for the maintenance programmer to follow along or modify easily, and it does the job with reasonable efficiency. Until next time, enjoy!