Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 70 (May 2007)
[Suggested title: ``Export behavior, not data'']
I spend a fair amount of time (some would say ``too much time'') hanging out in mailing lists and chat areas for Perl beginners. One question that comes up frequently is what to do when a program gets too large, or when code needs to be shared between programs. The question is usually phrased as ``how do I include a file?'', because the presumption is that part of the code can simply be moved into a new file and glued back in at the proper time. But beneath this question lurk several troubles that a beginning Perl programmer might not realize. Let's look at why the naive approach can cause trouble down the road.
The require function (on which the use statement is built) is rather simple. Given a filename (or a package name that can be turned into a filename), it locates the file along the @INC path and brings it in by compiling and running it. If the file loads successfully, program execution continues, and a notation is made in %INC to prevent the file from being loaded twice.
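To make that description concrete, here's a rough pure-Perl approximation of require for the filename case, loosely modeled on the pseudocode in the perlfunc entry for require. The name my_require is hypothetical, and the sketch skips the package-name translation, @INC hooks, and version checks that the real built-in handles.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical, simplified stand-in for require FILENAME.
sub my_require {
    my ($filename) = @_;

    # Already loaded? %INC remembers it, so don't load it twice.
    return 1 if exists $INC{$filename};

    for my $dir (@INC) {
        next if ref $dir;    # this sketch ignores code-ref hooks in @INC
        my $path = File::Spec->rel2abs(File::Spec->catfile($dir, $filename));
        next unless -f $path;

        $INC{$filename} = $path;    # make the notation in %INC
        my $result = do $path;      # compile and run the file
        if ($@) {                   # compile or runtime error in the file
            delete $INC{$filename};
            die $@;
        }
        unless ($result) {          # last expression evaluated was false
            delete $INC{$filename};
            die "$filename did not return a true value\n";
        }
        return $result;
    }
    die "Can't locate $filename in \@INC\n";
}
```

Calling my_require('calculate_running_total.pl') twice would compile and run the file only once, because the second call finds the %INC entry and returns immediately.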
Let's say that we have a number of programs that all want to calculate a running total, looking for the sum at the end. We could create calculate_running_total.pl and put it in one of the directories listed in our @INC:
my $total = 0;
sub add_item { $total += shift; }
sub grand_total { return $total; }
1;
The 1; at the end is mandated by the require interface: the last expression evaluated in the file has to be a true value, or the require fails.
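Here's a small self-contained experiment showing what happens when that rule is broken. It writes two throwaway files (good_end.pl and bad_end.pl are made-up names) into a temporary directory, one ending in 1; and one ending in 0;, and tries to require each inside an eval.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two throwaway files in a temporary directory: one ends with a true
# value, the other with a false one.
my $dir = tempdir(CLEANUP => 1);
my %last_line = (good_end => "1;\n", bad_end => "0;\n");
for my $name (sort keys %last_line) {
    open my $fh, '>', "$dir/$name.pl" or die "Can't write $name.pl: $!";
    print $fh "# throwaway file for the demo\n", $last_line{$name};
    close $fh or die "Can't close $name.pl: $!";
}
unshift @INC, $dir;    # let require find them

# eval traps the die that require throws on failure.
my $good_ok   = eval { require 'good_end.pl'; 1 };
my $bad_ok    = eval { require 'bad_end.pl'; 1 };
my $bad_error = $@;

print "good_end.pl: ", ($good_ok ? "loaded\n" : "failed\n");
print "bad_end.pl: ",  ($bad_ok  ? "loaded\n" : "failed: $bad_error");
```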
We can pull it into our code like so:
require 'calculate_running_total.pl';
for my $file (glob '*') { add_item(-s $file); }
print "total bytes is ", grand_total(), "\n";
Here, I'm walking through a list of filenames to get their sizes, and then adding each of the sizes into the hidden total. When I'm done, I'll grab the grand total and display it.
The $total variable here is hidden within the file: because it's declared with my, its scope ends at the end of that file. There's nothing I can say in the main script to access that variable directly. We'll come back to that in a moment.
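A quick way to convince yourself of that is to recreate the file in a temporary directory, load it, and then look for a $total from the main program's side. This is a sketch; the temporary-directory setup exists only so that it runs as a single file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recreate calculate_running_total.pl from the text.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/calculate_running_total.pl" or die "Can't write: $!";
print $fh <<'END_OF_FILE';
my $total = 0;
sub add_item { $total += shift; }
sub grand_total { return $total; }
1;
END_OF_FILE
close $fh or die "Can't close: $!";
unshift @INC, $dir;

require 'calculate_running_total.pl';
add_item($_) for 10, 20, 30;

# The file-scoped lexical is not a package variable, so $main::total
# is a completely separate (and still undefined) variable.
print "grand_total() says ", grand_total(), "\n";
print "\$main::total is ", (defined $main::total ? $main::total : 'undef'), "\n";
```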
One of the problems with including a file like this is that the namespace is shared between the main program and the included file. If calculate_running_total.pl needed a few more subroutines to perform the task, their names might collide with subroutines in my main program, especially if they were undocumented. I could just get creative with my names:
my $total = 0;
sub calculate_running_total_add_item { $total += shift; }
sub calculate_running_total_grand_total { return calculate_running_total_normalize($total); }
## private routines:
sub calculate_running_total_normalize { ... }
1;
but this would also require a corresponding lengthening in usage:
require 'calculate_running_total.pl';
for my $file (glob '*') { calculate_running_total_add_item(-s $file); }
print "total bytes is ", calculate_running_total_grand_total(), "\n";
as well as annoy the author of the included file, who would have to prefix every name explicitly like that.
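To see the collision those prefixes are guarding against, here's a sketch in which the main program and an unprefixed included file both define a helper named normalize. Both normalize bodies are hypothetical (the article never shows one). When the require runs, the included file's definition replaces the main program's, and Perl only warns about the redefinition if warnings are enabled where it happens, which they aren't in the included file here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# The main program's own helper: round to the nearest integer.
sub normalize { return int($_[0] + 0.5) }

# An included file that also defines an unprefixed normalize for its
# own internal use (format to two decimal places).
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/calculate_running_total.pl" or die "Can't write: $!";
print $fh <<'END_OF_FILE';
my $total = 0;
sub add_item { $total += shift; }
sub grand_total { return normalize($total); }
sub normalize { return sprintf '%.2f', shift }
1;
END_OF_FILE
close $fh or die "Can't close: $!";
unshift @INC, $dir;

print "before require: ", normalize(2.7), "\n";    # main's version
require 'calculate_running_total.pl';
print "after require: ", normalize(2.7), "\n";     # the file's version won
```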
For this reason, modern Perl programs frequently take advantage of packages:
package CalculateRunningTotal;
my $total = 0;
sub add_item { $total += shift; }
sub grand_total { return normalize($total); }
## private routines:
sub normalize { ... }
1;
I can now bring this in and use names that make it obvious that I'm pulling from a separate file:
require 'calculate_running_total.pl';
for my $file (glob '*') { CalculateRunningTotal::add_item(-s $file); }
print "total bytes is ", CalculateRunningTotal::grand_total(), "\n";
A little nicer, and a little clearer. Of course, nothing stops us from still calling CalculateRunningTotal::normalize except perhaps a gentleman's agreement. OK, call it ``good programming practice''.
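Here's that gentleman's agreement being ignored, in a single-file sketch where the package is inlined in a bare block rather than loaded with require. The body given to normalize is a hypothetical placeholder (formatting to two decimal places), since the article leaves it out.

```perl
use strict;
use warnings;

{
    package CalculateRunningTotal;
    my $total = 0;
    sub add_item    { $total += shift; }
    sub grand_total { return normalize($total); }
    ## private routines:
    sub normalize   { return sprintf '%.2f', shift }   # hypothetical placeholder
}

CalculateRunningTotal::add_item($_) for 1.5, 2;
print "grand total: ", CalculateRunningTotal::grand_total(), "\n";

# Nothing in Perl prevents this call; only convention says "don't".
print "normalize called directly: ", CalculateRunningTotal::normalize(3.14159), "\n";
```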
The names are still a bit long, and by migrating this thing into a full-blown module, we can shorten them even more. First, we'll bring in the Exporter module to handle the namespace aliasing, and we'll also have to change the name to end in .pm so that use knows how to find it. So in CalculateRunningTotal.pm, we now have:
package CalculateRunningTotal;
require Exporter;
our @ISA = qw(Exporter);    # inherit Exporter's import method
our @EXPORT = qw(add_item grand_total);
my $total = 0;
sub add_item { $total += shift; }
sub grand_total { return normalize($total); }
## private routines:
sub normalize { ... }
1;
And we'll use it with:
use CalculateRunningTotal;
for my $file (glob '*') { add_item(-s $file); }
print "total bytes is ", grand_total(), "\n";
Hey, look at that, short names again. Of course, if we had an add_item of our own, we could avoid colliding with it by restricting the import:
use CalculateRunningTotal qw(grand_total);
for my $file (glob '*') { CalculateRunningTotal::add_item(-s $file); }
print "total bytes is ", grand_total(), "\n";
Now we have to spell out add_item using its full package name.
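The namespace aliasing that Exporter performs boils down to assigning into the caller's symbol table. The following is a deliberately simplified, hand-rolled import (not Exporter's actual code: no @EXPORT_OK, no tags, no variables), inlined with the package so it runs as one file.

```perl
use strict;
use warnings;

{
    package CalculateRunningTotal;
    my $total = 0;
    sub add_item    { $total += shift; }
    sub grand_total { return $total; }
    our @EXPORT = qw(add_item grand_total);

    # Simplified stand-in for Exporter's import: alias each requested
    # name (or everything in @EXPORT) into the caller's package.
    sub import {
        my ($package, @names) = @_;
        @names = @EXPORT unless @names;
        my $caller = caller;
        no strict 'refs';    # glob names are built from strings
        for my $name (@names) {
            *{"${caller}::$name"} = \&{"${package}::$name"};
        }
    }
}

# "use CalculateRunningTotal;" would call import at compile time;
# with the package inlined, call it by hand at run time instead.
CalculateRunningTotal->import;

add_item($_) for 3, 4;
print "grand total is ", grand_total(), "\n";
print "same sub: ", (\&add_item == \&CalculateRunningTotal::add_item ? "yes" : "no"), "\n";
```

Passing names, as in CalculateRunningTotal->import('grand_total'), would alias only those, which is the effect of restricting the import list.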
At this point, we have the basic workings of a trivial but clean module. Time to get back to the original point. What mistake do people often make when designing the interface? They choose to export the module's data in addition to (or instead of) its behavior.
For example, I might have looked at the early version of this module and stared at this code:
sub grand_total { return $total; }
I might have asked myself why I was writing such a boring, trivial subroutine when I could just give access to $total directly. Since Exporter works only with package variables, I'd have to change this from a lexical to a package variable (declared with our):
package CalculateRunningTotal;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(add_item $total);
our $total = 0;
sub add_item { $total += shift; }
1;
And at first glance, this looks a bit simpler:
use CalculateRunningTotal;
for my $file (glob '*') { add_item(-s $file); }
print "total bytes is $total\n";
Here, the $total variable has been exported and aliased from the CalculateRunningTotal package into the main package. (Let's ignore for a moment that even add_item would then be nearly useless, since callers could just as easily write $total += ... themselves.)
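The aliasing is easy to verify. This sketch inlines the data-exporting version of the package in a bare block (so its our declaration doesn't leak into main's lexical scope) and calls import by hand at run time, standing in for what use would do at compile time.

```perl
use strict;
use warnings;

{
    package CalculateRunningTotal;
    require Exporter;
    our @ISA    = qw(Exporter);
    our @EXPORT = qw(add_item $total);
    our $total  = 0;
    sub add_item { $total += shift; }
}

CalculateRunningTotal->import;    # alias add_item and $total into main
our $total;                       # main's $total, now the module's variable

add_item($_) for 100, 200;
print "total bytes is $total\n";

# Both names refer to the very same scalar.
print "same variable: ", (\$total == \$CalculateRunningTotal::total ? "yes" : "no"), "\n";
```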
The problem with this configuration is that I'm now committing to providing a specific variable named $total that can be consulted at any time to provide the running total. Before, with grand_total in place, I had a place to add extra hooks later if I wanted them, such as the normalize routine in some of the previous versions. But where would I put normalize now?
A data interface leaves callers free to read and write the variable however they like, which makes it very hard to upgrade or update. Sure, I could probably replace $total with a tied variable, but only at the cost of some speed and added difficulty in debugging.
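For the curious, here's roughly what that tied-variable escape hatch might look like: a hypothetical NormalizedTotal class whose FETCH method applies a made-up normalize step (formatting to two decimal places) on every read. It's a sketch of the technique under those assumptions, not a recommendation.

```perl
use strict;
use warnings;

{
    package NormalizedTotal;    # hypothetical tie class

    # Store the raw value in a blessed scalar reference.
    sub TIESCALAR { my ($class, $initial) = @_; my $value = $initial; return bless \$value, $class }

    # Every read goes through here: the "normalize" hook.
    sub FETCH     { my $self = shift; return sprintf '%.2f', $$self }

    # Every write goes through here.
    sub STORE     { my ($self, $new) = @_; $$self = $new }
}

tie my $total, 'NormalizedTotal', 0;
$total += 1.5;    # FETCH, add, then STORE
$total += 2;
print "total is $total\n";
```

Every read and every write of $total is now a method call, which is where the speed cost comes from. The += operator also reads through FETCH, so the normalize step is applied to intermediate values and not just the final one. Surprises like that are part of the debugging cost.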
Also, consider that I've now increased the scope of $total to include many more lines of code. If, while debugging, I see that the value of $total is incorrect, I'll have to look over a much broader range of code to see what might be altering it. Some would call this increasing the coupling between the main code and the external package, and increased coupling almost always comes with a debugging cost.
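By contrast, here's a sketch of the behavior-only design, again inlined in a bare block for a single-file demo. The main program can use a variable named $total all it likes without touching the module's running total, because the only code that can reach the module's $total is the two subroutines in that block.

```perl
use strict;
use warnings;

{
    package CalculateRunningTotal;
    my $total = 0;    # visible only inside this block
    sub add_item    { $total += shift; }
    sub grand_total { return $total; }
}

CalculateRunningTotal::add_item($_) for 100, 200;

# Some unrelated code in main happens to use a variable named $total...
our $total = 'oops';

# ...but the module's running total is untouched.
print "grand total is ", CalculateRunningTotal::grand_total(), "\n";
```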
So, when you're designing a module, think twice (or three times) before you add variables to your @EXPORT list. It may seem to relieve a problem initially, but it will almost always lead to bad things later. Until next time, enjoy!