Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Unix Review Column 70 (May 2007)

[Suggested title: ``Export behavior, not data'']

I spend a fair amount of time (some would say ``too much time'') hanging out in mailing lists and chat areas for Perl beginners. One of the problems that comes up frequently is what someone should do when their program gets too large, or they want to share code between programs. The question will usually be phrased as ``how do I include a file?'', because the presumption is that part of the code can simply be transported into a new file, and glued back in at the proper time. But beneath this question lurk a number of troubles that a beginning Perl programmer might not realize. Let's look at why the naive approach can sometimes be trouble down the road.

The require function (on which the use statement is built) is rather simple. Given a filename (or a package name that can be turned into a filename), locate the file along the @INC path, and bring it in. If the file is successfully loaded, program execution continues, and a notation is made in %INC to prevent the file from being loaded twice.
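To see that bookkeeping in action, here's a small sketch. It uses List::Util only because it ships with Perl; that choice is mine, and any file found along @INC would do.

```perl
use strict;
use warnings;

# List::Util ships with Perl; it stands in here for any file found via @INC.
require List::Util;

# The %INC key is the relative filename that require searched for;
# the value is the path where the file was actually found.
my $loaded_from = $INC{'List/Util.pm'};
print "List::Util loaded from $loaded_from\n";

# A second require sees the %INC entry and returns at once,
# without reading the file again.
require List::Util;

# The directories that require searched, in order.
print "search path:\n";
print "  $_\n" for @INC;
```

The exact paths printed will depend on how Perl is installed on your system.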

Let's say that we have a number of programs that all want to calculate a running total, looking for the sum at the end. We could create calculate_running_total.pl, and put it in a directory listed in @INC:

  my $total = 0;

  sub add_item {
    $total += shift;
  }

  sub grand_total {
    return $total;
  }

  1;

The 1; at the end is mandated by the require interface: the last expression evaluated in the file has to be a true value, or the require fails.
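If you're curious what that failure looks like, the following sketch provokes it on purpose. The temporary file, its contents, and the hello subroutine are all invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a throwaway library whose last evaluated expression is false.
# The file contents and the sub name are hypothetical.
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "sub hello { return 'hi' }\n0;\n";
close $fh or die "cannot close $path: $!";

# require dies when the file's final value is false, so trap it with eval.
my $ok = eval { require $path; 1 };
my $error = $ok ? '' : $@;
print "require failed: $error" if !$ok;
```

The message names the file and says it did not return a true value, which is a handy thing to recognize the first time you forget the 1; yourself.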

We can pull it into our code like so:

  require 'calculate_running_total.pl';

  for my $file (glob '*') {
    add_item(-s $file);
  }

  print "total bytes is ", grand_total(), "\n";

Here, I'm walking through a list of filenames to get their sizes, and then adding each of the sizes into the hidden total. When I'm done, I'll grab the grand total and display it.

The $total variable here is hidden within the file. There's nothing I can say in the main script to access that variable directly. We'll come back to that in a moment.
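Here's a single-file sketch of the same hiding. A bare block stands in for the separate file (that's my substitution, so the example runs on its own), but the scoping works the same way.

```perl
use strict;
use warnings;

# The bare block stands in for the separate file: $total is visible
# only to the code inside it.
{
  my $total = 0;
  sub add_item    { $total += shift }
  sub grand_total { return $total }
}

add_item($_) for 10, 20, 30;
print "via subroutine: ", grand_total(), "\n";

# There is no package variable named $total, so looking one up by its
# full name finds nothing.
print "via \$main::total: ",
  (defined $main::total ? $main::total : "(undefined)"), "\n";
```

The subroutines report 60, while the lookup by name comes back undefined: the only way to the data is through the behavior.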

One of the problems with including a file like this is that the namespace is shared between the main program and the included file. If calculate_running_total.pl needed a few more subroutines to perform the task, the subroutines in my main program might collide, especially if they were undocumented. I could just get creative with my names:

  my $total = 0;

  sub calculate_running_total_add_item {
    $total += shift;
  }

  sub calculate_running_total_grand_total {
    return calculate_running_total_normalize($total);
  }

  ## private routines:
  sub calculate_running_total_normalize {
    ... 
  }

  1;

but this would also require a corresponding lengthening in usage:

  require 'calculate_running_total.pl';

  for my $file (glob '*') {
    calculate_running_total_add_item(-s $file);
  }

  print "total bytes is ",
    calculate_running_total_grand_total(), "\n";

as well as be annoying to the author of the included file, having to always prefix every name explicitly like that.

For this reason, modern Perl programs frequently take advantage of packages:

  package CalculateRunningTotal;

  my $total = 0;

  sub add_item {
    $total += shift;
  }

  sub grand_total {
    return normalize($total);
  }

  ## private routines:
  sub normalize {
    ... 
  }

  1;

I can now bring this in and use names that make it more clear that I'm pulling from a separate file:

  require 'calculate_running_total.pl';

  for my $file (glob '*') {
    CalculateRunningTotal::add_item(-s $file);
  }

  print "total bytes is ",
    CalculateRunningTotal::grand_total(), "\n";

A little nicer, and a little clearer. Of course, nothing stops us from still calling CalculateRunningTotal::normalize except perhaps a gentleman's agreement. OK, call it ``good programming practice''.

The names are still a bit long, and by migrating this thing into a full-blown module, we can shorten that up even more. First, we'll bring in the Exporter module to handle the namespace aliasing, and we'll also have to change the name to end in .pm so that use knows how to find it. So in CalculateRunningTotal.pm, we now have:

  package CalculateRunningTotal;
  require Exporter;
  our @ISA = qw(Exporter);   # inherit import() from Exporter
  our @EXPORT = qw(add_item grand_total);

  my $total = 0;

  sub add_item {
    $total += shift;
  }

  sub grand_total {
    return normalize($total);
  }

  ## private routines:
  sub normalize {
    ... 
  }

  1;

And we'll use it with:

  use CalculateRunningTotal;

  for my $file (glob '*') {
    add_item(-s $file);
  }

  print "total bytes is ", grand_total(), "\n";

Hey, look at that, short names again. Of course, we could avoid colliding with our own add_item, if we had one, by restricting the import:

  use CalculateRunningTotal qw(grand_total);

  for my $file (glob '*') {
    CalculateRunningTotal::add_item(-s $file);
  }

  print "total bytes is ", grand_total(), "\n";

Now we have to spell out add_item using its full package name.
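To watch the restricted import work without creating a separate file, here's a one-file sketch. The BEGIN blocks are my stand-in for the .pm file and the use line (a use amounts to a compile-time require followed by a call to import), and the main program's own add_item is hypothetical.

```perl
use strict;
use warnings;

# Inline stand-in for CalculateRunningTotal.pm, so the sketch runs as one file.
BEGIN {
  package CalculateRunningTotal;
  use Exporter qw(import);
  our @EXPORT = qw(add_item grand_total);

  my $total = 0;
  sub add_item    { $total += shift }
  sub grand_total { return $total }
}

# What "use CalculateRunningTotal qw(grand_total);" does once the file is loaded.
BEGIN { CalculateRunningTotal->import(qw(grand_total)) }

# The main program's own, unrelated add_item (hypothetical).
sub add_item { return "main's add_item saw: @_" }

CalculateRunningTotal::add_item($_) for 40, 2;
print add_item('hello'), "\n";
print "total is ", grand_total(), "\n";
```

Only grand_total lands in the main package, so the two add_item routines coexist without stepping on each other.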

At this point, we have the basic workings of a trivial but clean module. Time to get back to the original point. What is the mistake people often make when designing the interface? It's when people choose to export the data of the module in addition to (or instead of) the behavior.

For example, I might have looked at the early version of this module, staring at the code:

  sub grand_total {
    return $total;
  }

I might have asked myself why I was writing such a boring trivial subroutine, when I could just give access to $total directly. Since the Exporter works only with package variables, I'd have to change this from a lexical to a package variable (declared with our):

  package CalculateRunningTotal;
  require Exporter;
  our @ISA = qw(Exporter);   # inherit import() from Exporter
  our @EXPORT = qw(add_item $total);

  our $total = 0;

  sub add_item {
    $total += shift;
  }

  1;

And at first glance, this looks a bit simpler:

  use CalculateRunningTotal;

  for my $file (glob '*') {
    add_item(-s $file);
  }

  print "total bytes is $total\n";

Here, the $total variable has been exported and aliased from the CalculateRunningTotal package into the main package. (Let's ignore for a moment that even add_item would then be nearly useless.)

The problem with this configuration is that I'm now committing to providing a specific variable with the name $total that can be consulted at any time to provide the running total. Before, with grand_total in place, I had some place to add extra hooks if I desired later, such as the normalize routine in some of the previous versions. But where would I put normalize now?
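For contrast, here's a sketch of the subroutine interface absorbing exactly that kind of late change. The rounding rule inside normalize is something I've made up for illustration, since the listings above leave its body out, and the package is inlined so the example runs as one file.

```perl
use strict;
use warnings;

# Inline stand-in for CalculateRunningTotal.pm, so the sketch runs as one file.
{
  package CalculateRunningTotal;

  my $total = 0;

  sub add_item { $total += shift }

  # Callers only ever see this subroutine, so the reporting rule can
  # change here without touching any of them.
  sub grand_total { return normalize($total) }

  # Hypothetical rule: report whole kilobytes, rounding up.
  sub normalize {
    my $bytes = shift;
    return int(($bytes + 1023) / 1024);
  }
}

CalculateRunningTotal::add_item($_) for 1500, 600;
print "total is ", CalculateRunningTotal::grand_total(), " KB\n";
```

The calling code looks the same as it did before normalize existed; only the module changed.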

The data interface is very inflexible, and very hard to upgrade or update. Sure, I could probably replace $total with a tied variable, but only at some speed expense and increased difficulty of debugging.
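Just to show what that tied-variable rescue might involve, here's a rough sketch. The class layout, the read-only STORE, and the rounding rule in normalize are all my own assumptions, and the export step is left out to keep it to one file.

```perl
use strict;
use warnings;

# Inline stand-in for the module; every name beyond $total and add_item
# is hypothetical.
{
  package CalculateRunningTotal;

  my $raw = 0;     # the real state, private again
  our $total;      # the variable name that callers already depend on

  sub add_item { $raw += shift }

  # Hypothetical rule: report whole kilobytes, rounding up.
  sub normalize { return int(($_[0] + 1023) / 1024) }

  # Tie interface: every read of $total now runs FETCH.
  sub TIESCALAR { my $class = shift; return bless {}, $class }
  sub FETCH     { return normalize($raw) }
  sub STORE     { die "\$total is read-only\n" }

  tie $total, __PACKAGE__;
}

CalculateRunningTotal::add_item($_) for 1500, 600;
print "total is $CalculateRunningTotal::total KB\n";
```

Every read of the variable is now a hidden method call, which is where both the speed expense and the debugging surprise come from.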

Also, consider that I've now increased the scope of $total to include many more lines of code. If in my debugging, I see that the value of $total is now incorrect, I'll have to look over a much broader range of code to see what might be altering it. Some would call this having increased the coupling between the main code and the external package, and increased coupling almost always comes with a debugging cost.

So, when you're designing a module, you should definitely think twice (or three times) when you start to add variables to your @EXPORT list. It may seem to relieve a problem initially, but it will almost always lead to bad things later. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
