Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Linux Magazine Column 91 (Mar 2007)

[suggested title: ``Sanity-checking your Perl code'']

I've seen a lot of Perl code in my nearly two decades of experience in the Perl community. It would be an understatement to say ``the quality has varied from program to program''. In fact, because the Perl motto is ``there's more than one way to do it'', I find that the exact same steps can be expressed in Perl in anything from an annoyingly verbose to an obfuscatedly compact form.

We can think of a program in two ways. First, a program is a set of instructions to the /usr/bin/perl compiler-interpreter, to be executed to (hopefully) perform a desired operation. But second, a program is also a document, to be read by the next programmer to come along (whom we like to call the maintenance programmer, although it's typically just you a few months later), interpreted as its own pseudocode to describe the flow of the operations.

Let's focus on this latter role for a bit. What can we know about a Perl program without executing it? In just its textual form, what can the program tell us? Or more precisely, what can other programs tell us about this textual thing that resembles a program?

Probably the simplest thing we can tell is ``is it valid?''. For this, we invoke perl itself, passing the compile-only switch:

  perl -c ourprogram

For this operation, perl compiles the program, but stops just short of the execution phase. This means that every part of the program text is translated into the internal data structure that represents the working program, but we haven't actually executed any code. If there are any syntax errors, we're informed, and the compilation aborts.

Actually, that's a bit of a lie. Thanks to BEGIN blocks (including their layered-on cousin, the use directive), some Perl code may have been executed during this theoretically safe ``syntax check''. For example, if your code contains:

  BEGIN { warn "Hello, world!\n" }

then you will see that message, even during perl -c! This is somewhat surprising to people who consider ``compile only'' to mean ``executes no code''. Consider the code that contains:

  BEGIN { system "rm", "-rf", "/" }

and you'll see the problem with that argument. Oops.
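To make the compile-time versus run-time split concrete, here is a minimal sketch (the @order array and the messages are my own illustration, not anything from perl itself). Run normally, it reports both phases; run under perl -c, only the BEGIN block's message should appear before the syntax report.

```perl
use strict;
use warnings;

our @order;    # illustrative: records which phase ran first

# An ordinary statement: it only runs at run time, so perl -c never reaches it.
push @order, "run time";

# A BEGIN block runs as soon as the compiler has parsed it,
# even though it appears later in the file.
BEGIN {
    push @order, "compile time";
    print "BEGIN block: running during compilation\n";
}

print "Order of execution: ", join(" -> ", @order), "\n";
```

The same applies to every use directive, since each one is executed inside an implied BEGIN block.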

The next thing we can find out from perl itself is ``is it avoiding obvious beginner mistakes?''. To do that, we turn on warnings with an additional -w switch. We can combine this with -c to form the -cw switch:

  perl -cw ourprogram

When we compile code with warnings turned on, we are informed of things that beginners often stumble over, like having a variable name that is slightly misspelled in one location:

  $bammbamm = 3;
  $bammbamm += 5;
  $bamm_bamm *= 2; # oops
  print "the value is $bammbamm\n";

In the third line, we have made a mistake, and in fact, perl -cw tells us so:

  Name "main::bamm_bamm" used only once: possible typo at myscript line 3.

Here, the variable $bamm_bamm is used only once, so that's likely a typo. Of course, if I made the same mistake in both the second and third lines, I'd get no warning at all. The real solution here is to use strict, which would have aborted at the very first line, because none of these variables has been declared.
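As a rough sketch of what strict buys us, the following wraps a version of the snippet (with my added to the correctly spelled variable, which is my assumption about how you'd fix the first line) in a string eval, so that the compile failure can be caught and displayed instead of killing the demonstration. The exact wording of the error varies between Perl versions.

```perl
use strict;
use warnings;

# The snippet from above, with a declaration added and the same typo on line 4.
my $source = <<'END_SOURCE';
use strict;
my $bammbamm = 3;
$bammbamm += 5;
$bamm_bamm *= 2;    # oops: this name was never declared
print "the value is $bammbamm\n";
END_SOURCE

# A string eval compiles the text first. Under strict, the undeclared
# variable is a compile-time error, so none of the snippet is executed.
my $compiled = defined eval "$source; 1";
my $error    = $@;

print $compiled
    ? "compiled cleanly\n"
    : "strict refused to compile it:\n$error";
```

Unlike the ``used only once'' warning, this check still fires if the misspelled name appears on several lines.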

Another less frequent (but equally tragic) mistake is giving the same name to subroutines as built-ins. For example:

  sub log { warn scalar localtime, ": ", @_ }
  # ... later in the code ...
  log "something's wrong";

Here, the intent with the log subroutine is to add a timestamp to the warning message. However, when we run this through perl -cw, we see how it's actually been interpreted:

  Ambiguous call resolved as CORE::log(), qualify as such or use & at logger line 3.
  Argument "something's wrong" isn't numeric in log at logger line 3.
  Useless use of log in void context at logger line 3.

Wow. This is definitely unexpected, but we can see what has happened now. Instead of our log subroutine, Perl has preferred the interpretation of the third line as if it were the log built-in operation (taking the natural logarithm of the argument). That's clearly wrong, but we can see two other things that are being checked as well: we've passed a non-numeric argument to this built-in log function, and it really wants to be used on a numeric value. In addition, we're also using this value-returning function in a void context, and Perl thinks that's probably not going to be very useful.
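One way out, sketched below, is simply to pick a name that doesn't collide with a built-in. The names timestamped and log_message are my own inventions for illustration. (The warning itself hints at the other fix: calling the subroutine as &log(...) forces Perl to use our definition.)

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a message with the current local time.
sub timestamped {
    return scalar(localtime) . ": " . join("", @_);
}

# Renamed from "log" so that it can no longer be mistaken for CORE::log.
sub log_message {
    warn timestamped(@_);
}

log_message "something's wrong\n";
```

Splitting the formatting from the warn call is just a convenience, so the timestamped text can be inspected without capturing standard error.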

As you see, we can get a lot of information from this simple invocation. I recommend using perl -cw frequently: at least once for every dozen lines of code you add to your program. Keep in mind that we'll only be seeing common beginner mistakes. It won't catch every mistake you make, nor will everything it reports be a mistake. For example, if you want an array containing some common punctuation:

  my @separators = qw(. , ; :);

you'll see that perl -cw wrongfully tags this line with:

  Possible attempt to separate words with commas at myprog line 1.

Of course, this is not what's happening... but this warning is to try to prevent something like:

  my @words = qw(this, that, those);

which is clearly wrong. So, don't be religious about requiring every last warning to be gone, but you should at least be able to explain why they are happening instead of just ignoring them.
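When you can explain a warning and want it out of the way, one option is to disable just that warning category in the smallest possible scope. Here's a minimal sketch using the qw category of the warnings pragma:

```perl
use strict;
use warnings;

my @separators;
{
    # The commas here are deliberate, so silence only the "qw" category,
    # and only for this one statement.
    no warnings 'qw';
    @separators = qw(. , ; :);
}

print scalar(@separators), " separators: @separators\n";
```

Every other warning stays enabled, both inside and outside the block, and the comment tells the maintenance programmer why the exception is there.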

Another place where things tend to mysteriously fall apart (even for experts!) is the relative precedence of operators in a complex expression. For example, the already obtuse ?: operator can appear on the left side of an assignment. This means that:

  $selector ? $x : $y = 3;

will set either $x or $y to 3, depending on the truth of $selector. But this can be a bit confusing. Consider this use of the ?: operator as a tiny if-then-else statement (already a bad idea, but let's run with it):

  $direction ? $x += 1 : $y += 1;

At first glance (viewing this program as a document, or running it with inadequate testing), we might think that if $direction is true, we'll be incrementing $x, and otherwise incrementing $y. However, Perl instead treats this as if we had written:

  (($direction ? ($x += 1) : $y) += 1);

With this additional hint, we can see that if $direction is false, $y is incremented by 1. However, if $direction is true, $x is incremented not once, but twice! Oops.
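The difference is easy to demonstrate. This sketch (the variable names beyond $direction, $x, and $y are mine) runs the troublesome statement, then two safer spellings: one with explicit parentheses, and one with a plain if-else.

```perl
use strict;
use warnings;

my $direction = 1;

# The troublesome form: parsed as (($direction ? ($x += 1) : $y) += 1).
my ($x, $y) = (0, 0);
$direction ? $x += 1 : $y += 1;
my $surprising_x = $x;

# Explicit parentheses keep each assignment inside its own branch.
($x, $y) = (0, 0);
$direction ? ($x += 1) : ($y += 1);
my $parenthesized_x = $x;

# A plain if-else says what we mean without any precedence puzzles.
($x, $y) = (0, 0);
if ($direction) { $x += 1 } else { $y += 1 }
my $if_else_x = $x;

print "without parentheses: $surprising_x\n";    # 2
print "with parentheses:    $parenthesized_x\n"; # 1
print "with if-else:        $if_else_x\n";       # 1
```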

Wouldn't it be nice if Perl could show us this latter form directly? After all, Perl has to parse that expression. And the answer is yes, we can get Perl to show us this code, using the B::Deparse module. This module comes from the collection of things in the B::* namespace that are all about rummaging around in Perl's internals to discover or manipulate compiled Perl programs.

Although it's a bit confusing, we access the B::Deparse module using the O module. On the command line, we say:

  perl -MO=Deparse,-p somescript

to mean: use B::Deparse, passing it -p as an argument, and invoke it on the remaining text. Here, -p means ``add extra parentheses'', which is exactly what we want. It even works on ``one liners'':

  perl -MO=Deparse,-p -e '$direction ? $x += 1 : $y += 1'

That's how I got the parenthesized version earlier. So if you're ever wondering how something is parsed, keep an alias to this command around (either in your head or as a literal alias).
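B::Deparse can also be used from inside a program, through its coderef2text method. The following is a small sketch rather than a recommended workflow; the package variables are declared with our only to keep strict happy, and the exact layout of the output depends on your Perl version.

```perl
use strict;
use warnings;
use B::Deparse;

our ($direction, $x, $y);

# Same option as on the command line: -p asks for extra parentheses.
my $deparser = B::Deparse->new("-p");

# coderef2text returns the body of the subroutine as Perl source text.
my $text = $deparser->coderef2text(
    sub { $direction ? $x += 1 : $y += 1 }
);

print "$text\n";
```

The printed body should contain the same fully parenthesized expression as the command-line version, possibly preceded by the pragmas that were in effect.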

Speaking of B::* modules, another module that is quite useful for understanding and verifying a large program is the B::Xref module, which can generate a cross-reference listing of every subroutine definition and invocation, as well as every package and lexical variable used. For example, we can invoke it directly on a one-liner with:

  perl -MO=Xref -e '$direction ? $x += 1 : $y += 1'

And what we're told is that we have:

  -e syntax OK
  File -e
    Subroutine (definitions)
      Package Internals
        &HvREHASH         s0
        &SvREADONLY       s0
        &SvREFCNT         s0
        &hash_seed        s0
        &hv_clear_placeholders  s0
        &rehash_seed      s0
      Package PerlIO
        &get_layers       s0
      Package Regexp
        &DESTROY          s0
      Package UNIVERSAL
        &VERSION          s0
        &can              s0
        &isa              s0
    Subroutine (main)
      Package main
        $direction        1
        $x                1
        $y                1

Wow. For a one-liner, we sure have a lot of stuff. The most important part is that final bit, where we find that we've used $main::direction as well as $main::x and $main::y. While B::Xref can be fooled by fancy code, it's still quite nice to get a general ``who uses what'' report for most normal code.

No modern mention of ``Perl as document'' can avoid mentioning the incredibly ambitious project started by Adam Kennedy (mentioned in last month's column) to create a framework that can ``parse'' Perl (PPI). I put ``parse'' in quotes here, because it's impossible to properly parse Perl code in the general case. But for most code, most of the time, some assumptions can be made, and this enables Adam and others to perform some basic static analysis of Perl code.

After Adam contributed his PPI code to the CPAN, Jeffrey Thalhammer used the routines to ``parse Perl'' and come up with metrics and indicators about the quality of the code. For the most part, Jeffrey used the recommendations suggested by Damian Conway in Perl Best Practices (O'Reilly Media, July 2005), and came up with a way to test more than a quarter of the 256 practices using objective analysis aided by the PPI parsing. The result is rather nice, because I can apply the rules directly to a program to see how much Damian would grimace at the result.

For example, let's feed some code to perlcritic, the command-line tool installed by the module. We'll start with code that successfully copies the ``message of the day'' file to standard out:

  open A, "/etc/motd";
  while ($x = <A>) {
    print $x;
  }

In my early days, I wrote code just like that. Let's see what perlcritic (and thus Perl Best Practices) has to say about that. At the quietest level of complaint (severity 5 only), we get:

  Two-argument "open" used at line 1, column 3.  See page 207 of PBP.  (Severity: 5)
  Code before strictures are enabled at line 1, column 3.  See page 429 of PBP.  (Severity: 5)
  Bareword file handle opened at line 1, column 3.  See pages 202,204 of PBP.  (Severity: 5)

Wow. Three problems already, in a four-line program. First, we're told that a two-arg open is broken. What PBP suggests (on the given page numbers) is:

  open A, "<", "/etc/motd";

The second problem is that I forgot use strict. That's easy enough to fix. Of course, my $x variable needs declaration too.

The third error says that I shouldn't have used A in the first place. The package-based ``bareword'' filehandles have been around since Perl version 1, and have their limitations because of the syntax involved. Modern Perl programmers can avoid them using standard lexical variables that are autovivified to contain a filehandle:

  open my $handle, "<", "/etc/motd";

So, using these three hints, let's take a second stab at the program:

  use strict;
  open my $handle, "<", "/etc/motd";
  while (my $x = <$handle>) {
    print $x;
  }

Following these rules, we get a clean run. But let's invoke perlcritic with a bit more annoyance, using perlcritic -4 (level 4 instead of 5, slightly more annoying):

  Code not contained in explicit package at line 1, column 3.  Violates encapsulation.  (Severity: 4)
  Code before warnings are enabled at line 2, column 3.  See page 431 of PBP.  (Severity: 4)
  Code not contained in explicit package at line 2, column 3.  Violates encapsulation.  (Severity: 4)
  Module does not end with "1;" at line 3, column 3.  Must end with a recognizable true value.  (Severity: 4)
  Code not contained in explicit package at line 3, column 3.  Violates encapsulation.  (Severity: 4)
  Code not contained in explicit package at line 3, column 10.  Violates encapsulation.  (Severity: 4)
  Code not contained in explicit package at line 4, column 5.  Violates encapsulation.  (Severity: 4)

Wow. Lots of errors. Oddly enough, some of them should be ignored, like the ones that are complaining that my script is not a module. Once we throw those away, we see that, indeed, we've forgotten use warnings as well at the beginning of the code. We can fix that, and repeat. Also note that the lack of error checking on the open wasn't revealed even at level 4. Oh well.
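Putting the pieces together, a third stab might look something like the sketch below. The print_file wrapper and its error messages are my own additions for illustration (I haven't shown perlcritic output for this version), and the error checks on open and close are there because they're a good idea, not because perlcritic asked for them at these levels.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the named file to standard output,
# returning the number of lines copied.
sub print_file {
    my ($path) = @_;

    open my $handle, "<", $path
        or die "Cannot open $path: $!\n";

    my $lines = 0;
    while (my $line = <$handle>) {
        print $line;
        $lines++;
    }

    close $handle
        or die "Cannot close $path: $!\n";

    return $lines;
}

# Not every system has a message of the day, so report a failure
# instead of dying at the top level.
my $path = "/etc/motd";
my $ok = eval { print_file($path); 1 };
warn "Skipped: $@" unless $ok;
```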

As you can see, there are plenty of things you can do with your Perl program besides executing it! I hope this has inspired you to try a few of these tips out. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
