Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Linux Magazine Column 91 (Mar 2007)
[suggested title: ``Sanity-checking your Perl code'']
I've seen a lot of Perl code in my nearly two decades in the Perl community. It would be an understatement to say ``the quality has varied from program to program''. In fact, because the Perl motto is ``there's more than one way to do it'', I find that the exact same steps can be expressed in Perl anywhere from annoyingly verbose to obfuscatedly compact.
We can think of a program in two ways. First, a program is a set of instructions to the /usr/bin/perl compiler-interpreter, to be executed to (hopefully) perform a desired operation. But second, a program is also a document, to be read by the next programmer to come along (whom we like to call the maintenance programmer, although it's typically just you a few months later), interpreted as its own pseudocode to describe the flow of the operations.
Let's focus on this latter role for a bit. What can we know about a Perl program without executing it? In just its textual form, what can the program tell us? Or more precisely, what can other programs tell us about this textual thing that resembles a program?
Probably the simplest thing we can tell is ``is it valid?''. For this, we invoke perl itself, passing the compile-only switch:
    perl -c ourprogram
For this operation, perl compiles the program, but stops just short of the execution phase. This means that every part of the program text is translated into the internal data structure that represents the working program, but we haven't actually executed any code. If there are any syntax errors, we're informed, and the compilation aborts.
Actually, that's a bit of a lie. Thanks to BEGIN blocks (including their layered-on cousin, the use directive), some Perl code may have been executed during this theoretically safe ``syntax check''. For example, if your code contains:
    BEGIN { warn "Hello, world!\n" }
then you will see that message, even during perl -c! This is somewhat surprising to people who consider ``compile only'' to mean ``executes no code''. Consider the code that contains:
    BEGIN { system "rm", "-rf", "/" }
and you'll see the problem with that argument. Oops.
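To see this for yourself without risking anything, here is a minimal sketch. It assumes a Unix-like shell (for the 2>&1 redirection) and uses a throwaway temporary script; the variable names and messages are mine, chosen only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny script: the BEGIN block warns at compile time,
# and the print statement would only run at execution time.
my ($fh, $script) = tempfile(SUFFIX => ".pl", UNLINK => 1);
print {$fh} <<'END_SCRIPT';
BEGIN { warn "compile phase\n" }
print "run phase\n";
END_SCRIPT
close $fh or die "Cannot close $script: $!";

# Run "perl -c" on it with the same perl binary that is running us,
# merging STDERR into STDOUT so both streams can be inspected.
my $output = qx{"$^X" -c "$script" 2>&1};
print $output;
```

The captured output should contain the BEGIN block's message and the ``syntax OK'' report, but never the run-phase line.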
The next thing we can find out from perl itself is ``is it avoiding obvious beginner mistakes?''. To do that, we turn on warnings with an additional -w switch. We can combine this with -c to form the -cw switch:
    perl -cw ourprogram
When we compile code with warnings turned on, we are informed of things that beginners often stumble over, like having a variable name that is slightly misspelled in one location:
    $bammbamm = 3;
    $bammbamm += 5;
    $bamm_bamm *= 2; # oops
    print "the value is $bammbamm\n";
In the third line, we have made a mistake, and in fact, perl -cw tells us so:
    Name "main::bamm_bamm" used only once: possible typo at myscript line 3.
Here, the variable $bamm_bamm is used only once, so that's likely a typo. Of course, if I had made the same mistake in both the second and third lines, I'd get no warning at all. The real solution here is to use strict, which would have aborted at even the first line, since none of these variables is declared.
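As a rough illustration of what use strict buys us, the sketch below compiles the same snippet with strict enabled and the first variable properly declared. The string eval is just my way of capturing the compile error in a variable; in real life you would simply put use strict at the top of the file.

```perl
use strict;
use warnings;

# The snippet from above, with "use strict" and a "my" declaration added.
my $source = <<'END_SOURCE';
use strict;
my $bammbamm = 3;
$bammbamm += 5;
$bamm_bamm *= 2;    # the same typo as before
print "the value is $bammbamm\n";
END_SOURCE

# String eval compiles the snippet. With the typo present, compilation
# fails, so nothing in the snippet runs and $@ holds the error text.
my $compiled = eval $source . "1;";
my $error    = $compiled ? "" : $@;
print $compiled ? "compiled cleanly\n" : "strict refused it: $error";
```

Because the misspelled name was never declared, strict rejects it at compile time no matter how many times the typo is repeated.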
Another less frequent (but equally tragic) mistake is giving the same name to subroutines as built-ins. For example:
    sub log { warn scalar localtime, ": ", @_ }
    # ... later in the code ...
    log "something's wrong";
Here, the intent with the log subroutine is to add a timestamp to the warning message. However, when we run this through perl -cw, we see how it's actually been interpreted:
    Ambiguous call resolved as CORE::log(), qualify as such or use & at logger line 3.
    Argument "something's wrong" isn't numeric in log at logger line 3.
    Useless use of log in void context at logger line 3.
Wow. This is definitely unexpected, but we can see what has happened now. Instead of our log subroutine, Perl has preferred the interpretation of the third line as if it were the log built-in operation (taking the natural logarithm of the argument). That's clearly wrong, but we can see two other things that are being checked as well: we've passed a non-numeric argument to this built-in log function, and it really wants to be used on a numeric value. In addition, we're also using this value-returning function in a void context, and Perl thinks that's probably not going to be very useful.
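Two ways out of this trap are sketched below. The name log_message is hypothetical (any name that isn't a built-in will do), and the second form follows the hint in the warning itself by calling the subroutine with an explicit ampersand. I've also added a trailing newline to the message, which the original didn't have.

```perl
use strict;
use warnings;

# Option 1: a name that no built-in uses, so the call cannot be misparsed.
sub log_message { warn scalar(localtime), ": ", @_, "\n" }

# Option 2: keep the name "log", but always call it with a leading &,
# which forces Perl to pick the user-defined subroutine.
sub log { warn scalar(localtime), ": ", @_, "\n" }

log_message("something's wrong");
&log("something's still wrong");
```

I'd pick the first option in new code, because the second depends on every caller remembering the ampersand.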
As you see, we can get a lot of information from this simple invocation. I recommend using perl -cw frequently: at least on every dozen lines of code added to your program. Keep in mind that we'll only be seeing common beginner mistakes. It won't catch every mistake you make, nor will everything it reports be a mistake. For example, if you want an array containing some common punctuation:
    my @separators = qw(. , ; :);
you'll see that perl -cw wrongfully tags this line with:
    Possible attempt to separate words with commas at myprog line 1.
Of course, this is not what's happening... but this warning is to try to prevent something like:
my @words = qw(this, that, those);
which is clearly wrong. So, don't be religious about requiring every last warning to be gone, but you should at least be able to explain why they are happening instead of just ignoring them.
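Once you can explain a warning and have decided it's harmless, you can also silence just that one warning in just that one place. The sketch below assumes you are using the lexical warnings pragma (use warnings) rather than -w, and turns off only the qw category inside a small block:

```perl
use strict;
use warnings;

# Disable only the "qw" warning category, and only inside this block.
my @separators = do {
    no warnings 'qw';    # the punctuation here is intentional
    qw(. , ; :);
};

print scalar(@separators), " separators: @separators\n";
```

Everything outside the do block still gets the full set of warnings.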
Another place where things tend to mysteriously fall apart (even for experts!) is the relative precedence of operators in a complex expression. For example, the already obtuse ?: operator can appear on the left side of an assignment. This means that:
    $selector ? $x : $y = 3;
will set either $x or $y to 3, depending on the truth of $selector.
But this can be a bit confusing. Consider this use of the ?: operator as a tiny if-then-else statement (already a bad idea, but let's run with it):
    $direction ? $x += 1 : $y += 1;
At first glance (viewing this program as a document, or running it with inadequate testing), we might think that if $direction is true, we'll be incrementing $x, and otherwise incrementing $y. However, Perl instead treats this as if we had written:
    (($direction ? ($x += 1) : $y) += 1);
With this additional hint, we can see that if $direction is false, $y is incremented by 1. However, if $direction is true, $x is incremented not once, but twice! Oops.
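To make the double increment concrete, here is a small sketch that runs both spellings side by side. The subroutine names are made up for this example; each one starts with $x and $y at zero and returns their values after a single ``step''.

```perl
use strict;
use warnings;

# The misleading spelling: parsed as (($direction ? ($x += 1) : $y) += 1).
sub step_ambiguous {
    my ($direction) = @_;
    my ($x, $y) = (0, 0);
    $direction ? $x += 1 : $y += 1;
    return ($x, $y);
}

# The spelling that says what we mean.
sub step_explicit {
    my ($direction) = @_;
    my ($x, $y) = (0, 0);
    if ($direction) { $x += 1 } else { $y += 1 }
    return ($x, $y);
}

printf "ambiguous, true:  x=%d y=%d\n", step_ambiguous(1);   # x=2 y=0
printf "ambiguous, false: x=%d y=%d\n", step_ambiguous(0);   # x=0 y=1
printf "explicit,  true:  x=%d y=%d\n", step_explicit(1);    # x=1 y=0
printf "explicit,  false: x=%d y=%d\n", step_explicit(0);    # x=0 y=1
```

Only the true branch of the ambiguous version misbehaves, which is exactly the kind of bug that slips past inadequate testing.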
Wouldn't it be nice if Perl could show us this latter form directly? After all, Perl has to parse that expression. And the answer is yes, we can get Perl to show us this code, using the B::Deparse module. This module comes from the collection of things in the B::* namespace that are all about rummaging around in Perl's internals to discover or manipulate compiled Perl programs.
Although it's a bit confusing, we access the B::Deparse module using the O module. On the command line, we say:
    perl -MO=Deparse,-p somescript
to mean: use B::Deparse, passing it -p as an argument, and invoke it on the remaining text. Here, -p means ``add extra parentheses'', which is exactly what we want. It even works on ``one liners'':
    perl -MO=Deparse,-p -e '$direction ? $x += 1 : $y += 1'
That's how I got the parenthesized version earlier. So if you're ever wondering how something is parsed, keep an alias to this command around (either in your head or as a literal alias).
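If you'd rather ask the question from inside a running program, B::Deparse can also be used as an ordinary object through its coderef2text method. The sketch below wraps the expression in an anonymous subroutine and uses lexical variables so that it compiles under strict; the exact layout of the text that comes back may differ between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

my ($direction, $x, $y) = (1, 0, 0);

# "-p" asks for the same extra parentheses as on the command line.
my $deparser = B::Deparse->new("-p");

# coderef2text returns the body of the subroutine as Perl source text.
my $body = $deparser->coderef2text(
    sub { $direction ? $x += 1 : $y += 1 }
);
print $body, "\n";
```

The subroutine is never called here; it is only compiled and then turned back into text.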
Speaking of B::* modules, another module that is quite useful for understanding and verifying a large program is the B::Xref module, which can generate a cross-reference listing of every subroutine definition and invocation, as well as every package and lexical variable used. For example, we can invoke it directly on a one-liner with:
    perl -MO=Xref -e '$direction ? $x += 1 : $y += 1'
And what we're told is that we have:
    -e syntax OK
    File -e
      Subroutine (definitions)
        Package Internals
          &HvREHASH         s0
          &SvREADONLY       s0
          &SvREFCNT         s0
          &hash_seed        s0
          &hv_clear_placeholders  s0
          &rehash_seed      s0
        Package PerlIO
          &get_layers       s0
        Package Regexp
          &DESTROY          s0
        Package UNIVERSAL
          &VERSION          s0
          &can              s0
          &isa              s0
      Subroutine (main)
        Package main
          $direction        1
          $x                1
          $y                1
Wow. For a one-liner, we sure have a lot of stuff. The most important part is that final bit, where we find that we've used $main::direction as well as $main::x and $main::y. While B::Xref can be fooled by fancy code, it's still quite nice to get a general ``who uses what'' report for most normal code.
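Since that final bit is usually the part we care about, a few lines of Perl can trim the report down to it. This is only a sketch: it assumes the report keeps the ``Package name'' headings shown above (the details may vary between Perl versions), and the filtering logic is mine rather than anything B::Xref provides.

```perl
use strict;
use warnings;

my $one_liner = '$direction ? $x += 1 : $y += 1';

# List-form pipe open runs perl directly, so the one-liner needs no
# shell quoting. The "syntax OK" line goes to STDERR and is not captured.
open my $xref, "-|", $^X, "-MO=Xref", "-e", $one_liner
    or die "Cannot run $^X: $!";
my @report = <$xref>;
close $xref;

# Keep only the entries listed under a "Package main" heading.
my $in_main = 0;
my @main_names;
for my $line (@report) {
    if ($line =~ /^\s*Package\s+(\S+)/) {
        $in_main = ($1 eq "main");
        next;
    }
    push @main_names, $1 if $in_main && $line =~ /^\s*([\$\@%&]\S+)\s/;
}
print "main uses: @main_names\n";
```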
No modern discussion of ``Perl as document'' can avoid mentioning the incredibly ambitious project started by Adam Kennedy (mentioned in last month's column) to create a framework that can ``parse'' Perl (PPI). I put ``parse'' in quotes here, because it's impossible to properly parse Perl code in the general case. But for most code, most of the time, some assumptions can be made, and this enables Adam and others to perform some basic static analysis of Perl code.
After Adam contributed his PPI code to the CPAN, Jeffrey Thalhammer used the routines to ``parse Perl'' and come up with metrics and indicators about the quality of the code. For the most part, Jeffrey used the recommendations suggested by Damian Conway in Perl Best Practices (O'Reilly Media, July 2005), and came up with a way to test more than a quarter of the 256 practices using objective analysis aided by the PPI parsing. The result is rather nice, because I can apply the rules directly to a program to see how much Damian would grimace at the result.
For example, let's feed some code to perlcritic, the command-line tool installed by the module. We'll start with code that successfully copies the ``message of the day'' file to standard out:
    open A, "/etc/motd";
    while ($x = <A>) {
      print $x;
    }
In my early days, I wrote code just like that. Let's see what perlcritic (and thus Perl Best Practices) has to say about that. At the quietest level of complaint (severity 5 only), we get:
    Two-argument "open" used at line 1, column 3. See page 207 of PBP. (Severity: 5)
    Code before strictures are enabled at line 1, column 3. See page 429 of PBP. (Severity: 5)
    Bareword file handle opened at line 1, column 3. See pages 202,204 of PBP. (Severity: 5)
Wow. Three problems already, in a four-line program. First, we're told that a two-arg open is broken. What PBP suggests (on the given page numbers) is:
    open A, "<", "/etc/motd";
The second problem is that I forgot use strict. That's easy enough to fix. Of course, my $x variable needs declaration too.
The third error says that I shouldn't have used A in the first place. The package-based ``bareword'' filehandles have been around since Perl version 1, and have their limitations because of the syntax involved. Modern Perl programmers can avoid them using standard lexical variables that are autovivified to contain a filehandle:
    open my $handle, "<", "/etc/motd";
So, using these three hints, let's take a second stab at the program:
    use strict;
    open my $handle, "<", "/etc/motd";
    while (my $x = <$handle>) {
      print $x;
    }
Following these rules, we get a clean run. But let's invoke perlcritic with a bit more annoyance, using perlcritic -4 (level 4 instead of 5, slightly more annoying):
    Code not contained in explicit package at line 1, column 3. Violates encapsulation. (Severity: 4)
    Code before warnings are enabled at line 2, column 3. See page 431 of PBP. (Severity: 4)
    Code not contained in explicit package at line 2, column 3. Violates encapsulation. (Severity: 4)
    Module does not end with "1;" at line 3, column 3. Must end with a recognizable true value. (Severity: 4)
    Code not contained in explicit package at line 3, column 3. Violates encapsulation. (Severity: 4)
    Code not contained in explicit package at line 3, column 10. Violates encapsulation. (Severity: 4)
    Code not contained in explicit package at line 4, column 5. Violates encapsulation. (Severity: 4)
Wow. Lots of errors. Oddly enough, some of them should be ignored, like the ones that are complaining that my script is not a module. Once we throw those away, we see that, indeed, we've forgotten use warnings as well at the beginning of the code. We can fix that, and repeat.
Also note that the lack of error checking on the open wasn't revealed even at level 4. Oh well.
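For completeness, here's one way a third stab might look once use warnings and the missing error checks are added. Treat it as a sketch rather than the one true answer: the copy_file name and the wording of the messages are my own, and I've wrapped the logic in a subroutine only so it can be pointed at any file.

```perl
use strict;
use warnings;

# Copy the file at $path to the filehandle $out, dying on any failure.
sub copy_file {
    my ($path, $out) = @_;
    open my $handle, "<", $path
        or die "Cannot open $path for reading: $!\n";
    while (my $line = <$handle>) {
        print {$out} $line;
    }
    close $handle
        or die "Cannot close $path: $!\n";
    return;
}

# Not every system has a message of the day, so check before copying it.
my $motd = "/etc/motd";
copy_file($motd, \*STDOUT) if -f $motd && -r _;
```

With the check on open in place, a missing or unreadable file produces a clear message instead of a silent empty run.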
As you can see, there are plenty of things you can do with your Perl program besides executing it! I hope this has inspired you to try a few of these tips out. Until next time, enjoy!