Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 54 (Sep 2004)
[suggested title: ``Strictly speaking about use strict'']
In many of my writings about Perl, I give the strong admonition to place use strict at the beginning of the program. I've often explained the line with a few short phrases, but I thought it would be interesting to focus on this one construct in detail for a change.
The use strict line is a pragma. The purpose of a pragma is to regionally or globally alter the way the language is translated for execution. For the strict pragma, we get three subfeatures enabled or disabled within a particular program scope. The scope extends to the end of the curly-brace-delimited block in which the pragma appears, or to the end of the file if otherwise outside all blocks. Inner pragma controls override outer controls, so we can get as specific as we need to process a particular chunk of code.
The use strict pragma has three aspects: vars, subs, and refs. Each aspect may be enabled or disabled individually by explicit name, but most often, all three are enabled at once with a simple use strict. For example, we can enable all three aspects initially, and disable just the vars aspect for a portion of the code, like so:
use strict; # all enabled
...
sub marine {
  no strict 'vars'; # disable vars
  ...
}
# all enabled again
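The scoping rule can be checked directly. What follows is a minimal sketch rather than anything from the original column: it relies on a string eval being compiled under the pragmas in effect at the point where the eval appears, and the names $snippet, $outer_ok, $inner_ok, and $after_ok are invented for the illustration.

```perl
use strict;

# Code that mentions a variable that was never declared.
my $snippet = '$undeclared_name = 1; 1';

# Outer scope: strict vars is in effect, so the snippet should not compile.
my $outer_ok = eval($snippet) ? 1 : 0;

my $inner_ok;
{
    no strict 'vars';    # relaxed only until the closing brace
    $inner_ok = eval($snippet) ? 1 : 0;
}

# Back outside the block: strict vars applies again.
my $after_ok = eval($snippet) ? 1 : 0;

print "outer=$outer_ok inner=$inner_ok after=$after_ok\n";
```

If the scoping behaves as described, only the attempt inside the block compiles.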
The vars aspect is probably the most useful of the three aspects, and is the one most likely to give trouble to a beginner. Scalar, array, and hash variables are mapped into package and lexical variables using one of five methods. The vars aspect disables one of these methods, leaving the remaining four enabled.
For example, the variable $bammbamm might be referring to a lexical variable named $bammbamm, introduced earlier in the same scope through the use of the my declaration, as in:
my $bammbamm = 5;
...
print $bammbamm; # lexical $bammbamm in scope
Or, it might be a package variable declared earlier by use vars in the same package, such as:
package This::One;
use vars qw($bammbamm);
...
print $bammbamm; # same as $This::One::bammbamm
...
package That::One;
# $bammbamm no longer legal here
The variable name might also be declared through the our declaration, which associates a simple name with a package variable in the current package for the remainder of the scope. For example:
package This::One;
sub nominal {
  our $bammbamm; # $bammbamm is $This::One::bammbamm
  ...
  package That::One;
  print $bammbamm; # still prints $This::One::bammbamm
}
# $bammbamm is no longer permitted
Or, if the name contains a package delimiter (double colon), it's an explicit use of a package variable.
package This::One;
print $This::One::bammbamm; # always permitted
Finally, the variable $bammbamm may be just a package variable in the current package, if no prior declaration exists.
package This::One;
print $bammbamm; # $This::One::bammbamm;
package That::One;
print $bammbamm; # $That::One::bammbamm;
It is this particular method that is disabled by use strict, because it can lead to the most errors in larger programs. By default, any mention of any simple scalar, array, or hash name is simply accepted as a package variable in that package, even if the name is a typo!
By enabling use strict 'vars', the troublesome automatic acceptance of any variable name is prevented, forcing you to declare your variables through one of the other methods. This isn't all that important on a five-line program, but I have rarely seen any program stay at only five lines unless it was a one-off task.
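To see the typo protection at work, here is a small sketch (my own illustration, with the hypothetical names $source, $compiled, and $error). The misspelled code is held in a string and compiled with a string eval, so the compile-time failure can be captured instead of stopping the whole program:

```perl
use strict;

# The second line misspells $bammbamm; it is kept in a string so that
# the compile-time error can be trapped and inspected.
my $source = <<'END_CODE';
    my $bammbamm = 5;
    $bammbam = 10;    # typo: missing the final "m"
    1;
END_CODE

my $compiled = eval $source;    # undef, because compilation fails
my $error    = $@;              # the diagnostic from strict vars

print "compile failed: $error" unless $compiled;
```

The diagnostic names the offending variable and begins with the words Global symbol, which makes the typo easy to find. Without strict, the same code would quietly create a second package variable.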
The subs aspect of use strict disables the interpretation of ``bare words'' as text strings. By default, a Perl identifier (a sequence of letters, digits, and underscores, not starting with a digit unless it is completely numeric) that is not otherwise a built-in keyword or previously seen subroutine definition is treated as a quoted text string:
@daynames = (sun, mon, tue, wed, thu, fri, sat);
However, this is considered to be a dangerous practice, because obscure bugs may result:
@monthnames = (jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec);
Can you spot the bug? Yes, the 10th entry is not the string 'oct', but rather an invocation of the built-in oct() function, returning the numeric equivalent of the default $_ treated as an octal number. And if you wrote this program in April, you might not even notice it breaks for six months. I'm not saying that this has happened to anyone I know, because I believe I'm protected from self-incrimination.
Although the problem arises mostly from collisions with built-in words, simply watching for built-ins is insufficient. Suppose we added a sun function earlier in the same scope:
sub sun { ... }
Now our first dayname is also messed up, being a call to the subroutine instead of the three-character string. But it's not sufficient to simply scan in the source text for a same-named subroutine. The name can also be imported from other code by one of the earlier use directives!
So, the proper method out of this madness is to avoid the use of barewords in most circumstances. This list of day names can be created easily with qw() instead:
my @daynames = qw(sun mon tue wed thu fri sat);
And now there's no possibility of conflict, because we're using a quoted string instead of a bareword. The nifty part is that use strict 'subs' (included as part of use strict) takes care of enforcing this automatically. Once enabled, barewords will be flagged while the program is being parsed, before execution even begins.
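Both halves of the bareword story can be demonstrated in a few lines. This is a sketch under my own assumptions: the variable names ($source, $strict_ok, $strict_error, @loose) are hypothetical, and string evals are used so that the strict failure can be trapped and the non-strict behavior can be observed side by side.

```perl
use strict;

# Under strict subs, a list of barewords is rejected at compile time.
my $source       = 'my @monthnames = (jan, feb, mar); 1';
my $strict_ok    = eval($source) ? 1 : 0;
my $strict_error = $@;
print "rejected: $strict_error" unless $strict_ok;

# With strict subs turned off, the barewords become strings, except
# that oct is the built-in function, applied here to the default $_.
my @loose;
{
    no strict 'subs';
    local $_ = "17";                  # oct("17") is 15
    @loose = eval '(sep, oct, nov)';
}
print join(",", @loose), "\n";        # prints sep,15,nov
```

The second list shows the month-name bug in miniature: the middle element is a number computed from $_, not the string 'oct'.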
Note that barewords are still permitted in a few specific locations. For example, the key to a hash can always be specified as a bareword:
my $age = $data{age}; # same as $data{"age"}
The left side of a ``fat arrow'' is also automatically quoted if it resembles a bareword:
my %data = (age => 19); # same as ("age", 19)
These two automatic quotings make working with hashes with program-significant keys easier, presuming the keys you choose are all barewords.
Finally, a predeclared subroutine can be treated as a subroutine call, even if the definition of the subroutine had not yet been seen:
sub deeper; # declaration
...
my $result = deeper;
I don't recommend this practice, since it is just as easy (and clearer) to follow the subroutine call with empty parens:
my $result = deeper(); # no declaration needed
The final aspect of the use strict pragma is the disabling of soft references (or symbolic references). A normal reference (sometimes called a hard reference to distinguish it from a soft reference) comes from an explicit referencing operation:
my $ref = \@foo; # now $ref is a reference to @foo
or from one of the anonymous reference constructors:
my $ref2 = [3, 4, 5]; # array reference created
An autovivification will also create a hard reference:
my $ref3; # variable is undef initially
$ref3->[5] = 10; # $ref3 is now an array reference
Following this reference using a dereferencing operation gets us back to the original data:
print $ref2->[2]; # prints 5, from the anon array
However, the dereferencing operation can also be performed against a simple scalar string:
my $sref = "happy";
$sref->[3] = "hello"; # symbolic reference
This dereferencing is performed at execution time. Perl looks up the value to be dereferenced, notes that it is not a hard reference, and then examines the package variable symbol table for a same-named variable. Because package variables spring into existence as needed, nearly any name in $sref will be considered legal, causing new variables to be created dynamically.
As if that wasn't already scary enough, the variable name does not need to be a standard Perl identifier. Any string will do:
my $sref = "A [variable] {name} !normally! *illegal*";
$$sref = 12;
We now have a scalar package variable in the current package with a very crazy name.
Because of the likelihood of an accidental symbolic dereference operation, the use strict 'refs' aspect is recommended for every program that uses references.
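Unlike the other two aspects, strict refs does its checking while the program runs, so a block eval is enough to trap the failure. The sketch below is my own illustration, reusing the "happy" string from the earlier example; $ok and $error are hypothetical names, and the our @happy declaration is there only so the result can be printed under strict vars.

```perl
use strict;

our @happy;            # the package array a symbolic reference would reach
my $sref = "happy";    # a plain string, not a reference

# Under strict refs, dereferencing a string is a run-time error.
my $ok    = eval { $sref->[3] = "hello"; 1 };
my $error = $@;
print "refused: $error" unless $ok;

{
    no strict 'refs';         # permit symbolic references in this block only
    $sref->[3] = "hello";     # stores into @main::happy
}
print "$happy[3]\n";          # prints hello
```

The first attempt is refused and leaves @happy untouched. The second succeeds only because the restriction was lifted explicitly, and only for that one block.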
If all three of these restrictions are good, why are they not enabled by default? The answer is ``backward compatibility''. Perl version 4 (last updated over a decade ago) permitted casual variable naming (and didn't have any option for lexically declared variables), didn't have the convenient qw() for defining lists of short values, and used soft references for indirect subroutine invocation. Thus, adding use strict by default would have broken nearly every Perl version 4 program!
But Perl4 is now long dead. Be sure to use strict in your modern Perl5 programs, and you'll get a guaranteed reduction in development time, or double your money back! Until next time, enjoy!