Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 46 (May 2003)
Programs push data around. In Perl, this data lives in variables, and the variables can be associated with various scopes. Let's take a look at Perl's peculiar scoping rules.
First, let's define a term, lexical scope. A lexical scope provides the boundaries of some property of the program associated with the text of the program itself, as opposed to properties that are associated with the runtime state of the program. Lexical properties might include variable declarations, compiler directives, exceptions being caught, and so on.
In Perl, the largest lexical scope is the source file itself. Lexically scoped items never affect anything larger than a file. In addition, nearly all blocks also introduce a nested lexical scope that ends where the block ends. Because blocks are nested and not overlapping, the lexical scopes also nest. This will become clearer in the examples that follow.
Some of the variables in a Perl program are package variables (also called symbol-table variables). A package variable's full name consists of a package prefix followed by the specific identifier for the variable. The prefix is separated from the identifier by a double colon.
For example, in $Animal::count, Animal is the package prefix, while count is the variable within the package. Both the package and the identifier contain one or more alphanumerics and/or underscores. Additionally, packages can have multiple, double-colon separated parts, as in $Animal::Dog::count. Again, count is the variable, and Animal::Dog is the package prefix. There's no necessary relationship between Animal and Animal::Dog, although people tend to give related names to related packages.
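To make that independence concrete, here is a minimal sketch. The counts are made-up values, and the code runs without use strict so that no declarations are needed:

```perl
# Two package variables with the same identifier in different packages.
$Animal::count      = 3;    # "count" in package Animal
$Animal::Dog::count = 1;    # "count" in package Animal::Dog

$Animal::count++;           # touches only the Animal variable

print "Animal total: $Animal::count\n";        # 4
print "Dog total:    $Animal::Dog::count\n";   # 1
```

Despite the shared Animal prefix, incrementing one variable leaves the other alone.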
Although package variables are formally named with colons, you won't see many colons in most uses of package variables. That's because by default variable names without colons are automatically placed into the current package. The initial current package is package main, so the following two code snippets are identical:
    print "What is your name? ";
    chomp($input = <STDIN>);
    $length = length $input;
    print "Your name $input is $length characters long.\n";
and
    print "What is your name? ";
    chomp($main::input = <STDIN>);
    $main::length = length $main::input;
    print "Your name $main::input is $main::length characters long.\n";
It's a good thing we don't have to have main all over the place.
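One way to convince ourselves that the short and long spellings name a single variable is to write through one and read through the other. This is only a sketch with a made-up value; the package main line is spelled out just to make the starting point explicit:

```perl
package main;                     # spelled out here; it is also the default

$input = "Fred";                  # really $main::input
$main::input .= " Flintstone";    # same variable, by its full name

print "$input\n";                 # Fred Flintstone

# References to both spellings point at the same storage.
print "one variable, two spellings\n" if \$input == \$main::input;
```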
So, why do we have packages, if everything already defaults to package main? Well, it's so that we can have multiple portions of code brought together into one program. Suppose the code above were to be added into a program that already had a meaning for $main::input or $main::length. We'd have a collision of names. But we can fix that by using a different package prefix:
    print "What is your name? ";
    chomp($Query::input = <STDIN>);
    $Query::length = length $Query::input;
    print "Your name $Query::input is $Query::length characters long.\n";
Now $Query::input has nothing to do with $main::input, so we no longer have a naming collision. Of course, this is a lot of typing, and we can shorten this by changing the current package, using the package directive:
    package Query;
    print "What is your name? ";
    chomp($input = <STDIN>);
    $length = length $input;
    print "Your name $input is $length characters long.\n";
Wow, that's easier to type, and yet the $input variable there is really $Query::input, and won't conflict with $main::input used elsewhere.
The package directive is lexically scoped (thought we forgot about that term, eh?). This means that the package directive stays in effect until the end of the current scope, or until another package directive changes the current package again. For example, we could put that piece of code into the middle of the rest of our program as:
    # initial package main
    ...
    $input = "Hey"; # $main::input

    package Query; # now in package Query
    print "What is your name? ";
    chomp($input = <STDIN>); # $Query::input
    $length = length $input;
    print "Your name $input is $length characters long.\n";

    package main; # back to package main

    print $input; # $main::input again
    print "that length was $Query::length\n"; # reference prior value
However, we have to remember to reset the package back to what it was before. This is error-prone, and perhaps not easy to maintain, especially if we're not sure what the prior package might be. But, since the package directive is lexically scoped, we can introduce a block to limit the directive's influence:
    # initial package main
    ...
    $input = "Hey"; # $main::input

    { # start scope
      package Query; # now in package Query
      print "What is your name? ";
      chomp($input = <STDIN>); # $Query::input
      $length = length $input;
      print "Your name $input is $length characters long.\n";
    } # end scope

    # automatically back to package main

    print $input; # $main::input again
    print "that length was $Query::length\n"; # reference prior value
Ahh, that's a bit simpler.
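If you'd like to watch the current package change and then revert, Perl's special token __PACKAGE__ yields the name of the package in effect wherever it appears. The following is just an illustrative sketch with a made-up length, and with no keyboard input so it can be run unattended:

```perl
print "before: ", __PACKAGE__, "\n";       # main
{
    package Query;                         # in effect until the closing brace
    print "inside: ", __PACKAGE__, "\n";   # Query
    $length = 4;                           # $Query::length
}
print "after:  ", __PACKAGE__, "\n";       # main, with no second directive
print "length: $Query::length\n";          # 4
```

Note that $main::length is never touched: the unqualified $length inside the block was compiled while Query was the current package.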
As that last example showed, we can access any package variable from any location in our program, much as we can spell out the full path to any accessible file in a Unix filesystem regardless of our current directory, even though the files at or below the current directory are easier to type. But these global variables can lead to global headaches, since we can't really know at a glance about all the code that can examine or modify the variable.
Like most modern programming languages, Perl also includes the notion of a lexical variable. Lexical variables do not belong to a package, so they cannot be referenced outside the lexical scope in which they are declared. Their names also cannot contain colons, because they do not have a package prefix.
Lexical variables are introduced with the my keyword:
    print "What is your name? ";
    chomp(my $input = <STDIN>); # lexical $input
    my $length = length $input; # lexical $length
    print "Your name $input is $length characters long.\n";
Because these variables are introduced outside any block in this example, they are lexically scoped to the file in which they appear. If this code is part of a file being included with eval, do, require, or use, there's no chance that this $input will conflict with any other use of $input. There's also no syntax that would let any other code outside of this code access those variables, so we can be assured that our variables won't be changing mysteriously.
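A lexical variable and a package variable can even share an identifier without interfering with each other. Here's a small sketch of that (the string values are placeholders, and use strict is deliberately left off so the first assignment is legal):

```perl
package main;

$input    = "package value";   # $main::input, reachable program-wide
my $input = "lexical value";   # a separate, file-scoped lexical

# From here to the end of the file, plain $input means the lexical...
print "$input\n";              # lexical value

# ...while the package variable still answers to its full name.
print "$main::input\n";        # package value
```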
Besides file-scoped lexical variables, another common appearance is in the block that belongs to a subroutine:
    sub get_name_length {
      print "What is your name? ";
      chomp(my $input = <STDIN>); # lexical $input
      my $length = length $input; # lexical $length
      print "Your name $input is $length characters long.\n";
    }
When the subroutine returns, the lexical variables are discarded, automatically recycling the memory that had been used. In addition, any outer declaration of $input or $length is temporarily shadowed within the subroutine, protecting the outer variables from accidental alteration.
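The shadowing can be seen directly with a variation that takes the name as an argument instead of reading it. The name_length subroutine and the value 99 are hypothetical, chosen only for this sketch:

```perl
my $length = 99;                  # outer, file-scoped lexical

sub name_length {                 # hypothetical helper
    my $input  = shift;           # private to each call
    my $length = length $input;   # shadows the outer $length inside the sub
    return $length;
}

my $result = name_length("Randal");
print "inner result: $result\n";  # 6
print "outer length: $length\n";  # still 99
```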
We can also create temporary variables this way:
    { # start temporary scope
      print "What is your name? ";
      chomp(my $input = <STDIN>); # lexical $input
      my $length = length $input; # lexical $length
      print "Your name $input is $length characters long.\n";
    } # end temporary scope
The variables declared and used in this block will be recycled at the end of the block, just as if we had placed this code into a subroutine.
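Because blocks nest, so do these temporary scopes. As a rough illustration (visible_levels is a made-up name), each block below declares its own $level, and each closing brace brings the enclosing one back into view:

```perl
sub visible_levels {                  # hypothetical helper
    my @seen;
    my $level = "sub";
    {
        my $level = "outer block";    # shadows the sub-level variable
        {
            my $level = "inner block";    # shadows the outer block's variable
            push @seen, $level;
        }
        push @seen, $level;           # outer block's variable is visible again
    }
    push @seen, $level;               # and now the sub-level variable
    return @seen;
}

print join(", ", visible_levels()), "\n";   # inner block, outer block, sub
```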
A frequent admonition in the Perl literature is "Always use strict!". What does this do, precisely? Well, amongst other things, use strict disables the automatic prepending of the package to a variable name. Once use strict is in effect, a name without colons must have been declared, either as a lexical variable, or as a specially noted package variable.
The primary purpose of use strict is to catch any random erroneous variations of a variable name:
    print "What is your name? ";
    chomp($input = <STDIN>); # $main::input
    $length = length $input; # $main::length
    print "Your name $input is $lenth characters long.\n"; # broken
Oops! That's $main::lenth, not $main::length. But by turning on use strict, we no longer get main:: in front of anything we mention, and thus we must declare the variables lexically at first use instead:
    use strict;

    print "What is your name? ";
    chomp(my $input = <STDIN>); # lexical $input
    my $length = length $input; # lexical $length
    print "Your name $input is $lenth characters long.\n"; # caught
The compiler will abort at that last line, because we can't just turn $lenth into $main::lenth any more.
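To see the complaint without killing a whole program, one option is to compile a snippet at run time with a string eval and inspect $@ afterward. This is only a sketch: compile_error is a hypothetical helper, and the snippet builds a string rather than printing so that nothing runs interactively.

```perl
sub compile_error {               # hypothetical helper
    my ($code) = @_;
    eval $code;                   # string eval: compile the text, then run it
    return $@;                    # empty string if all went well
}

# The third line of this snippet misspells $length on purpose.
my $broken = <<'END_CODE';
use strict;
my $length = 4;
my $report = "That name is $lenth characters long.\n";
END_CODE

# Same snippet with the spelling corrected.
(my $fixed = $broken) =~ s/\$lenth/\$length/;

print "typo:  ", (compile_error($broken) ? "rejected" : "accepted"), "\n";
print "fixed: ", (compile_error($fixed)  ? "rejected" : "accepted"), "\n";
```

The error text captured for the broken snippet names the offending variable, which is usually enough to spot the typo.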
To refer to package variables, we can simply use the full prefix-included colon name:
    use strict;

    print "$Animal::Dog::count dogs were seen!\n";
    print "$main::length characters in that name.\n";
If we want to refer to a package variable without the package prefix, we can use the use vars compiler directive:
    use strict;

    use vars qw($length); # now permits $length to mean $main::length

    print "$length characters\n"; # $main::length
Any name in the use vars list can be referenced in the current package as if it were fully specified. Once seen, the directive is in effect for that variable name as long as the current package is the same as the package in which the use vars appeared. So, this is an error:
    use strict;
    use vars qw($length); # $length is $main::length in main

    {
      package Query;
      print $length; # COMPILE ERROR... $Query::length not permitted
    }

    print $length; # would have been ok, back to $main::length
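Since that listing can't be run as a whole, here is one way to check the package-dependent behavior: compile each case separately with a string eval and look at $@. Treat it as a sketch; compile_error and the Demo package are made-up names for this illustration.

```perl
sub compile_error {               # hypothetical helper
    my ($code) = @_;
    eval $code;                   # compile and run the snippet
    return $@;                    # empty string on success
}

# $length is used in the same package that declared it with "use vars".
my $same_package = <<'END_CODE';
package Demo;
use strict;
use vars qw($length);
$length = 4;                      # $Demo::length
END_CODE

# $length is used after the current package has changed to Query.
my $other_package = <<'END_CODE';
package Demo;
use strict;
use vars qw($length);
{
    package Query;
    $length = 4;                  # would be $Query::length, never declared
}
END_CODE

print "same package:  ", (compile_error($same_package)  ? "error" : "ok"), "\n";
print "other package: ", (compile_error($other_package) ? "error" : "ok"), "\n";
```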
Recent versions of Perl (5.6 and later) introduced the our keyword as a parallel to my. It functions similarly to use vars, but the declaration of the package variable is lexically scoped, not dependent on the current package.
    use strict;
    our $length; # $length is $main::length in this scope

    {
      package Query;

      our $input; # $input is $Query::input in this scope

      print $length; # permitted access to $main::length here
      print $input; # permitted access to $Query::input here

    } # end of scope, so $input goes out of scope

    print $length; # still $main::length
    print $input; # COMPILE ERROR, no access to $main::input permitted
As you can see, use vars and our are not precisely the same thing, but in general, they both serve to permit selected package variables to be used without colons.
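The difference shows up most plainly when the package changes but the lexical scope does not. In this small sketch (the values are placeholders), a name declared with our stays usable after a later package directive, because both declarations share the file's scope:

```perl
use strict;

package Query;
our $input = "from Query";        # short name for $Query::input

package main;                     # new package, same lexical scope (the file)
our $length = 4;                  # short name for $main::length

# $input still means $Query::input: "our" follows the lexical scope,
# not the current package.
print "$input / $length\n";                 # from Query / 4
print "$Query::input / $main::length\n";    # from Query / 4
```

Had $input been declared with use vars in package Query instead, the unqualified $input after package main would have been a compile-time error under use strict.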
I hope this brief overview of package and lexical variables and scoping has been useful. Until next time, enjoy!