Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 96 (Aug 2007)
[suggested title: ``Always wear your utility belt (part 1)'']
One of my favorite television lines stuck in my slowly aging brain comes from the mid-60's campy Batman television series. Whenever Batman (played by Adam West: I sat next to him during a cross-country flight a few years ago and had a fun conversation) was stuck in a tight situation, he uttered the painfully halting ``must.. get.. to.. my.. utility.. belt'' phrase. Everything he needed to get out of this episode's trouble was in that belt, if somewhat magically. If he needed to repel sharks: there it was, the shark repellant. If he needed to dissolve glue: yep, there's the glue dissolver. What a magical time of television!
Perl also has its own ``utility belts'', namely Scalar::Util and List::Util. These modules were added into the core around Perl version 5.8, although you can install them from the CPAN into any modern Perl version. Let's take a look at what our Perl utility belts contain.
By default, neither of these modules exports any subroutines, so we'll need to ask for each function explicitly on the import list.
The blessed function of Scalar::Util tells us the classname of a blessed reference, or undef otherwise. For example:
    use Scalar::Util qw(blessed);
    blessed "foo";            # undef
    blessed bless [], "Foo";  # "Foo"
    blessed bless {}, "Bar";  # "Bar"
At first glance, this seems similar to the ref builtin function. However, consider this:
    ref [];      # "ARRAY"
    blessed [];  # undef
Yes, for an unblessed reference, ref returns the primitive data type (such as ARRAY or HASH), while blessed returns undef.
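One place this difference pays off is in deciding whether a value is an object at all before treating it like one. Here's a minimal sketch; the describe helper and the Point class are names I made up for illustration, not anything Scalar::Util provides:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: reports whether its argument is an object.
sub describe {
    my ($thing) = @_;
    my $class = blessed($thing);    # class name, or undef if not blessed
    return defined $class
        ? "object of class $class"
        : "not an object";
}

# A plain string, an unblessed reference, and a blessed reference.
my @results = map { describe($_) } "foo", [], bless({}, "Point");
print "$_\n" for @results;
```

Only the last of the three values is reported as an object, even though ref would have returned a true value for the unblessed array reference as well.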
The dualvar function helps us create a single value that acts like the $! built-in. $! is odd in that it has one value in a numeric context (the error number, such as 13), and a related but different value in a string context (the error string, such as Permission denied). We can create a similar value using dualvar:
    use Scalar::Util qw(dualvar);
    my $result = dualvar(13, "Permission Denied");
    if ($result == 13) { ... }         # true
    if ($result =~ /denied/i) { ... }  # also true!
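To see both faces of the value side by side, a short sketch like the following may help. The variable names are mine, and the number and message are just the sample values from above:

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my $result = dualvar(13, "Permission denied");

my $as_number = $result + 0;    # numeric context uses the number
my $as_string = "$result";      # string context uses the text

# The same variable fills both slots, each in its own context.
printf "%d: %s\n", $result, $result;    # prints "13: Permission denied"
```

Because both slots are filled in when the value is created, using it as a number doesn't trigger an ``isn't numeric'' warning, even though the string part looks nothing like a number.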
For a more powerful version of this, look at Contextual::Return in the CPAN. This same example would be written:
    use Contextual::Return;
    my $result = NUM { 13 } STR { "Permission Denied" };
I'll save the rest of that cool module for another time.
I've never used isvstring from Scalar::Util, because vstrings are a deprecated feature, although still supported in version 5.8. However, since I'm the originator of the JAPH, I figure I'll illustrate this using one:
    use Scalar::Util qw(isvstring);
    my $japh = v74.117.115.116.32.97.110.111.116.104.101.114.32.80.101.114.108.32.104.97.99.107.101.114.44;
    print $japh, "\n";            # prints "Just another Perl hacker,\n"
    if (isvstring $japh) { ... }  # true
Apparently, the fact that my JAPH came from a vstring is remembered as part of the string, and isvstring can detect that.
Using a string as a number in Perl is well-defined: the string is converted to a number (and cached), and the resulting number is used in the expression. An ugly string that doesn't exactly look like a number converts as a 0, and if warnings are enabled, we get an Argument ... isn't numeric message. Internally, Perl calls looks_like_number to decide how numeric the value might be, and we can get to that at the Perl level as well:
    use Scalar::Util qw(looks_like_number);
    my $age;
    {
      print "How old are you? ";
      chomp($age = <STDIN>);
      print("$age isn't a number, try again\n"), redo
        unless looks_like_number $age;
    }
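If you're curious which strings pass the test, a quick table is easy to produce. This is only a sketch with a handful of sample strings I picked; it's not an exhaustive description of the rules:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sample inputs chosen for illustration.
my @candidates = ("42", "-3.14", "1e5", "abc", "12abc", "0x1F", "");

my %verdict;
for my $candidate (@candidates) {
    $verdict{$candidate} = looks_like_number($candidate) ? "yes" : "no";
}

printf "%-8s %s\n", "'$_'", $verdict{$_} for @candidates;
```

The first three come out as ``yes'' and the rest as ``no''. Note in particular that a hex-looking string such as 0x1F is not considered numeric, which matches how Perl converts strings to numbers at runtime.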
The openhandle function detects whether a reference or glob is connected to an open filehandle:
    use Scalar::Util qw(openhandle);
    if (openhandle(*STDIN)) { ... }   # glob
    if (openhandle(\*STDIN)) { ... }  # reference
The classic way of testing this was to use defined fileno, as in:
    if (defined fileno $somereference) { ... }
However, this breaks down for tied filehandles:
    BEGIN { package Dummy; sub TIEHANDLE { bless {}, shift } }
    tie (*FOO, "Dummy");
    if (defined fileno *FOO) { ... }  # tries to call tied(*FOO)->FILENO
    if (openhandle *FOO) { ... }      # returns true
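For ordinary handles, the interesting part is what happens after a close. Here's a small sketch using an in-memory filehandle so it doesn't depend on any file being present; the is_open wrapper is a hypothetical helper of mine that just turns the result into 1 or 0:

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper: openhandle returns the handle itself when it is
# open and undef otherwise, so we collapse that to 1 or 0.
sub is_open {
    my ($candidate) = @_;
    return defined(openhandle($candidate)) ? 1 : 0;
}

# An in-memory filehandle, so the example needs no real file.
open(my $fh, '<', \ "some in-memory text") or die "open failed: $!";

my $while_open   = is_open($fh);
close $fh;
my $after_close  = is_open($fh);
my $not_a_handle = is_open("just a string");

print "open: $while_open, closed: $after_close, string: $not_a_handle\n";
```

Only the first check reports an open handle; a closed handle and a value that was never a handle both come back as undef from openhandle.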
The readonly function detects whether a value is read-only, such as a constant, or a variable that is aliased to a constant:
    use Scalar::Util qw(readonly);
    readonly 3;   # true
    readonly $x;  # false, unless $x is aliased to a read-only value
An example of where this aliasing might occur is in a subroutine call:
    sub is_readonly {
      print "$_[0] is ";
      print "not " unless readonly $_[0];
      print "read-only\n";
    }
    is_readonly(3);          # prints 3 is read-only
    is_readonly(my $x = 0);  # prints 0 is not read-only
I've never used the refaddr function, but it looks like a nice way to detect whether a scalar is a reference or not, and if so, what the memory address might be:
    use Scalar::Util qw(refaddr);
    refaddr "hello";  # undef
    refaddr [];       # some numeric value
I've seen refaddr used as a key to a hash when constructing inside-out objects.
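In case the inside-out idea is unfamiliar, here is a bare-bones sketch of it. The Counter class and its methods are invented for this example, and a real inside-out class would need more care (for instance with threads), so treat it as an illustration of the refaddr-as-key trick only:

```perl
package Counter;
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Object data lives in this lexical hash, keyed by each object's address,
# rather than inside the object itself.
my %count_of;

sub new {
    my ($class) = @_;
    my $self = bless \(my $anon), $class;   # the object is an empty scalar
    $count_of{ refaddr($self) } = 0;
    return $self;
}

sub increment {
    my ($self) = @_;
    return ++$count_of{ refaddr($self) };
}

# Clean up, so a later object reusing the address starts fresh.
sub DESTROY {
    my ($self) = @_;
    delete $count_of{ refaddr($self) };
    return;
}

package main;

my $first  = Counter->new;
my $second = Counter->new;
$first->increment for 1 .. 3;
$second->increment;

my @totals = ($first->increment, $second->increment);
print "@totals\n";    # prints "4 2"
```

Since the object itself holds nothing, code outside the class can't reach the count by poking at the reference, which is the main attraction of the technique.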
As yet another way to look at references, consider reftype, which returns the primitive type of a reference, or undef otherwise:
    use Scalar::Util qw(reftype);
    reftype "hello";          # undef
    reftype [];               # "ARRAY"
    reftype {};               # "HASH"
    reftype bless [], "Foo";  # "ARRAY"
Note that this differs from the built-in ref because ref returns the blessed class for objects, and can be fooled to return one of the built-in names if you're really perverse:
    ref bless [], "Foo";    # "Foo"
    ref bless {}, "ARRAY";  # "ARRAY" (don't do this!)
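Putting blessed and reftype together gives an answer that can't be fooled by a perverse class name. A minimal sketch, with a summarize helper that is purely my own invention:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: reports the class (if any) and the underlying type.
sub summarize {
    my ($value) = @_;
    my $type = reftype($value);
    return "plain scalar" unless defined $type;    # not a reference at all
    my $class = blessed($value);
    return defined $class
        ? "$class object ($type underneath)"
        : "unblessed $type";
}

my @summary = map { summarize($_) }
    "hello", [], {}, bless([], "Foo"), bless({}, "ARRAY");
print "$_\n" for @summary;
```

The last value, a hash blessed into a class named ARRAY, is reported as an ARRAY object with a HASH underneath, where ref alone would have said only ARRAY.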
I've also never used the set_prototype function, and subroutine prototypes are generally discouraged, but I'll mention it here anyway for completeness:
    use Scalar::Util qw(set_prototype);
    my $s = sub { ... };
    # set_prototype itself has a & prototype, so pass \&$s rather than $s
    set_prototype \&$s, '$$'; # same as: $s = sub ($$) { ... };
The tainted function determines whether a value is tainted. When Perl is operating with taint enabled, and a value comes in from the dangerous outside world, the value is marked as tainted, and nearly any calculation that uses a tainted value in any way also results in a tainted value. If a tainted value is used in a dangerous way, Perl aborts, hopefully saving you from potential harm.
    use Scalar::Util qw(tainted);
    tainted "foo";       # false (internal value)
    tainted $ENV{HOME};  # true if running under -T (external value)
    $ENV{HOME} = "/";
    tainted $ENV{HOME};  # now false
The weaken function weakens its lvalue (scalar variable) argument so that the reference contained within the variable is weak. A weak reference still functions as a normal reference with respect to dereferencing, but does not count as a reference when Perl is considering whether there are any references to a value. Incidentally, a copy of a weak reference is not also weak, unless you also weaken it.
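That last point is easy to check for yourself. In this sketch (the variable names are mine), isweak from the same module reports which variables hold weak references:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $data = { name => "example" };

my $weak = $data;
weaken $weak;          # $weak no longer counts toward the refcount
my $copy = $weak;      # a plain copy is an ordinary (strong) reference

my $weak_is_weak = isweak($weak) ? 1 : 0;
my $copy_is_weak = isweak($copy) ? 1 : 0;

undef $data;                                   # $copy still holds the hash
my $alive_after_data = defined $weak ? 1 : 0;

undef $copy;                                   # last strong reference gone
my $alive_after_copy = defined $weak ? 1 : 0;

print "weak: $weak_is_weak, copy: $copy_is_weak\n";
print "alive after undef \$data: $alive_after_data\n";
print "alive after undef \$copy: $alive_after_copy\n";
```

The hash survives losing $data because the copy is strong, and disappears (setting $weak to undef) only once the copy is gone too.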
Typically, weak references are used in self-referential data structures. For example, consider some hashrefs representing nodes in a tree, each of which has an arrayref element of kids pointing at the children, and a parent element pointing back upwards. Let's make the root, and two leaf nodes:
    my $root = {};
    my $leaf1 = { parent => $root };
    my $leaf2 = { parent => $root };
and now let's set up the kids in the root:
    push @{ $root->{kids} }, $leaf1, $leaf2;
At this point, we have a self-referential data structure. Even if these variables are all lexically local to a subroutine, the subroutine will leak memory each time it is called, because there's always at least one reference to each of the three hashes. To fix this, we must weaken the parent links:
    use Scalar::Util qw(weaken);
    my $root = {};
    my $leaf1 = { parent => $root };
    weaken $leaf1->{parent};
    my $leaf2 = { parent => $root };
    weaken $leaf2->{parent};
    push @{ $root->{kids} }, $leaf1, $leaf2;
Now, we can get from the root to the kids, and from the kids to the root, using the existing references. However, the links from the kids to the root won't count, so Perl treats the literal $root as the only path to that hash. When $root goes out of scope, any weakened references to the hash (as in, the values for each of the parent uplinks) are set to undef. The refcounts of the two kids nodes are also reduced. If $leaf1 and $leaf2 are also going out of scope, then the corresponding hashes are also now unreferenced, causing the entire data structure to disappear.
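To watch this happen, we can bless the nodes into a class whose DESTROY method counts how many nodes Perl actually frees. This is just a sketch: the Node class, the build_tree helper, and the $destroyed counter are made up for the demonstration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Count how many nodes get destroyed.
my $destroyed = 0;
sub Node::DESTROY { $destroyed++ }

# Hypothetical helper: builds the three-node tree, optionally weakening
# the parent links, then lets every lexical go out of scope.
sub build_tree {
    my ($weaken_parents) = @_;
    my $root  = bless {}, "Node";
    my $leaf1 = bless { parent => $root }, "Node";
    my $leaf2 = bless { parent => $root }, "Node";
    if ($weaken_parents) {
        weaken $leaf1->{parent};
        weaken $leaf2->{parent};
    }
    push @{ $root->{kids} }, $leaf1, $leaf2;
    return;
}

build_tree(0);                          # strong parent links: a cycle
my $freed_without_weaken = $destroyed;

$destroyed = 0;
build_tree(1);                          # weak parent links: no cycle
my $freed_with_weaken = $destroyed;

print "freed without weaken: $freed_without_weaken\n";    # 0
print "freed with weaken:    $freed_with_weaken\n";       # 3
```

With strong parent links, none of the three nodes is freed when the subroutine returns; with weakened links, all three are.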
We can detect a weak reference using isweak:
    use Scalar::Util qw(isweak);
    isweak $root->{kids}[0];  # false
    isweak $leaf1->{parent};  # true
Note that weaken and isweak appear only when you install the ``XS'' version of the module.
That wraps up the Scalar::Util-ity belt. Next month, I'll examine List::Util. Until then, enjoy!