Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 96 (Aug 2007)

[suggested title: ``Always wear your utility belt (part 1)'']

One of my favorite television lines stuck in my slowly aging brain comes from the mid-60's campy Batman television series. Whenever Batman (played by Adam West: I sat next to him during a cross-country flight a few years ago and had a fun conversation) was stuck in a tight situation, he uttered the painfully halting ``must.. get.. to.. my.. utility.. belt'' phrase. Everything he needed to get out of this episode's trouble was in that belt, if somewhat magically. If he needed to repel sharks: there it was, the shark repellent. If he needed to dissolve glue: yep, there's the glue dissolver. What a magical time of television!

Perl also has its own ``utility belts'', namely Scalar::Util and List::Util. These modules were added into the core around Perl version 5.8, although you can install them from the CPAN into any modern Perl version. Let's take a look at what our Perl utility belts contain.

By default, neither of these modules exports any subroutines, so we'll need to ask for each function explicitly in the import list.

The blessed function of Scalar::Util tells us the classname of a blessed reference, or undef otherwise. For example:

  use Scalar::Util qw(blessed);
  blessed "foo"; # undef
  blessed bless [], "Foo"; # "Foo"
  blessed bless {}, "Bar"; # "Bar"

At first glance, this seems similar to the ref builtin function. However, consider this:

  ref []; # "ARRAY"
  blessed []; # undef

Yes, for an unblessed reference, ref returns the primitive data type (such as ARRAY or HASH), while blessed returns undef.
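
One place this difference matters is before a method call, because calling a method on an unblessed reference is a fatal error. Here's a minimal sketch of that guard; the Greeter class and the describe subroutine are hypothetical names made up for this illustration, not part of Scalar::Util:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical class, for illustration only.
{
    package Greeter;
    sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub name { return $_[0]{name} }
}

# Return the object's name if $thing is an object with a name method,
# otherwise a fallback string. The blessed test comes first, so plain
# references and plain strings never reach the method call.
sub describe {
    my ($thing) = @_;
    return $thing->name if defined blessed($thing) && $thing->can('name');
    return 'not an object';
}

print describe(Greeter->new('Robin')), "\n";   # Robin
print describe([]), "\n";                      # not an object
print describe('Greeter'), "\n";               # not an object
```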

The dualvar function helps us create a single value that acts like the $! built-in. $! is odd in that it has one value in a numeric context (the error number, such as 13), and a related but different value in a string context (the error string, such as Permission denied). We can create a similar value using dualvar:

  use Scalar::Util qw(dualvar);
  my $result = dualvar(13, "Permission Denied");
  if ($result == 13) { ... } # true
  if ($result =~ /denied/i) { ... } # also true!
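
To see the two halves separately, we can force each context by hand. This is just a small sketch; the $as_number and $as_string names are mine:

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my $result = dualvar(13, "Permission Denied");

my $as_number = $result + 0;    # numeric context picks the number
my $as_string = "$result";      # string context picks the string

print "number: $as_number\n";   # number: 13
print "string: $as_string\n";   # string: Permission Denied
```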

For a more powerful version of this, look at Contextual::Return in the CPAN. This same example would be written:

  use Contextual::Return;
  my $result = NUM { 13 } STR { "Permission Denied" };

I'll save the rest of that cool module for another time.

I've never used isvstring from Scalar::Util, because vstrings are a deprecated feature, although still supported in version 5.8. However, since I'm the originator of the JAPH, I figure I'll illustrate this using one:

  use Scalar::Util qw(isvstring);
  my $japh = v74.117.115.116.32.97.110.111.116.104.101.114.32.80.101.114.108.32.104.97.99.107.101.114.44;
  print $japh, "\n"; # prints "Just another Perl hacker,\n"
  if (isvstring $japh) { ... } # true

Apparently, the fact that my JAPH came from a vstring is remembered as part of the string, and isvstring can detect that.

Using a string as a number in Perl is well-defined: the string is converted to a number (and cached), and the resulting number is used in the expression. An ugly string that doesn't exactly look like a number converts as a 0, and if warnings are enabled, we get an Argument ... isn't numeric message. Internally, Perl calls looks_like_number to decide how numeric the value might be, and we can get to that at the Perl level as well:

  use Scalar::Util qw(looks_like_number);
  my $age;
  {
    print "How old are you? ";
    chomp($age = <STDIN>);
    print("$age isn't a number, try again\n"), redo
      unless looks_like_number $age;
  }
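
To get a feel for what passes, here's a quick sketch that runs a handful of candidate strings (my own choices) through the test. Note that hex notation is only numeric in Perl source code, not in a string:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ('42', '-3.5', '1e6', '0x10', '12abc', 'abc') {
    printf "%-6s %s\n", $candidate,
        looks_like_number($candidate) ? 'numeric' : 'not numeric';
}
# prints:
# 42     numeric
# -3.5   numeric
# 1e6    numeric
# 0x10   not numeric
# 12abc  not numeric
# abc    not numeric
```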

The openhandle function detects whether a reference or glob is connected to an open filehandle:

  use Scalar::Util qw(openhandle);
  if (openhandle(*STDIN)) { ... } # glob
  if (openhandle(\*STDIN)) { ... } # reference

The classic way of testing this was to use defined fileno, as in:

  if (defined fileno $somereference) { ... }

However, this breaks down for tied filehandles:

  BEGIN { package Dummy; sub TIEHANDLE { bless {}, shift } }
  tie (*FOO, "Dummy");
  if (defined fileno *FOO) { ... } # tries to call tied(*FOO)->FILENO, and dies because Dummy has none
  if (openhandle *FOO) { ... } # returns true
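
For an ordinary handle, openhandle tracks the open and close calls as you'd expect. Here's a small sketch using an in-memory filehandle (my choice, to keep the example self-contained); openhandle returns the handle itself when it's open, and undef otherwise, so I test the result with defined:

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# An in-memory filehandle, so no real file is needed.
my $text = "line one\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

print "before close: ", (defined openhandle($fh) ? "open" : "closed"), "\n";
close $fh;
print "after close: ", (defined openhandle($fh) ? "open" : "closed"), "\n";
# prints:
# before close: open
# after close: closed
```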

The readonly function detects whether a value is read-only, such as a constant, or a variable that is aliased to a constant:

  use Scalar::Util qw(readonly);
  readonly 3; # true
  readonly $x; # false, unless $x is aliased to a read-only value

An example of where this aliasing might occur is in a subroutine call:

  sub is_readonly {
    print "$_[0] is ";
    print "not " unless readonly $_[0];
    print "read-only\n";
  }
  is_readonly(3); # prints 3 is read-only
  is_readonly(my $x = 0); # prints 0 is not read-only
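
A more practical use of the same trick is a subroutine that modifies its argument in place through the @_ alias, but backs off quietly when handed a constant. This is only a sketch, and trim_in_place is a hypothetical helper name:

```perl
use strict;
use warnings;
use Scalar::Util qw(readonly);

# Hypothetical helper: strip surrounding whitespace from the caller's
# variable through the @_ alias, but refuse to touch read-only values
# (modifying one would be a fatal error).
sub trim_in_place {
    return 0 if readonly $_[0];
    $_[0] =~ s/^\s+|\s+$//g;
    return 1;
}

my $name = "  Alfred  ";
print trim_in_place($name) ? "trimmed to '$name'\n" : "left alone\n";

# A literal is read-only, as with is_readonly(3) above, so this call
# should report that it left the value alone.
print trim_in_place("  literal  ") ? "trimmed\n" : "left alone\n";
```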

I've never used the refaddr function, but it looks like a nice way to detect whether a scalar is a reference or not, and if so, what the memory address might be:

  use Scalar::Util qw(refaddr);
  refaddr "hello"; # undef
  refaddr []; # some numeric value

I've seen refaddr used as a key to a hash when constructing inside-out objects.
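
As a rough sketch of that inside-out idea (the Counter class and its method names are invented for this example, and real inside-out modules handle more cases, such as threads): the object itself is an empty blessed scalar, and each attribute lives in a lexical hash keyed by the object's refaddr.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only
    use Scalar::Util qw(refaddr);

    my %count_of;       # attribute storage lives outside the objects

    sub new {
        my ($class) = @_;
        my $self = bless \do { my $anon_scalar }, $class;
        $count_of{ refaddr($self) } = 0;
        return $self;
    }
    sub increment { return ++$count_of{ refaddr($_[0]) } }
    sub count     { return $count_of{ refaddr($_[0]) } }

    # Remove the entry when the object dies; otherwise the hash grows
    # forever and a recycled address could pick up stale data.
    sub DESTROY   { delete $count_of{ refaddr($_[0]) }; return }
}

my $first  = Counter->new;
my $second = Counter->new;
$first->increment for 1 .. 3;
$second->increment;
print $first->count, " ", $second->count, "\n";   # 3 1
```

Because the key is the address rather than anything stored in the object, code outside the class has no way to reach the data except through the methods.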

As yet another way to look at references, consider reftype, which returns the primitive type of a reference, or undef otherwise:

  use Scalar::Util qw(reftype);
  reftype "hello"; # undef
  reftype []; # "ARRAY"
  reftype {}; # "HASH"
  reftype bless [], "Foo"; # "ARRAY"

Note that this differs from the built-in ref because ref returns the blessed class for objects, and can be fooled to return one of the built-in names if you're really perverse:

  ref bless [], "Foo"; # "Foo"
  ref bless {}, "ARRAY"; # "ARRAY" (don't do this!)
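
Putting ref, reftype, and blessed side by side makes the division of labor easier to see. The show and inspect helpers here are hypothetical names for this sketch only:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Render a value for display, making undef and the empty string visible.
sub show {
    my ($value) = @_;
    return defined $value ? "'$value'" : 'undef';
}

# Report what each of the three functions says about one value.
sub inspect {
    my ($value) = @_;
    return sprintf 'ref=%s reftype=%s blessed=%s',
        show(ref($value)), show(reftype($value)), show(blessed($value));
}

print inspect('hello'), "\n";            # ref='' reftype=undef blessed=undef
print inspect([]), "\n";                 # ref='ARRAY' reftype='ARRAY' blessed=undef
print inspect(bless [], 'Foo'), "\n";    # ref='Foo' reftype='ARRAY' blessed='Foo'
print inspect(bless {}, 'ARRAY'), "\n";  # ref='ARRAY' reftype='HASH' blessed='ARRAY'
```

Only reftype sees through the perverse last case.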

I've also never used the set_prototype function, and subroutine prototypes are generally discouraged, but I'll mention it here anyway for completeness:

  use Scalar::Util qw(set_prototype);
  my $s = sub { ... };
  set_prototype \&$s, '$$'; # pass \&$s: set_prototype's own prototype won't accept a plain scalar
  # same as: $s = sub ($$) { ... };

The tainted function determines whether a value is tainted. When Perl is operating with taint checking enabled, and a value comes in from the dangerous outside world, the value is marked as tainted, and nearly any calculation that uses a tainted value in any way also results in a tainted value. If a tainted value is used in a dangerous way, Perl aborts, hopefully saving you from potential harm.

  use Scalar::Util qw(tainted);
  tainted "foo"; # false (internal value)
  tainted $ENV{HOME}; # true if running under -T (external value)
  $ENV{HOME} = "/";
  tainted $ENV{HOME}; # now false
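
The usual way to clean a tainted value is to capture the part you trust with a regular expression, since a captured substring is not tainted. Here's a sketch of that; untaint_username and its rule for what counts as a valid name are assumptions for illustration, and the USER environment variable may not be set on your system. Run it with perl -T to see the raw value reported as tainted:

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical validator: accept only short, lowercase usernames.
# Returning the regex capture (not the original string) is what
# yields an untainted value under -T.
sub untaint_username {
    my ($input) = @_;
    return unless defined $input && $input =~ /\A([a-z][a-z0-9_]{0,31})\z/;
    return $1;
}

my $raw   = defined $ENV{USER} ? $ENV{USER} : '';
my $clean = untaint_username($raw);

print "raw is ", (tainted($raw) ? "tainted" : "not tainted"), "\n";
print "clean is ",
    ( !defined $clean   ? "rejected"
    : tainted($clean)   ? "tainted"
    :                     "not tainted" ), "\n";
```

Without -T, nothing is ever tainted, so both lines report not tainted (or rejected, if the name doesn't match the pattern).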

The weaken function weakens its lvalue (scalar variable) argument so that the reference contained within the variable is weak. A weak reference still functions as a normal reference with respect to dereferencing, but does not count as a reference when Perl is considering whether there are any references to a value. Incidentally, a copy of a weak reference is not also weak, unless you also weaken it.

Typically, weak references are used in self-referential data structures. For example, consider some hashrefs representing nodes in a tree, each of which has an arrayref element of kids pointing at the children, and a parent element pointing back upwards. Let's make the root, and two leaf nodes:

  my $root = {};
  my $leaf1 = { parent => $root };
  my $leaf2 = { parent => $root };

and now let's set up the kids in the root:

  push @{ $root->{kids} }, $leaf1, $leaf2;

At this point, we have a self-referential data structure. Even if these variables are all lexically local to a subroutine, the subroutine will leak memory each time it is called, because there's always at least one reference to each of the three hashes: the root holds the leaves through kids, and each leaf holds the root through parent. To fix this, we must weaken the parent links:

  use Scalar::Util qw(weaken);
  my $root = {};
  my $leaf1 = { parent => $root };
  weaken $leaf1->{parent};
  my $leaf2 = { parent => $root };
  weaken $leaf2->{parent};
  push @{ $root->{kids} }, $leaf1, $leaf2;

Now, we can get from the root to the kids, and from the kids to the root, using the existing references. However, the links from the kids to the root won't count, so Perl treats the literal $root as the only path to that hash. When $root goes out of scope, any weakened references to the hash (as in, the values for each of the parent uplinks) are set to undef. The refcounts of the two kids nodes are also reduced. If $leaf1 and $leaf2 are also going out of scope, then the corresponding hashes are also now unreferenced, causing the entire data structure to disappear.
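
We can watch this happen by keeping one leaf alive past the end of the root's scope. This is a small sketch with a single leaf; the name keys are my additions:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $leaf;
{
    my $root = { name => 'root', kids => [] };
    $leaf = { name => 'leaf', parent => $root };
    weaken $leaf->{parent};
    push @{ $root->{kids} }, $leaf;

    # Inside the block, the uplink works like any other reference.
    print "inside: parent is $leaf->{parent}{name}\n";
}

# $root is gone, and the weak uplink did not keep the hash alive,
# so Perl has set it to undef.
print "outside: parent is ",
    (defined $leaf->{parent} ? "still there" : "undef"), "\n";
# prints:
# inside: parent is root
# outside: parent is undef
```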

We can detect a weak reference using isweak:

  use Scalar::Util qw(isweak);
  isweak $root->{kids}[0]; # false
  isweak $leaf1->{parent}; # true
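
The same function confirms the earlier remark that a copy of a weak reference is an ordinary strong one. A short sketch, with variable names of my choosing:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $data = { name => 'root' };
my $weak = $data;
weaken $weak;
my $copy = $weak;     # copying produces an ordinary (strong) reference

print "weak: ", (isweak($weak) ? "weak" : "strong"), "\n";   # weak: weak
print "copy: ", (isweak($copy) ? "weak" : "strong"), "\n";   # copy: strong
```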

Note that weaken and isweak appear only when you install the ``XS'' version of the module.

That wraps up the Scalar::Util-ity belt. Next month, I'll examine List::Util. Until then, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.