Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 02 (Jul 1999)
[suggested title: Tieing up loose ends]
Perl has a lot of cool stuff. Certainly, the basic:
print "Hello, world!\n";
lets people get started without knowing much about Perl, but ``is there a way to do X in Perl?'' can almost always be answered ``Yes!''
For example, the neat way that a DBM can appear to be a hash in Perl rather transparently is done with a mechanism called ``tied variables''. But tied variables aren't limited to just DBMs -- we can make scalars, arrays, hashes, and filehandles all have similar magic.
What? Did I say filehandles? Yes. Imagine a ``magic'' filehandle that appears to the rest of the program to be a normal filehandle (albeit already opened). However, every time the program ``reads a line'' from the filehandle, a subroutine gets invoked. In fact, for every operation on this so-called filehandle, a different subroutine gets invoked. Well, that's what a tied filehandle does.
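To make that concrete before the real example, here is a minimal sketch of a tied filehandle. The class name CountdownHandle and the subroutine slurp_lines are made up for illustration and are not part of the listings in this article. The point is that ordinary code expecting a filehandle keeps working, while every read actually lands in a subroutine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# CountdownHandle is a hypothetical class invented for this sketch.
package CountdownHandle;

# Called by tie(); whatever it returns becomes the hidden object.
sub TIEHANDLE {
    my ($class, $start) = @_;
    return bless { Next => $start }, $class;
}

# Called each time the program reads from the handle.
# This sketch only bothers with scalar context.
sub READLINE {
    my $self = shift;
    return undef if $self->{Next} < 1;    # end-of-file
    return $self->{Next}-- . "\n";
}

package main;

# Ordinary code that expects a filehandle and knows nothing about tie.
sub slurp_lines {
    my $fh = shift;
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line;
    }
    return @lines;
}

local *COUNT;
tie *COUNT, 'CountdownHandle', 3 or die "Cannot tie";
my @got = slurp_lines(\*COUNT);
print @got;
```

Nothing in slurp_lines knows that no file is involved: it reads three lines, then sees end-of-file.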
One use of having a magic filehandle is to create a filehandle that
automatically expands ``include'' specifications, where some part of the
contents indicate that other files must be consulted as well. For
example, Perl's require operator brings in additional Perl code
from other files, and the C preprocessor (CPP) looks for lines like
#include "file.h" to bring in more C code.
The advantage of having the filehandle have all the smarts is that I can re-use existing code or libraries that expect a filehandle, and yet get the include-file expansion done transparently.
While getting a filehandle to be tied may seem obscure, the process is
actually rather straightforward. I'll just create a class library
(let's call it IncludeHandle), and then create my handles with
tie rather than open. To demonstrate this, I've written a
little program called ihtest that uses the IncludeHandle class,
presented in [listing one, below].
Lines 1 and 2 turn on warnings, and enable compiler restrictions.
Line 4 pulls in the IncludeHandle module, described later. As this
is an object-oriented class, we won't be importing any functions.
Lines 6 through 14 demonstrate the first use of the tied
IncludeHandle-generated handles. I'm setting up a ``naked block''
(a block that is not otherwise part of a larger construct, like an
if or a while), so that the local on the soon-to-be tied (or
is that fit to be tied?) filehandle found in *FRED will
disappear when I'm done.
The local *FRED in line 7 creates a temporary value for all kinds
of things that share the name FRED. One of these is our
filehandle, and although the others (like $FRED and %FRED) are
also localized, that doesn't make much difference here. This temporary
value will get undone in line 14.
Lines 8 and 9 tie the filehandle FRED (indicated by passing the
symbol name *FRED to tie), using the designated parameters. The
first parameter must be a class name (a package name with certain
subroutines defined within that package). Here, I've designated the
IncludeHandle class to handle the tie. The parameters of
localfile and a quoted string looking like a C-language
preprocessor include-file directive get passed to the TIEHANDLE
method, described later. If this succeeds, the tie returns true;
otherwise, the die is executed with $! having an appropriate
brief error code.
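The failure path can be sketched on its own. The class FileOnlyHandle below is a hypothetical stand-in (not the IncludeHandle class described later), and the path is assumed not to exist on your system. When TIEHANDLE returns undef, the tie operator returns undef as well, and $! still holds the reason the open failed.

```perl
use strict;
use warnings;
use IO::File;

# FileOnlyHandle is a hypothetical stand-in for a class like IncludeHandle.
package FileOnlyHandle;

sub TIEHANDLE {
    my ($class, $file) = @_;
    # IO::File->new returns undef and leaves the reason in $! on failure.
    my $handle = IO::File->new($file, "r") or return undef;
    return bless { Handle => $handle }, $class;
}

sub READLINE {
    my $self = shift;
    my $h = $self->{Handle};
    return scalar <$h>;
}

package main;

local *WILMA;
# Assumption: this path does not exist, so the open inside TIEHANDLE fails.
my $ok = tie *WILMA, 'FileOnlyHandle', "/no/such/file/anywhere";
my $error = "$!";    # copy right away, before another call overwrites $!
my $message = $ok ? "tied\n" : "Cannot tie: $error\n";
print $message;
```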
I've defined the first parameter after the classname to be treated as a filename to open. You can think of this as if it were:
open(FRED, "localfile") or die "Cannot open: $!";
except that any include files (denoted by lines that match the second
additional parameter after the classname) will be expanded in place.
Thus, the normal-looking loop in lines 11 through 13 will dump out the
contents of this file. However, if any line of localfile matches
the pattern ^#include "(.*)", then the part returned as $1 in
that pattern will be opened as a new file, and its contents inserted
in place of the line. This is a recursive operation: included files
may themselves contain include-file lines. We'll see how this all
works when I describe the class file, later.
Lines 16 to 24 show a similar example. Note however that the include-file pattern specification is being passed as a compiled regular expression, rather than just a string. That's helpful if the tie is being executed in a loop, so that the expression doesn't have to be recompiled on each iteration. Again, I'm just showing off the versatility of this particular tie usage.
Also note here that lines 21 and 22 invoke the filehandle read-line operator in a list context instead of a scalar context. We'll see how this is supported later.
Lines 26 to 34 show a more interesting and complicated use of that
same second parameter. If the parameter is a ``coderef'' (a reference
to a named or anonymous subroutine), then the subroutine is called for
each line read from the file, with $_ set to the line. The
subroutine can return undef to indicate that the line is an
ordinary text line to be returned as part of the read operation, or
can return a string indicating a new filename to open.
Lines 28 through 30 define an anonymous subroutine that looks for two different kinds of include lines -- both kinds that the C-preprocessor understands. If I wanted, I could even create a ``search path'' for names found within angle brackets, just like the C-preprocessor.
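As a rough sketch of that search-path idea, the coderef below resolves angle-bracketed names against a list of directories. The @search_path contents and the name $resolver are assumptions made up for this example. The calling convention is the one described above: the line arrives in $_, and the subroutine returns a filename or undef.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical search path for <name> includes; adjust to taste.
my @search_path = ("include", "/usr/local/include");

# Returns a filename to open, or undef for an ordinary text line.
my $resolver = sub {
    return $1 if /^#include "(.*)"/;    # quoted: use the name as given
    if (/^#include <(.*)>/) {
        my $name = $1;
        for my $dir (@search_path) {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -e $candidate;
        }
        return $name;    # not found anywhere: hand back the bare name
    }
    return undef;
};

# Exercise the coderef directly, the same way the class would call it.
my @results;
for my $line (qq{#include "incfile"\n}, "#include <nosuch_header.h>\n", "plain text\n") {
    local $_ = $line;
    push @results, $resolver->();
}
for my $r (@results) {
    my $shown = defined $r ? $r : "(ordinary line)";
    print "$shown\n";
}
```

Passing $resolver as the last argument to tie, in place of the anonymous subroutine in lines 28 through 30, should then give the handle a search path.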
Again, note in line 33 that we're invoking the read-line operator
in a list context, here being passed directly back to print.
So, to test this, I can create a local file localfile that might
look like this:
aaa
#include "incfile"
bbb
#include <incfile>
ccc
and then another file incfile that contains this:
111
222
333
and the output will look like:
aaa
111
222
333
bbb
#include <incfile>
ccc
aaa
111
222
333
bbb
#include <incfile>
ccc
aaa
111
222
333
bbb
111
222
333
ccc
Note that the line with the angle-bracketed name is expanded only in the third block's output, because the first two blocks used a simple regular expression that did not match the angle-bracketed form.
So, how does all this magic happen? How can the filehandles created
with tie have the smarts to automatically include other files
while the reading is taking place? Well, to understand that, let's
look at the implementation of the IncludeHandle class. That'll
be in the file named IncludeHandle.pm, presented in [listing two,
below].
Note the copyright in lines 1 and 2 -- just to make it clear that you can steal this if you want, although really it's more of a demo than a complete module. For one thing, it doesn't have embedded documentation, because you're reading the documentation right now.
Line 4 switches us into the IncludeHandle package, so that
unqualified symbols end up in the right place. Line 5 turns on
the standard compiler restrictions.
Lines 7 and 8 pull in two needed modules. First, we'll need
dynamically created filehandles, so the IO::File module takes care
of that for us. Further, some error messages must look as if they
came from the invoker rather than one of these routines, so I'll also
use the carp function from the Carp module for that. Both of these modules are in the
standard Perl distribution, so there's no need to pull anything from
the CPAN.
Lines 10 through 29 define the TIEHANDLE method. This method name
is built-in to the tie interface, so you can't just make up any old
name and expect it to work. The parameters are copied from the parameters
to tie (skipping over the first tie parameter).
Lines 11 through 14 name those parameters. The class name (in this
case, IncludeHandle) ends up in $class, while the requested
filename and include specification end up in $file and $code,
respectively.
Lines 16 and 17 open up the initial file. If this fails, we're not
going to get very far, so a quick undef return is enough to tell
the tie operator that things broke, and that's also going to return
an undef to the invoker of tie, triggering a die in the code
presented earlier.
Lines 19 through 23 turn the include specification into a coderef if
it isn't already. First, the pattern is compiled (harmless if it was
already a compiled pattern) in line 20. If that fails, we'll return
undef (failing the tie operation), but first setting $! to
illegal seek, meaning we can't seek for include information given
that bad regular expression. Cute and twisted, but I had to pick one
of the existing errno codes, so my choices were limited.
Obviously, the docs for a production module similar to this would
describe that error and why you would get it.
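The literal 29 in line 21 is the number that Linux assigns to ESPIPE (``Illegal seek''); errno numbers are platform-specific. As a sketch of a more portable spelling, and assuming your platform defines ESPIPE at all, the standard Errno module provides the symbolic name:

```perl
use strict;
use warnings;
use Errno qw(ESPIPE);

# Assigning a number to $! sets errno.
$! = ESPIPE;

# Read numerically, $! gives the code; read as a string, the message.
my $number  = $! + 0;
my $message = "$!";
print "errno $number means: $message\n";
```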
Line 22 compiles an anonymous subroutine that has a closure on the
$pattern lexical variable. This subroutine does a simple pattern
match on $_, returning either $1 if successful, or undef.
Thus, after this step, $code is always a coderef fitting the
general specification described earlier.
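Pulled out of the class, that normalization step can be tried by itself. The helper name make_matcher is hypothetical; the logic mirrors lines 19 through 23 of the listing, minus the $! assignment.

```perl
use strict;
use warnings;

# make_matcher is a hypothetical helper name for this sketch.
sub make_matcher {
    my $spec = shift;
    return $spec if (ref $spec || "") eq "CODE";    # already a coderef
    my $pattern = eval { qr/$spec/ };
    return undef if $@;                             # bad regular expression
    # The returned sub closes over $pattern, keeping it alive.
    return sub { $_ =~ $pattern ? $1 : undef };
}

my $from_string = make_matcher(q/^#include "(.*)"/);
my $from_qr     = make_matcher(qr/^#include "(.*)"/);
my $bad         = make_matcher("(unclosed");

local $_ = qq{#include "incfile"\n};
my @names = ($from_string->(), $from_qr->());
print "string pattern found: $names[0]\n";
print "compiled pattern found: $names[1]\n";
print "bad pattern rejected\n" unless defined $bad;
```

A string and a compiled pattern end up behaving identically, and a broken pattern is caught at tie time rather than on the first read.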
Finally, and most essential, lines 25 to 28 return a blessed hashref,
becoming the object that sits behind the tie. As further operations
are performed on the tied filehandle, they'll be translated into
method calls on this object. Here, I'm saving the opened filehandle,
and the coderef. The opened filehandle is dropped into a
single-element anonymous array, for reasons that will become apparent
later. The object is returned from the tie operator, but also gets
stashed away for these automatic operations. You can always get it
back by calling the tied operator on the potentially tied item.
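Here is a small sketch of fetching the hidden object back with tied. The class LineSource and its Reads counter are inventions for this example only.

```perl
use strict;
use warnings;

# LineSource is a hypothetical class for this sketch.
package LineSource;

sub TIEHANDLE {
    my ($class, @lines) = @_;
    return bless { Lines => [@lines], Reads => 0 }, $class;
}

sub READLINE {
    my $self = shift;
    $self->{Reads}++;
    return shift @{ $self->{Lines} };
}

package main;

local *BETTY;
my $object = tie *BETTY, 'LineSource', "one\n", "two\n";
my $first  = <BETTY>;

# tied() hands back the same object that tie returned.
my $same_object = tied(*BETTY) == $object;
my $reads       = tied(*BETTY)->{Reads};
my $answer      = $same_object ? "yes" : "no";
print "same object: $answer, reads so far: $reads\n";
```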
Speaking of operations on the tied filehandle, the most interesting
one for our experiment is ``reading a line'', which translates into a
READLINE method call on our hidden object, and this method is
defined as the subroutine in lines 31 through 43. Now, recall that
this operation can happen in either a scalar or list context. If it's
in a scalar context, we'll fetch one line, performed in line 41 by
calling ourselves as an instance method read_a_line. However, in
a list context, we've got to return all the lines from all the files
and their included files. The simplest way is to call the
read_a_line method repeatedly until it returns undef, while
gathering the results into an array. We'll then return that array.
This is handled in lines 35 through 39.
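The same context switch can be seen in isolation. In this sketch the class ArrayHandle and its next_line method are hypothetical; the lines come from a plain list instead of files, but READLINE makes the same wantarray decision.

```perl
use strict;
use warnings;

# ArrayHandle is a hypothetical class that serves lines from a list.
package ArrayHandle;

sub TIEHANDLE {
    my ($class, @lines) = @_;
    return bless { Lines => [@lines] }, $class;
}

sub next_line {
    my $self = shift;
    return shift @{ $self->{Lines} };
}

sub READLINE {
    my $self = shift;
    return $self->next_line unless wantarray;    # scalar context: one line
    my @rest;
    while (defined(my $line = $self->next_line)) {
        push @rest, $line;
    }
    return @rest;                                # list context: everything left
}

package main;

local *PEBBLES;
tie *PEBBLES, 'ArrayHandle', "a\n", "b\n", "c\n" or die "Cannot tie";
my $first = <PEBBLES>;    # scalar context
my @rest  = <PEBBLES>;    # list context
print "first: $first";
print "rest: ", @rest;
```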
So, one step further, we've got the instance method read_a_line to
deal with, defined in lines 45 through 71. And here's where the
include files are expanded.
First, the instance variable Handles is stuffed into the local
variable $handles in line 48. Then, we'll use a naked block once
again to create a nice looping control structure that doesn't involve
a goto in lines 49 through 70.
Line 50 ensures that we have some handle to read from. Of course,
the first time in, it'll be the handle initially created in
TIEHANDLE, but this is actually a stack that can grow and shrink as
include files are noted or files reach their end. And if we've gotten
to the end of the last file, it's time to return undef to designate
end-of-file.
Line 51 takes the most interesting filehandle (the one that we're currently reading from), and reads a line from it in line 52, using an indirect filehandle read-line operation. (I pondered a design for a few minutes that would let this be a tied filehandle, but decided that would be too mind boggling for a simple explanation for now.)
If the line cannot be read, lines 54 and 55 remove the now-useless filehandle, and restart the logic back at line 50. That gets us back out of the nested include files, and even lets us quit at the end of the initial file.
Lines 57 to 60 determine if we're staring at an include filename. Line
58 sets up the $_ variable, and line 59 invokes the coderef stashed
as the Code instance variable. Whatever this routine returns ends
up in $filename.
If $filename is anything but undef, we've got a valid filename
that must be included to replace this line. Line 62 attempts to open
that file, and if successful, places the newly opened filehandle at
the top of the stack (line 63). Otherwise (line 65), we'll squawk at
the user, and go get another line. Line 67 dumps us back at the input
fetching starting at line 50.
If we make it down to line 69, we've seen a good text line from somewhere, and we're ready to return it.
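The stack discipline is easier to watch without the object wrapper around it. The sketch below is not the module's code: it uses a hypothetical %files hash of in-memory ``files'' and a made-up open_named helper in place of IO::File, and an ordinary while loop in place of the naked block with redo. The pushing and popping of handles follows the same idea.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files" so the sketch runs without touching disk.
my %files = (
    main    => qq{aaa\n#include "incfile"\nbbb\n},
    incfile => qq{111\n#include "deeper"\n222\n},
    deeper  => qq{xxx\n},
);

# open_named is a hypothetical stand-in for IO::File->new($name, "r").
sub open_named {
    my $name = shift;
    return undef unless exists $files{$name};
    open my $fh, "<", \$files{$name} or return undef;
    return $fh;
}

my @handles = (open_named("main"));    # the stack; element 0 is the active file
my @output;
while (@handles) {
    my $handle = $handles[0];
    my $line = <$handle>;
    unless (defined $line) {           # end of this file: back to its includer
        shift @handles;
        next;
    }
    if ($line =~ /^#include "(.*)"/) {
        my $name = $1;
        if (my $include = open_named($name)) {
            unshift @handles, $include;    # read from the included file next
        } else {
            warn "Cannot open $name (skipping)\n";
        }
        next;
    }
    push @output, $line;
}
print @output;
```

Because each included file is pushed on top of the file that mentioned it, nested includes unwind in the right order when they hit end-of-file.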
All files brought in with use or require must end in a true
value. To be cute, I put the string 0 but true in line 73 as this
particular file's true value, which is a self-documenting string.
Sure, this is mostly a toy example of a tied filehandle, but there's a
lot more where that came from. And you can find out more about tied
filehandles by invoking perldoc perltie. There are also examples of
tied data in some of the CPAN modules, and in the Perl Cookbook
(from O'Reilly and Associates). Until next time, have fun!
Listings
=0=     LISTING ONE - ihtest
=1=     #!/usr/bin/perl -w
=2=     use strict;
=3=
=4=     use IncludeHandle;
=5=
=6=     {
=7=       local *FRED;
=8=       tie *FRED, 'IncludeHandle', "localfile", q/^#include "(.*)"/
=9=         or die "Cannot tie: $!";
=10=
=11=      while (<FRED>) {
=12=        print;
=13=      }
=14=    }
=15=
=16=    {
=17=      local *BARNEY;
=18=      tie *BARNEY, 'IncludeHandle', "localfile", qr/^#include "(.*)"/
=19=        or die "Cannot tie: $!";
=20=
=21=      my @a = <BARNEY>;
=22=      print @a;
=23=
=24=    }
=25=
=26=    {
=27=      local *DINO;
=28=      tie *DINO, 'IncludeHandle', "localfile", sub {
=29=        /^#include \"(.*)\"/ ? $1 : /^#include <(.*)>/ ? $1 : undef
=30=      }
=31=        or die "Cannot tie: $!";
=32=
=33=      print <DINO>;
=34=    }
=0=     LISTING TWO - IncludeHandle.pm
=1=     ## copyright (c) 1999 Randal L. Schwartz
=2=     ## you may use this software under the same terms as Perl itself
=3=
=4=     package IncludeHandle;
=5=     use strict;
=6=
=7=     use IO::File;
=8=     use Carp qw(carp);
=9=
=10=    sub TIEHANDLE {
=11=      my $class = shift;
=12=
=13=      my $file = shift;
=14=      my $code = shift;  # might be string pattern, or qr//
=15=
=16=      my $handle = IO::File->new($file, "r")
=17=        or return undef;  # also sets $!
=18=
=19=      unless ((ref $code || "") eq "CODE") {
=20=        my $pattern = eval { qr/$code/ };
=21=        $! = 29, return undef if $@;  # bad RE
=22=        $code = sub { $_ =~ $pattern ? $1 : undef };
=23=      }
=24=
=25=      bless {
=26=        Handles => [$handle],
=27=        Code => $code,
=28=      }, $class;
=29=    }
=30=
=31=    sub READLINE {
=32=      my $self = shift;
=33=
=34=      if (wantarray) {
=35=        my @return;
=36=        while (defined(my $line = $self->read_a_line)) {
=37=          push @return, $line;
=38=        }
=39=        @return;
=40=      } else {
=41=        $self->read_a_line;
=42=      }
=43=    }
=44=
=45=    sub read_a_line {
=46=      my $self = shift;
=47=
=48=      my $handles = $self->{Handles};
=49=      {
=50=        return undef unless @$handles;
=51=        my $handle = $handles->[0];
=52=        my $result = <$handle>;
=53=        unless (defined $result) {
=54=          shift @$handles;
=55=          redo;
=56=        }
=57=        my $filename = do {
=58=          local $_ = $result;
=59=          $self->{Code}->();
=60=        };
=61=        if (defined $filename) {  # saw an include
=62=          if (my $include_handle = IO::File->new($filename, "r")) {
=63=            unshift @$handles, $include_handle;
=64=          } else {
=65=            carp "Cannot open $filename (skipping): $!";
=66=          }
=67=          redo;
=68=        }
=69=        $result;
=70=      }
=71=    }
=72=
=73=    "0 but true";

