Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 02 (Jul 1999)

[suggested title: Tying up loose ends]

Perl has a lot of cool stuff. Certainly, the basic:

        print "Hello, world!\n";

lets people get started without knowing much about Perl, but ``is there a way to do X in Perl?'' can almost always be answered ``Yes!''

For example, the neat way that a DBM can appear to be a hash in Perl rather transparently is done with a mechanism called ``tied variables''. But tied variables aren't limited to just DBMs -- we can make scalars, arrays, hashes, and filehandles all have similar magic.

What? Did I say filehandles? Yes. Imagine a ``magic'' filehandle that appears to the rest of the program to be a normal filehandle (albeit already opened). However, every time the program ``reads a line'' from the filehandle, a subroutine gets invoked. In fact, for every operation on this so-called filehandle, a different subroutine gets invoked. Well, that's what a tied filehandle does.
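To make that concrete before getting to the real thing, here's a minimal sketch of a tied filehandle. The CountingHandle class, its Next and Limit fields, and the COUNT handle are names made up for this illustration, not part of the article's code. It only handles reads in a scalar context, which is all the loop below needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# CountingHandle is a hypothetical class invented for this sketch.
package CountingHandle;

# tie passes the class name plus any extra arguments to TIEHANDLE.
sub TIEHANDLE {
  my ($class, $limit) = @_;
  return bless { Next => 1, Limit => $limit }, $class;
}

# Invoked for every <HANDLE> read; returning undef signals end-of-file.
sub READLINE {
  my $self = shift;
  return undef if $self->{Next} > $self->{Limit};
  return "line " . $self->{Next}++ . "\n";
}

package main;

local *COUNT;
tie(*COUNT, 'CountingHandle', 3) or die "Cannot tie";

# To this loop, COUNT looks like any other open filehandle.
my @seen;
while (defined(my $line = <COUNT>)) {
  push @seen, $line;
  print $line;
}
```

No file is ever opened here: each "line" comes from the READLINE subroutine.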

One use of having a magic filehandle is to create a filehandle that automatically expands ``include'' specifications, where some part of the contents indicate that other files must be consulted as well. For example, Perl's require operator brings in additional Perl code from other files, and the C preprocessor (CPP) looks for lines like #include "file.h" to bring in more C code.

The advantage of having the filehandle have all the smarts is that I can re-use existing code or libraries that expect a filehandle, and yet get the include-file expansion done transparently.

While getting a filehandle to be tied may seem obscure, the process is actually rather straightforward. I'll just create a class library (let's call it IncludeHandle), and then create my handles with tie rather than open. To demonstrate this, I've written a little program called ihtest that uses the IncludeHandle class, presented in [listing one, below].

Lines 1 and 2 turn on warnings, and enable compiler restrictions.

Line 4 pulls in the IncludeHandle module, described later. As this is an object-oriented class, we won't be importing any functions.

Lines 6 through 14 demonstrate the first use of the tied IncludeHandle-generated handles. I'm setting up a ``naked block'' (a block that is not otherwise part of a larger construct, like an if or a while), so that the local on the soon-to-be tied (or is that fit to be tied?) filehandle found in *FRED will disappear when I'm done.

The local *FRED in line 7 creates a temporary value for all kinds of things that share the name FRED. One of these is our filehandle, and although the others (like $FRED and %FRED) are also localized, that doesn't make much difference here. This temporary value will get undone in line 14.
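Here's a small sketch of that localizing behavior, assuming a package variable $FRED that exists only for this demonstration. Inside the block, everything named FRED starts out empty. Once the block ends, the old values come back.

```perl
use strict;
use warnings;

our $FRED = "outer value";   # demonstration variable, not in the article's code
my $inside;

{
  local *FRED;   # fresh, empty slots for $FRED, @FRED, %FRED, and the FRED handle
  $inside = defined $FRED ? "still set" : "undef";
}

my $after = $FRED;   # the block has ended, so the saved value is back
print "inside: $inside, after: $after\n";
```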

Lines 8 and 9 tie the filehandle FRED (indicated by passing the symbol name *FRED to tie), using the designated parameters. The first parameter must be a class name (a package name with certain subroutines defined within that package). Here, I've designated the IncludeHandle class to handle the tie. The parameters of localfile and a quoted string looking like a C-language preprocessor include-file directive get passed to the TIEHANDLE method, described later. If this succeeds, the tie returns true; otherwise, the die is executed with $! having an appropriate brief error code.

I've defined the first parameter after the classname to be treated as a filename to open. You can think of this as if it were:

        open(FRED, "localfile") or die "Cannot open: $!";

except that any include files (denoted by lines that match the second additional parameter after the classname) will be expanded in place. Thus, the normal-looking loop in lines 11 through 13 will dump out the contents of this file. However, if any line of localfile matches the pattern ^#include "(.*)", then the part returned as $1 in that pattern will be opened as a new file, and its contents inserted in place of the line. This is a recursive operation: included files may themselves contain include-file lines. We'll see how this all works when I describe the class file, later.

Lines 16 to 24 show a similar example. Note however that the include-file pattern specification is being passed as a compiled regular expression, rather than just a string. That's helpful if the tie is being executed in a loop, so that the expression doesn't have to continue to be recompiled on each iteration. Again, just showing off the versatility of this particular tie usage.
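As a quick sketch of the difference between the two forms (the variable names here are mine, chosen for illustration): both match the same way, but ref tells them apart, because a qr// value is a reference blessed into the Regexp class.

```perl
use strict;
use warnings;

my $as_string   = q/^#include "(.*)"/;    # plain string, compiled when used
my $as_compiled = qr/^#include "(.*)"/;   # compiled once, right here

my $line = qq{#include "incfile"\n};

# Both forms can be interpolated into a match and capture the same name.
my ($from_string)   = $line =~ /$as_string/;
my ($from_compiled) = $line =~ /$as_compiled/;

# ref() returns an empty string for the plain string, "Regexp" for qr//.
my $string_kind   = ref($as_string) || "plain string";
my $compiled_kind = ref($as_compiled);

print "$from_string $from_compiled ($string_kind, $compiled_kind)\n";
```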

Also note here that lines 21 and 22 invoke the filehandle read-line operator in a list context instead of a scalar context. We'll see how this is supported later.

Lines 26 to 34 show a more interesting and complicated use of that same second parameter. If the parameter is a ``coderef'' (a reference to a named or anonymous subroutine), then the subroutine is called for each line read from the file, with $_ set to the line. The subroutine can return undef to indicate that the line is an ordinary text line to be returned as part of the read operation, or can return a string indicating a new filename to open.

Lines 28 through 30 define an anonymous subroutine that looks for two different kinds of include lines -- both kinds that the C-preprocessor understands. If I wanted, I could even create a ``search path'' for names found within angle brackets, just like the C-preprocessor.
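Such a search path might look something like the following sketch. The @search_path array, the scratch directory, and the $resolver name are all assumptions invented for this example. The coderef follows the same contract described above: look at $_, then return a filename or undef.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical search path: one scratch directory holding a file named incfile.
my $dir = tempdir(CLEANUP => 1);
my @search_path = ($dir);
{
  open my $out, ">", File::Spec->catfile($dir, "incfile")
    or die "Cannot create: $!";
  print $out "111\n";
  close $out;
}

# Examine $_; return a filename to include, or undef for an ordinary line.
my $resolver = sub {
  return $1 if /^#include "(.*)"/;        # quoted names are used as given
  if (/^#include <(.*)>/) {               # bracketed names are searched for
    my $name = $1;
    for my $candidate (map { File::Spec->catfile($_, $name) } @search_path) {
      return $candidate if -e $candidate;
    }
    return $name;   # not on the path: let the later open fail and complain
  }
  return undef;
};

my @results;
for my $line (qq{#include "plain"\n}, qq{#include <incfile>\n}, "aaa\n") {
  local $_ = $line;
  push @results, $resolver->();
}
```

A coderef like this could be passed as the last argument to tie in place of the one in lines 28 through 30.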

Again, note in line 33 that we're invoking the read-line operator in a list context, here being passed directly back to print.

So, to test this, I can create a local file localfile that might look like this:

        aaa
        #include "incfile"
        bbb
        #include <incfile>
        ccc

and then another file incfile that contains this:

        111
        222
        333

and the output will look like:

        aaa
        111
        222
        333
        bbb
        #include <incfile>
        ccc
        aaa
        111
        222
        333
        bbb
        #include <incfile>
        ccc
        aaa
        111
        222
        333
        bbb
        111
        222
        333
        ccc

Note that the line with the angle-bracketed name is processed only on the third pass, because the first two used a simple regular expression that did not match the angle-bracketed form.

So, how does all this magic happen? How can the filehandles created with tie have the smarts to automatically include other files while the reading is taking place? Well, to understand that, let's look at the implementation of the IncludeHandle class. That'll be in the file named IncludeHandle.pm, presented in [listing two, below].

Note the copyright in lines 1 and 2 -- just to make it clear that you can steal this if you want, although really it's more of a demo than a complete module. For one thing, it doesn't have embedded documentation, because you're reading the documentation right now.

Line 4 switches us into the IncludeHandle package, so that unqualified symbols end up in the right place. Line 5 turns on the standard compiler restrictions.

Lines 7 and 8 pull in two needed modules. First, we'll need dynamically created filehandles, so the IO::File module takes care of that for us. Further, some error messages must look as if they came from the invoker rather than one of these routines, so I'll also use the carp function for that. Both of these modules are in the standard Perl distribution, so there's no need to pull anything from the CPAN.

Lines 10 through 29 define the TIEHANDLE method. This method name is built into the tie interface, so you can't just make up any old name and expect it to work. The parameters are copied from the parameters to tie (skipping over the first tie parameter).

Lines 11 through 14 name those parameters. The class name (in this case, IncludeHandle) ends up in $class, while the requested filename and include specification end up in $file and $code, respectively.

Lines 16 and 17 open up the initial file. If this fails, we're not going to get very far, so a quick undef return is enough to tell the tie operator that things broke, and that's also going to return an undef to the invoker of tie, triggering a die in the code presented earlier.

Lines 19 through 23 turn the include specification into a coderef if it isn't already. First, the pattern is compiled (harmless if it was already a compiled pattern) in line 20. If that fails, we'll return undef (failing the tie operation), but first setting $! to illegal seek, meaning we can't seek for include information given that bad regular expression. Cute and twisted, but I had to pick one of the existing errno codes, so my choices were limited. Obviously, the docs for a production module similar to this would describe that error and why you would get it.
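The trick of assigning a number to $! works because $! is dual-valued: it holds an errno number, and yields the matching system message when used as a string. The sketch below assumes the standard Errno module's ESPIPE constant (the symbolic name for ``illegal seek'') rather than the literal 29, since errno numbers can differ from one platform to another.

```perl
use strict;
use warnings;
use Errno qw(ESPIPE);

# Assign an errno number, then read $! back both ways.
$! = ESPIPE;
my $number  = $! + 0;   # numeric view: the errno value itself
my $message = "$!";     # string view: the system's text for that errno

print "errno $number: $message\n";
```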

Line 22 compiles an anonymous subroutine that has a closure on the $pattern lexical variable. This subroutine does a simple pattern match on $_, returning either $1 if successful, or undef. Thus, after this step, $code is always a coderef fitting the general specification described earlier.
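Pulled out of the tie machinery, that normalization step can be sketched as a standalone function. The name make_matcher is hypothetical, as is the test data. The logic follows the description above.

```perl
use strict;
use warnings;

# make_matcher is a made-up helper name for this sketch.
sub make_matcher {
  my $spec = shift;
  return $spec if (ref $spec || "") eq "CODE";   # already a coderef
  my $pattern = eval { qr/$spec/ };              # a string or a qr// both work
  return undef if $@;                            # bad regular expression
  return sub { $_ =~ $pattern ? $1 : undef };    # closes over $pattern
}

my $matcher = make_matcher(q/^#include "(.*)"/);

my @found;
for my $line (qq{#include "incfile"\n}, "aaa\n") {
  local $_ = $line;
  push @found, scalar $matcher->();
}

my $bad = make_matcher("(");   # an unbalanced parenthesis cannot compile
```

Each call to make_matcher gets its own $pattern, so two matchers built from different patterns don't interfere with each other.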

Finally, and very essential, lines 25 to 28 return a blessed hashref, becoming the object that sits behind the tie. As further operations are performed on the tied filehandle, they'll be translated into method calls on this object. Here, I'm saving the opened filehandle, and the coderef. The opened filehandle is dropped into a single-element anonymous array, for reasons that will become apparent later. The object is returned from the tie operator, but also gets stashed away for these automatic operations. You can always get it back by calling the tied operator on the potentially tied item.
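Here's a short sketch of getting that object back with tied. The Upper class and the SHOUT handle are invented for this demonstration. The point is only that tie and tied hand back the very same object.

```perl
use strict;
use warnings;

# Upper is a hypothetical class made up for this sketch.
package Upper;

sub TIEHANDLE {
  my ($class, @lines) = @_;
  return bless { Lines => [@lines] }, $class;
}

sub READLINE {
  my $self = shift;
  my $line = shift @{ $self->{Lines} };
  return defined($line) ? uc($line) : undef;
}

package main;

local *SHOUT;
my $from_tie   = tie *SHOUT, 'Upper', "abc\n", "def\n";
my $from_tied  = tied *SHOUT;    # same object, fetched after the fact
my $first_line = <SHOUT>;        # becomes a READLINE call on that object

my $remaining = @{ $from_tied->{Lines} };   # peek inside: lines not yet read
print $first_line, "remaining: $remaining\n";
```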

Speaking of operations on the tied filehandle, the most interesting one for our experiment is ``reading a line'', which translates into a READLINE method call on our hidden object, and this method is defined as the subroutine in lines 31 through 43. Now, recall that this operation can happen in either a scalar or list context. If it's in a scalar context, we'll fetch one line, performed in line 41 by calling ourselves as an instance method read_a_line. However, in a list context, we've got to return all the lines from all the files and their included files. The simplest way is to call the read_a_line method repeatedly until it returns undef, while gathering the results into an array. We'll then return that array. This is handled in lines 35 through 39.
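The same context-sensitive shape can be sketched without any filehandles at all. The @queue array and the next_item and read_items names are assumptions made up for this example.

```perl
use strict;
use warnings;

my @queue = ("one\n", "two\n", "three\n");

# Returns undef once the queue is empty.
sub next_item { return shift @queue }

# One item in scalar context; everything that remains in list context.
sub read_items {
  if (wantarray) {
    my @all;
    while (defined(my $item = next_item())) {
      push @all, $item;
    }
    return @all;
  }
  return next_item();
}

my $first = read_items();   # scalar context: a single item
my @rest  = read_items();   # list context: all remaining items
```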

So, one step further, we've got the instance method read_a_line to deal with, defined in lines 45 through 71. And here's where the include files are expanded.

First, the instance variable Handles is stuffed into the local variable $handles in line 48. Then, we'll use a naked block once again to create a nice looping control structure that doesn't involve a goto in lines 49 through 70.

Line 50 ensures that we have some handle to read from. Of course, the first time in, it'll be the handle initially created in TIEHANDLE, but this is actually a stack that can grow and shrink as include files are noted or files reach their end. And if we've gotten to the end of the last file, it's time to return undef to designate end-of-file.

Line 51 takes the most interesting filehandle (the one that we're currently reading from), and reads a line from it in line 52, using an indirect filehandle read-line operation. (I pondered a design for a few minutes that would let this be a tied filehandle, but decided that would be too mind boggling for a simple explanation for now.)

If the line cannot be read, lines 54 and 55 remove the now-useless filehandle, and restart the logic back at line 50. That gets us back out of the nested include files, and even lets us quit at the end of the initial file.

Lines 57 to 60 determine if we're staring at an include filename. Line 58 sets up the $_ variable, and line 59 invokes the coderef stashed as the Code instance variable. Whatever this routine returns ends up in $filename.

If $filename is anything but undef, we've got a valid filename that must be included to replace this line. Line 62 attempts to open that file, and if successful, places the newly opened filehandle on top of the stack (line 63). Otherwise (line 65), we'll squawk at the user, and go get another line. Line 67 dumps us back at the input fetching starting at line 50.
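The stack-and-redo logic can be tried out without touching the disk. This sketch assumes a %files hash of in-memory ``files'' (arrays of lines) standing in for real filehandles, and a fixed include pattern. Both are simplifications for illustration, not how the class itself works.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files" standing in for real filehandles.
my %files = (
  localfile => [ "aaa\n", qq{#include "incfile"\n}, "bbb\n" ],
  incfile   => [ "111\n", "222\n" ],
);

my @stack = ([ @{ $files{localfile} } ]);   # a copy, so %files stays intact

sub next_line {
  {
    return undef unless @stack;             # every source is exhausted
    my $line = shift @{ $stack[0] };
    unless (defined $line) {                # current source hit end-of-file
      shift @stack;                         # drop it, resume the one below
      redo;
    }
    if ($line =~ /^#include "(.*)"/) {
      if (my $included = $files{$1}) {
        unshift @stack, [@$included];       # read the included source next
      } else {
        warn "Cannot open $1 (skipping)\n";
      }
      redo;                                 # the include line is never returned
    }
    return $line;
  }
}

my @output;
while (defined(my $line = next_line())) {
  push @output, $line;
}
print @output;
```

The naked block plus redo gives a ``try again'' loop with two restart points and no explicit loop condition.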

If we make it down to line 69, we've seen a good text line from somewhere, and we're ready to return it.

All files brought in with use or require must end in a true value. To be cute, I put the string 0 but true in line 73 as this particular file's true value, which is a self-documenting string.
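A tiny sketch of why that string qualifies: it is true as a boolean, zero as a number, and Perl exempts it from the usual non-numeric warning. The warning-catching handler here is just scaffolding for the demonstration.

```perl
use strict;
use warnings;

# Collect any warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $value = "0 but true";

my $is_true = $value ? 1 : 0;   # true: a non-empty string other than "0"
my $number  = $value + 5;       # numerically zero, so this is 5

print "true: $is_true, number: $number, warnings: ", scalar(@warnings), "\n";
```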

Sure, this is mostly a toy example of a tied filehandle, but there's a lot more where that came from. And you can find out more about tied filehandles by invoking perldoc perltie. There are also examples of tied data in some of the CPAN modules, and in the Perl Cookbook (from O'Reilly and Associates). Until next time, have fun!

Listings

        =0=     LISTING ONE - ihtest
        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     
        =4=     use IncludeHandle;
        =5=     
        =6=     {
        =7=       local *FRED;
        =8=       tie *FRED, 'IncludeHandle', "localfile", q/^#include "(.*)"/
        =9=         or die "Cannot tie: $!";
        =10=      
        =11=      while (<FRED>) {
        =12=        print;
        =13=      }
        =14=    }
        =15=    
        =16=    {
        =17=      local *BARNEY;
        =18=      tie *BARNEY, 'IncludeHandle', "localfile", qr/^#include "(.*)"/
        =19=        or die "Cannot tie: $!";
        =20=      
        =21=      my @a = <BARNEY>;
        =22=      print @a;
        =23=    
        =24=    }
        =25=    
        =26=    {   
        =27=      local *DINO;
        =28=      tie *DINO, 'IncludeHandle', "localfile", sub {
        =29=        /^#include \"(.*)\"/ ? $1 : /^#include <(.*)>/ ? $1 : undef
        =30=      }
        =31=        or die "Cannot tie: $!";
        =32=            
        =33=      print <DINO>;
        =34=    }

        =0=     LISTING TWO - IncludeHandle.pm
        =1=     ## copyright (c) 1999 Randal L. Schwartz
        =2=     ## you may use this software under the same terms as Perl itself
        =3=     
        =4=     package IncludeHandle;
        =5=     use strict;
        =6=     
        =7=     use IO::File;
        =8=     use Carp qw(carp);
        =9=     
        =10=    sub TIEHANDLE {
        =11=      my $class = shift;
        =12=    
        =13=      my $file = shift;
        =14=      my $code = shift;             # might be string pattern, or qr//
        =15=    
        =16=      my $handle = IO::File->new($file, "r")
        =17=        or return undef;            # also sets $!
        =18=    
        =19=      unless ((ref $code || "") eq "CODE") {
        =20=        my $pattern = eval { qr/$code/ };
        =21=        $! = 29, return undef if $@; # bad RE
        =22=        $code = sub { $_ =~ $pattern ? $1 : undef };
        =23=      }
        =24=    
        =25=      bless {
        =26=             Handles => [$handle],
        =27=             Code => $code,
        =28=            }, $class;
        =29=    }
        =30=    
        =31=    sub READLINE {
        =32=      my $self = shift;
        =33=    
        =34=      if (wantarray) {
        =35=        my @return;
        =36=        while (defined(my $line = $self->read_a_line)) {
        =37=          push @return, $line;
        =38=        }
        =39=        @return;
        =40=      } else {
        =41=        $self->read_a_line;
        =42=      }
        =43=    }
        =44=    
        =45=    sub read_a_line {
        =46=      my $self = shift;
        =47=    
        =48=      my $handles = $self->{Handles};
        =49=      {
        =50=        return undef unless @$handles;
        =51=        my $handle = $handles->[0];
        =52=        my $result = <$handle>;
        =53=        unless (defined $result) {
        =54=          shift @$handles;
        =55=          redo;
        =56=        }
        =57=        my $filename = do {
        =58=          local $_ = $result;
        =59=          $self->{Code}->();
        =60=        };
        =61=        if (defined $filename) {    # saw an include
        =62=          if (my $include_handle = IO::File->new($filename, "r")) {
        =63=            unshift @$handles, $include_handle;
        =64=          } else {
        =65=            carp "Cannot open $filename (skipping): $!";
        =66=          }
        =67=          redo;
        =68=        }
        =69=        $result;
        =70=      }
        =71=    }
        =72=    
        =73=    "0 but true";

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.