Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 03 (Aug 1999)

[suggested title: Scripting your Apache server with Perl]

According to the surveys, the open-source Apache server is the number one web server in terms of worldwide deployment. But how does this relate to Perl? Well, many CGI programs are written in Perl, but more importantly, we can also embed Perl directly in the Apache server.

Doug MacEachern is the architect and chief implementor of the mod_perl project, a Perl interpreter buried within Apache with access to nearly the entire Apache API. This is much more than a fancy way to invoke CGI quickly. When's the last time you were in a CGI program wishing you could figure out what MIME type a document was, or what filename a given URI translated to? Well, with mod_perl, you can call Apache's built-in routines to figure that out with some authority!

Also, Perl code can step in at any phase of the server's operation: initializing a child, immediately after reading the headers, translating the URI to a filename, parsing the headers, checking host-based access, checking user credentials, verifying a user against a certain resource, determining the MIME type, fixing up the headers prior to a response, delivering the content, logging the request, cleaning up afterwards, or shutting down a child. CGI is limited to that one in the middle: "delivering the content".
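As a rough map of those phases, here is a sketch that pairs each one with the configuration directive that installs a Perl handler for it. The pairing is my reading of the mod_perl 1.x documentation rather than something stated above, and the helper name directive_for is made up for this illustration.

```perl
use strict;
use warnings;

# Phases in the order listed above, each paired with the mod_perl 1.x
# directive assumed to install a Perl handler for it.
my @phases = (
    [ 'initializing a child'            => 'PerlChildInitHandler'       ],
    [ 'after reading the headers'       => 'PerlPostReadRequestHandler' ],
    [ 'translating URI to filename'     => 'PerlTransHandler'           ],
    [ 'parsing the headers'             => 'PerlHeaderParserHandler'    ],
    [ 'checking host-based access'      => 'PerlAccessHandler'          ],
    [ 'checking user credentials'       => 'PerlAuthenHandler'          ],
    [ 'verifying user against resource' => 'PerlAuthzHandler'           ],
    [ 'determining the MIME type'       => 'PerlTypeHandler'            ],
    [ 'fixing up the headers'           => 'PerlFixupHandler'           ],
    [ 'delivering the content'          => 'PerlHandler'                ],
    [ 'logging the request'             => 'PerlLogHandler'             ],
    [ 'cleaning up afterwards'          => 'PerlCleanupHandler'         ],
    [ 'shutting down a child'           => 'PerlChildExitHandler'       ],
);

# Hypothetical helper: return the directive for a phase, or undef if unknown.
sub directive_for {
    my $phase = shift;
    for my $pair (@phases) {
        return $pair->[1] if $pair->[0] eq $phase;
    }
    return;
}

# Print the table, one phase per line.
printf "%-34s %s\n", @$_ for @phases;
```

The two directives used later in this column, PerlLogHandler and PerlTransHandler, are the entries for "logging the request" and "translating URI to filename".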

For further information about mod_perl, see the comprehensive web site at perl.apache.org. Also, Doug MacEachern has released an O'Reilly book (co-authored with CGI guru Lincoln Stein) called Writing Apache Modules with Perl and C. For details about this well-written book, and some sample chapters, see www.modperl.com.

One of the first problems I solved with mod_perl was a custom logging operation. When I had been on an ISP that had not permitted easy access to the webserver logs, I figured out how to write a server-side include (SSI) program to write log information to a file of my choosing. This SSI-Logger then returned an empty string, causing no output to be included in the page. The information was written in the same directory as the HTML file, in a filename that could not be served through the web server. I merely had to include something like this in each file to be logged:

        <!--#include virtual="/cgi/ssilogger" -->

This solution worked fine, and I even continued to use it unmodified when I got my own virtual server. However, the SSI-Logger had some drawbacks. For every page that I wanted logged in this special way, I had to remember this SSI construct. Also, before the webserver could serve the page to the user, these logs all had to be updated, sometimes delaying the response. And I couldn't get logging information on anything that wasn't an HTML file.

When I upgraded to include mod_perl in my server, I saw an opportunity to eliminate the SSI-Logger, and replace it with a true custom logger. I copied the code for my old SSI-Logger into the file My/SSILog.pm, and changed surprisingly little of the code. Then I added the following lines to my top-level .htaccess file:

        PerlRequire /home/merlyn/lib/My/SSILog.pm
        PerlLogHandler My::SSILog

From then on, during the logging phase of each file served from this directory and all its subdirectories, my logging routine was invoked, as shown in [listing one, below].

Line 1 puts the data into a package of its own. Since all Perl programs share the same global namespace in a given mod_perl server, it's very important to have distinct package names.
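To see why the package name matters, consider this small sketch, runnable outside Apache. Both modules define a subroutine called handler, and the two stay separate only because each lives in its own package. The subroutine bodies are placeholders for illustration.

```perl
use strict;
use warnings;

# Two modules, each defining a subroutine called "handler".
package My::SSILog;
sub handler { return "logging" }

package My::Trans;
sub handler { return "translating" }

package main;

# Each package has its own symbol table, so neither definition
# overwrites the other.
print My::SSILog::handler(), "\n";
print My::Trans::handler(),  "\n";
```

Had both files omitted the package line, the second definition of handler would have replaced the first in the shared main namespace.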

Line 5 provides a version number variable, which appears to never get set anywhere else in the program. Oops.

Line 7 pulls in a set of definitions for the most common constants. In particular, we're looking for a value for OK, used later.

Lines 9 through 33 define the handler, which must be called handler unless you want to jump through some extra hoops. This subroutine will be called at the end of every transaction of interest. The Apache request object is passed in as the first parameter, which I shift off into $r in line 10. This gives me information about what just happened.

Line 11 quickly rejects any transaction that was an image. I didn't need any information in this specialized log about images. Note that the transfer will still be logged in the standard logs.

Line 13 changes the working directory of the webserver to the directory in which the served file is located. Since I want to put the log file in that same directory, this is a very easy maneuver.

Lines 15 and 16 create a local filehandle, and attempt to open that handle onto a file called .ssilog.txt in the current directory. If this fails, we silently skip over the remaining work. Because this open is executed as the web-server user, and not as me, I need to ensure that any directory I want logged is either writable by the web-server user (not a good idea) or that I've created an empty file that can be written by that same user (what I generally do). Other directories are merely ignored.

Lines 17 and 18 ensure that only one process at a time is writing to the logfile, and that we're at the end of the logfile on this next write.
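The same lock-then-seek idiom can be tried outside the server. The sketch below uses the named constants from the standard Fcntl module in place of the bare 2s in the listing. The function name append_locked is hypothetical, and this is an illustration of the idiom rather than a drop-in replacement for the handler.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);   # LOCK_EX and SEEK_END instead of bare 2s

# Append one line to $path under an exclusive lock.
# Returns 1 on success, 0 if the file could not be opened, locked, or closed.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path)  or return 0;
    flock($fh, LOCK_EX)        or return 0;  # blocks until no other writer holds the lock
    seek($fh, 0, SEEK_END)     or return 0;  # someone may have appended while we waited
    print {$fh} $line, "\n";
    close($fh)                 or return 0;  # flushes the buffer and releases the lock
    return 1;
}
```

The seek after the flock is the important part: the end of the file may have moved between the open and the moment the lock was granted.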

Lines 19 through 28 construct a line for the logfile, given as the time of day, the requested URL, the remote host, any referrer if present, and any user agent if present. Each of these items is enclosed in square brackets.
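The nested map in those lines is easier to follow in isolation. Here is a minimal sketch of the same expression wrapped in a function; the name format_log_line and the sample values are invented for the example.

```perl
use strict;
use warnings;

# Build one log line: every field wrapped in [brackets], with "-" standing in
# for any field that is undefined or otherwise false (empty string, or "0").
sub format_log_line {
    my ($when, @fields) = @_;
    # The inner map fills in missing values; the outer map adds the brackets.
    return join " ", map "[$_]", $when, map { $_ || "-" } @fields;
}

# Sample call with a missing referrer (undef).
print format_log_line(scalar localtime, "/index.html", "client.example.com",
                      undef, "ExampleBrowser/1.0"), "\n";
```

Note that the timestamp is kept out of the inner map, just as in the listing, so only the four request fields get the "-" treatment.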

Line 29 closes the file, flushing the buffers and releasing the flock. This is redundant, since our local filehandle is about to go out of scope in line 31, but I wanted to make sure.

Line 32 returns from this subroutine with an OK value. Line 35 provides a true value to the implicit require operation that brings this module in. And that's that.

Of course, even before I had started moving production code to use mod_perl, I wanted to test a mod_perl server to see if everything would work OK. So I set up a separate web-server source tree, and fired it up on a non-standard port (like 8080). And, it was cute, but I didn't have any substantial content to test it with.

So I thought I'd copy my existing content over from my active site, but then noted that this would be silly, since they're both really on the same disk. So, at first, I considered just configuring the new server to read from the old content tree, but then got worried about possible corruptions. Also, this wouldn't let me try new things that overrode old things.

My next idea was to use the nifty mod_rewrite module to allow a shadowed environment. Each incoming URL would be tested against a small tree associated with the new server. If that was a match, we'd serve that as the content. Otherwise, the URL would be repointed at the old tree, and served from there (possibly getting a 404 error if not found). And, that wasn't terribly hard, but somewhat ugly looking, as shown in configuration entries in [listing two, below]. For details on any of these lines, consult the mod_rewrite documentation.

Lines 2 through 4 turn on the rewrite engine for the server, and establish a (highly verbose) log file.

Lines 6 through 9 handle any local CGI programs that should override CGI programs from the live existing content. Note that I had to hardwire the test-server's document root path into the rewrite rule.

Lines 11 through 13 fall back to the live server's CGI area if there's not a local definition in the test server's area.

In the same way, lines 15 through 20 cause a local manual or perl prefixed URL to remain in the local test server tree, but send everything else over to the live server's data.

But this solution had a few drawbacks. I couldn't provide a test document that overrode the live server's documents, and I had to hardwire the names of the directories, making it hard to have two or more test servers to try.

So, I decided to use the power of a Perl handler during the URI-to-Filename translation phase to do the lookups and adjustments. Everything that can be done with mod_rewrite can be done with a proper Perl handler as well, without having to learn Yet Another Language.

Of course, I couldn't resist adding a few features during the rewrite of mod_rewrite to mod_perl, as you'll see in [listing three, below].

Line 4 puts us into the My::Trans package. Line 6 enables compiler restrictions, to make sure I didn't fumble-finger any of the variable names.

Lines 8 through 10 define the path to the live-server's CGI and document directories. I won't need to define the test-server's paths in the same way, because I can ask the Apache API where we are.

Lines 12 through 50 define the translation handler, again named handler. Line 13 grabs the Apache request object into $r.

Lines 15 through 17 log this request to the server error log. We'll make this logging conditional on it being the initial request. If any handler wants to translate a name to a filename, it'll make a subrequest, and we'll get called again, but we don't want to log those. We want only the ones from the users to be logged.

Lines 19 and 20 get the document root and the requested URI.

Line 22 puts the URI into $_ for easy matching and substituting.

Lines 24 through 29 detect a CGI script in the test-server's area. If we got a match, then we'll set the filename to the local name, and that'll be the document that gets served. A log message is also generated. Returning 0 from this handler terminates the URI translation phase.

Similarly, lines 30 through 35 handle the rewrites to use any other CGI program from the live server's area.

Lines 36 through 41 similarly handle any URLs that begin with manual or perl, forcing them to be interpreted in the test server's area.

And lines 42 through 47 deal with all other URLs.

Lines 48 and 49 handle anything left. For example, a proxy URL would not match anything with a leading slash, so we'll end up falling all the way through to here. In this case, I'll log the confusion, and return a -1. This -1 tells Apache that I've not handled this request, and it should try another handler instead. (The value -1 is what the DECLINED constant from Apache::Constants stands for, just as 0 is the value of OK.)
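The rule cascade can be exercised without a running server. What follows is a sketch under a few assumptions: the function translate is a hypothetical stand-in for the handler, the -x file test is replaced by a code reference so no real document tree is needed, and the status values are written out as constants with the values Apache 1.3 assigns to OK and DECLINED.

```perl
use strict;
use warnings;

# Status values as Apache 1.3 defines them.
use constant OK       => 0;
use constant DECLINED => -1;

# Hypothetical stand-in for the handler: returns (status, filename).
# $is_executable is a code ref that takes the place of the -x file test.
sub translate {
    my ($uri, $document_root, $other, $is_executable) = @_;
    local $_ = $uri;

    # A local /cgi/ script wins if it exists and is executable.
    return (OK, "$document_root$_")
        if m{^/cgi/} and $is_executable->("$document_root$_");
    # Otherwise fall back to the live server's CGI directory.
    return (OK, $_) if s{^/(?:cgi|cgi-bin)/}{$other/cgi-bin/};
    # /manual and /perl always stay in the test server's tree.
    return (OK, "$document_root$_") if m{^/(?:manual|perl)(?:/|$)};
    # Everything else with a leading slash comes from the live document tree.
    return (OK, $_) if s{^/}{$other/htdocs/};
    # No leading slash (a proxy request, say): let another handler try.
    return (DECLINED, undef);
}
```

For instance, with a made-up document root of /test/htdocs and /WWW/stonehenge as the live tree, translate("/index.html", "/test/htdocs", "/WWW/stonehenge", sub { 0 }) returns OK along with /WWW/stonehenge/htdocs/index.html.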

Now I could add the following lines to my configuration files:

        PerlRequire /home/merlyn/lib/My/Trans.pm
        PerlTransHandler My::Trans

and get a shadow area! Any files in ./htdocs or ./cgi would override the existing documents and CGI programs, and I could add Apache::Registry programs into ./perl, as well as serve the provided manual information directory from ./manual.

I'm encouraged about how easy it is to add functionality to my web server with mod_perl. If you give it a look, perhaps you'll draw the same conclusion. Until next time, Enjoy!

Listings

        =0=     #### LISTING ONE ####
        =1=     package My::SSILog;
        =2=     
        =3=     ## usage: PerlLogHandler My::SSILog
        =4=     
        =5=     use vars qw($VERSION);
        =6=     
        =7=     use Apache::Constants qw(:common);
        =8=     
        =9=     sub handler {
        =10=      my $r = shift;
        =11=      return OK if $r->content_type =~ /^image/; # don't log images
        =12=    
        =13=      $r->chdir_file($r->filename);
        =14=      {
        =15=        local *LOG;
        =16=        if (open LOG, ">>.ssilog.txt") {
        =17=          flock LOG, 2;
        =18=          seek LOG, 0, 2;
        =19=          print LOG join (" ",
        =20=                          map "[$_]",
        =21=                          scalar localtime,
        =22=                          (map { $_ || "-" }
        =23=                           $r->uri,
        =24=                           $r->get_remote_host,
        =25=                           $r->header_in("referer"),
        =26=                           $r->header_in("user-agent"),
        =27=                          ),
        =28=                         ), "\n";
        =29=          close LOG;
        =30=        }
        =31=      }
        =32=      return OK;
        =33=    }
        =34=    
        =35=    "true";

        =0=     #### LISTING TWO ####
        =1=     ## turn on the engine
        =2=     RewriteEngine on
        =3=     RewriteLogLevel 9
        =4=     RewriteLog logs/rewrite_log
        =5=     
        =6=     # local cgi overrides other
        =7=     RewriteCond %{REQUEST_URI} ^/cgi/
        =8=     RewriteCond /home/merlyn/etc/httpd/htdocs%{REQUEST_FILENAME} -f
        =9=     RewriteRule ^ - [PT]
        =10=    
        =11=    # other cgi
        =12=    RewriteRule ^/cgi/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
        =13=    RewriteRule ^/cgi-bin/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
        =14=    
        =15=    # local htdocs overrides other
        =16=    RewriteCond %{REQUEST_URI} ^/(manual|perl)/
        =17=    RewriteRule ^ - [PT]
        =18=    
        =19=    # other htdocs
        =20=    RewriteRule ^/(.*)$ /WWW/stonehenge/htdocs/$1 [L]

        =0=     #### LISTING THREE ####
        =1=     ## install as
        =2=     ## PerlTransHandler My::Trans
        =3=     
        =4=     package My::Trans;
        =5=     
        =6=     use strict;
        =7=     
        =8=     my $other = "/WWW/stonehenge";
        =9=     my $other_cgi = "$other/cgi-bin";
        =10=    my $other_root = "$other/htdocs";
        =11=    
        =12=    sub handler {
        =13=      my $r = shift;
        =14=    
        =15=      if ($r->is_initial_req) {
        =16=        $r->warn("request: ".$r->the_request);
        =17=      }
        =18=    
        =19=      my $document_root = $r->document_root;
        =20=      my $uri = $r->uri;
        =21=    
        =22=      local $_ = $uri;
        =23=    
        =24=      ## local /cgi/
        =25=      if (m{^/cgi/} and -x "$document_root$_") {
        =26=        $r->warn("$uri => using local CGI at $document_root$_");
        =27=        $r->filename("$document_root$_");
        =28=        return 0;
        =29=      }
        =30=      ## old /cgi/ or /cgi-bin/
        =31=      if (s{^/(cgi|cgi-bin)/}{$other_cgi/}) {
        =32=        $r->warn("$uri => using remote CGI at $_");
        =33=        $r->filename($_);
        =34=        return 0;
        =35=      }
        =36=      ## local /manual/ or /perl/
        =37=      if (m{^/(manual|perl)(/|$)}) {
        =38=        $r->warn("$uri => using local file at $document_root$_");
        =39=        $r->filename("$document_root$_");
        =40=        return 0;
        =41=      }
        =42=      ## any old prior
        =43=      if (s{^/}{$other_root/}) {
        =44=        $r->warn("$uri => using remote file at $_");
        =45=        $r->filename($_);
        =46=        return 0;
        =47=      }
        =48=      $r->warn("$uri => huh?");
        =49=      return -1;
        =50=    }
        =51=    
        =52=    1;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.