Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 03 (Aug 1999)
[suggested title: Scripting your Apache server with Perl]
According to the surveys, the open-software Apache server is the number one webserver, in terms of worldwide deployment. But how does this relate to Perl? Well, many CGI programs are written in Perl, but more importantly, we can also embed Perl directly in the Apache server.
Doug MacEachern is the architect and chief implementor of the mod_perl project, a Perl interpreter buried within Apache with access to nearly the entire Apache API. This is much more than a fancy way to invoke CGI quickly. When's the last time you were in a CGI program wishing you could figure out what MIME type a document was, or what filename a given URI translated to? Well, with mod_perl, you can call Apache's built-in routines to figure that out with some authority!
Also, Perl code can step in at any of the operations phases: initializing a child, immediately after reading the headers, translating the URI to a filename, parsing the headers, checking host-based access, checking user credentials, verifying a user against a certain resource, determining the MIME type, fixing up the headers prior to a response, delivering the content, logging the request, cleaning up afterwards, or shutting down a child. CGI is limited to that one in the middle: ``delivering the content''.
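As a rough map from the phases listed above to configuration, the sketch below pairs each phase with the mod_perl 1.x directive that installs a Perl handler for it. The directive names are recalled from the mod_perl documentation rather than taken from this column (which only uses PerlLogHandler and PerlTransHandler), so treat them as assumptions to confirm against your own mod_perl version. The helper name directive_for is hypothetical.

```perl
use strict;
use warnings;

# Phase descriptions follow the column; directive names are recalled from
# the mod_perl 1.x documentation and should be checked against your version.
my @phases = (
    [ "initializing a child"                         => "PerlChildInitHandler" ],
    [ "immediately after reading the headers"        => "PerlPostReadRequestHandler" ],
    [ "translating the URI to a filename"            => "PerlTransHandler" ],
    [ "parsing the headers"                          => "PerlHeaderParserHandler" ],
    [ "checking host-based access"                   => "PerlAccessHandler" ],
    [ "checking user credentials"                    => "PerlAuthenHandler" ],
    [ "verifying a user against a certain resource"  => "PerlAuthzHandler" ],
    [ "determining the MIME type"                    => "PerlTypeHandler" ],
    [ "fixing up the headers prior to a response"    => "PerlFixupHandler" ],
    [ "delivering the content"                       => "PerlHandler" ],
    [ "logging the request"                          => "PerlLogHandler" ],
    [ "cleaning up afterwards"                       => "PerlCleanupHandler" ],
    [ "shutting down a child"                        => "PerlChildExitHandler" ],
);

# Returns the directive for a phase description, or undef if it is unknown.
sub directive_for {
    my ($phase) = @_;
    for my $pair (@phases) {
        return $pair->[1] if $pair->[0] eq $phase;
    }
    return;
}

# Print the table, one phase per line.
printf "%-45s %s\n", @$_ for @phases;
```

Running the script prints the whole table, which can be handy as a checklist when deciding where a new handler belongs.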
For further information about mod_perl, see the comprehensive web site at perl.apache.org. Also, Doug MacEachern has released an O'Reilly book (co-authored with CGI guru Lincoln Stein) called Writing Apache Modules with Perl and C. For details about this well-written book, and some sample chapters, see www.modperl.com.
One of the first problems I solved with mod_perl was a custom logging operation. When I had been on an ISP that had not permitted easy access to the webserver logs, I figured out how to write a server-side include (SSI) program to write log information to a file of my choosing. This SSI-Logger then returned an empty string, causing no output to be included in the page. The information was written in the same directory as the HTML file, in a filename that could not be served through the web server. I merely had to include something like this in each file to be logged:
<!--#include virtual="/cgi/ssilogger" -->
This solution worked fine, and I even continued to use it unmodified when I got my own virtual server. However, the SSI-Logger had some drawbacks. For every page that I wanted logged in this special way, I had to remember this SSI construct. Also, before the webserver could serve the page to the user, these logs all had to be updated, sometimes delaying the response. And I couldn't get logging information on anything that wasn't an HTML file.
When I upgraded to include mod_perl in my server, I saw an opportunity to eliminate the SSI-Logger, and replace it with a true custom logger. I copied the code for my old SSI-Logger into the file My/SSILog.pm, and changed surprisingly little of the code. Then I added the following lines to my top-level .htaccess file:
PerlRequire /home/merlyn/lib/My/SSILog.pm
PerlLogHandler My::SSILog
From then on, during the logging phase of each file served from this directory and all its subdirectories, my logging routine was invoked, as shown in [listing one, below].
Line 1 puts the data into a package of its own. Since all Perl programs share the same global namespace in a given mod_perl server, it's very important to have distinct package names.
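To illustrate why distinct package names matter, here is a minimal sketch that runs outside Apache. The package names My::First and My::Second are hypothetical; the point is only that two subroutines with the same name can coexist in one interpreter when each lives in its own package.

```perl
use strict;
use warnings;

# Two hypothetical modules, each with its own subroutine named "handler".
package My::First;
sub handler { return "first" }

package My::Second;
sub handler { return "second" }

package main;

# Each fully qualified name reaches a different subroutine, so neither
# definition overwrites the other.
print My::First::handler(), " ", My::Second::handler(), "\n";
```

Had both subroutines been defined in the same package, the second definition would have replaced the first, which is the collision that separate package names avoid.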
Line 5 provides a version number variable, which appears to never get set anywhere else in the program. Oops.
Line 7 pulls in a set of definitions for the most common constants. In particular, we're looking for a value for OK, used later.
Lines 9 through 33 define the handler, which must be called handler unless you want to jump through some extra hoops. This subroutine will be called at the end of every transaction of interest. The Apache::Request object is passed in as the first parameter, which I shift off into $r in line 10. This gives me information about what just happened.
Line 11 quickly rejects any transaction that was an image. I didn't need any information in this specialized log about images. Note that the transfer will still be logged in the standard logs.
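The image test can be tried in isolation with a small sketch like the one below. The function name should_log is hypothetical, and treating an undefined content type as loggable is an assumption; the listing does not say what happens in that case.

```perl
use strict;
use warnings;

# Returns 1 when a request with this content type should be logged,
# 0 when it should be skipped. Uses the same /^image/ test as the listing.
# An undefined content type is treated as loggable (an assumption).
sub should_log {
    my ($content_type) = @_;
    return 1 unless defined $content_type;
    return $content_type =~ /^image/ ? 0 : 1;
}

for my $type ("image/gif", "text/html") {
    print "$type: ", (should_log($type) ? "log" : "skip"), "\n";
}
```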
Line 13 changes the working directory of the webserver to the directory in which the served file is located. Since I want to put the log file in that same directory, this is a very easy maneuver.
Lines 14 and 15 create a local filehandle, and attempt to open that handle onto a file called .ssilog.txt in the current directory. If this fails, we silently skip over the remaining work. Because this open is executed as the web-server user, and not as me, I need to ensure that any directory I want logged is either writable by the web-server user (not a good idea) or that I've created an empty file that can be written by that same user (what I generally do). Other directories are merely ignored.
Lines 17 and 18 ensure that only one process at a time is writing to the logfile, and that we're at the end of the logfile on this next write.
Lines 19 through 28 construct a line for the logfile, given as the time of day, the requested URL, the remote host, any referrer if present, and any user agent if present. Each of these items is enclosed in square brackets.
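The nested join and map expression can be easier to follow when pulled out into a function that runs on its own. The sketch below assumes the same layout as the listing; the name format_log_line and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Builds one log line: the timestamp first, then each remaining field,
# every item wrapped in square brackets. Fields that are undefined or
# empty are replaced with "-" before being bracketed.
sub format_log_line {
    my ($timestamp, @fields) = @_;
    return join(" ", map "[$_]", $timestamp, map { $_ || "-" } @fields) . "\n";
}

# Sample values; undef stands in for a request with no referrer header.
print format_log_line(
    scalar localtime,
    "/index.html",
    "client.example.com",
    undef,
    "Mozilla/4.0",
);
```

One consequence of the `$_ || "-"` test is that a field containing just "0" would also be logged as "-", since Perl treats "0" as false.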
Line 29 closes the file, flushing the buffers and releasing the flock. This is redundant, since our local filehandle is about to go out of scope in line 31, but I wanted to make sure.
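The lock, seek, write, and close sequence can also be written with named constants from the core Fcntl module in place of the literal 2s. The following is a sketch under stated assumptions, not the column's code: the function name append_locked is hypothetical, and it uses a lexical filehandle and a temporary file so that it can run outside a web server.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);     # provides LOCK_EX and SEEK_END
use File::Temp qw(tempfile);

# Appends one line to $path while holding an exclusive lock.
# Returns true on success, false if the file could not be opened or locked.
sub append_locked {
    my ($path, $line) = @_;
    open(my $log, ">>", $path) or return 0;   # skip quietly if not writable
    flock($log, LOCK_EX)       or return 0;   # wait until we are the only writer
    seek($log, 0, SEEK_END);                  # move to the current end of file
    print {$log} $line;
    return close($log);                       # flushes buffers, releases the lock
}

# Try it against a throwaway file that is removed when the program exits.
my (undef, $path) = tempfile(UNLINK => 1);
append_locked($path, "first\n");
append_locked($path, "second\n");
```

The seek after the lock matters because another process may have appended to the file between our open and the moment we obtained the lock.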
Line 32 returns from this subroutine with an OK value. Line 35 provides a true value to the implicit require operation that brings this module in. And that's that.
Of course, even before I had started moving production code to use mod_perl, I wanted to test a mod_perl server to see if everything would work OK. So I set up a separate web-server source tree, and fired it up on a non-standard port (like 8080). And, it was cute, but I didn't have any substantial content to test it with.
So I thought I'd copy my existing content over from my active site, but then noted that this would be silly, since they're both really on the same disk. So, at first, I considered just configuring the new server to read from the old content tree, but then got worried about possible corruptions. Also, this wouldn't let me try new things that overrode old things.
My next idea was to use the nifty mod_rewrite module to allow a shadowed environment. Each incoming URL would be tested against a small tree associated with the new server. If that was a match, we'd serve that as the content. Otherwise, the URL would be repointed at the old tree, and served from there (possibly getting a 404 error if not found). And, that wasn't terribly hard, but somewhat ugly looking, as shown in configuration entries in [listing two, below].
For details on any of these lines, consult the mod_rewrite documentation.
Lines 2 through 4 turn on the rewrite engine for the server, and establish a (highly verbose) log file.
Lines 6 through 9 handle any local CGI programs that should override CGI programs from the live existing content. Note that I had to hardwire the test-server's document root path into the rewrite rule.
Lines 11 through 13 fall back to the live server's CGI area if there's not a local definition in the test server's area.
In the same way, lines 15 through 20 cause a local manual or perl prefixed URL to remain in the local test server tree, but send everything else over to the live server's data.
But this solution had a few drawbacks. I couldn't provide a test document that overrode the live server's documents, and I had to hardwire the names of the directories, making it hard to have two or more test servers to try.
So, I decided to use the power of a Perl handler during the URI-to-Filename translation phase to do the lookups and adjustments. Everything that can be done with mod_rewrite can be done with a proper Perl handler as well, without having to learn Yet Another Language.
Of course, I couldn't resist adding a few features during the rewrite of mod_rewrite to mod_perl, as you'll see in [listing three, below].
Line 4 puts us into the My::Trans package. Line 6 enables compiler restrictions, to make sure I didn't fumble-finger any of the variable names.
Lines 8 through 10 define the paths to the live server's CGI and document directories. I won't need to define the test server's paths in the same way, because I can ask the Apache API where we are.
Lines 12 through 50 define the translation handler, again named handler. Line 13 grabs the Apache::Request object into $r.
Lines 15 through 17 log this request to the server error log. We'll make this logging conditional on it being the initial request. If any handler wants to translate a name to a filename, it'll make a subrequest, and we'll get called again, but we don't want to log those. We want only the ones from the users to be logged.
Lines 19 and 20 get the document root and the requested URI.
Line 22 puts the URI into $_ for easy matching and substituting.
Lines 24 through 29 detect a CGI script in the test-server's area. If we got a match, then we'll set the filename to the local name, and that'll be the document that gets served. A log message is also generated. Returning 0 from this handler terminates the URI translation phase.
Similarly, lines 30 through 35 handle the rewrites to use any other CGI program from the live server's area.
Lines 36 through 41 similarly handle any URLs that begin with manual or perl, forcing them to be interpreted in the test server's area.
And lines 42 through 47 deal with all other URLs.
Lines 48 and 49 handle anything left. For example, a proxy URL would not match anything with a leading slash, so we'll end up falling all the way through to here. In this case, I'll log the confusion, and return a -1. This -1 tells Apache that I've not handled this request, and it should try another handler instead. (The -1 is the value of the DECLINED constant from Apache::Constants, just as the 0 returned earlier is the value of OK.)
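The decision order walked through above can be exercised without a running server by separating the mapping rules from the Apache calls. The sketch below is one way to do that, under stated assumptions: the function name map_uri, the /test/htdocs document root, and the idea of passing in a code reference in place of the -x file test are all hypothetical. It returns the chosen filename, or undef where the handler would return -1.

```perl
use strict;
use warnings;

# Live-server paths, mirroring the values in the listing.
my $other_cgi  = "/WWW/stonehenge/cgi-bin";
my $other_root = "/WWW/stonehenge/htdocs";

# Returns the filename a URI maps to, or undef when no rule applies.
# $is_executable is a code ref standing in for the -x file test, so the
# rules can be checked without touching the disk.
sub map_uri {
    my ($document_root, $uri, $is_executable) = @_;
    my $path = $uri;

    # 1. A local CGI script overrides the live server's copy.
    return "$document_root$path"
        if $path =~ m{^/cgi/} and $is_executable->("$document_root$path");

    # 2. Any other CGI request goes to the live server's CGI area.
    return $path if $path =~ s{^/(cgi|cgi-bin)/}{$other_cgi/};

    # 3. /manual and /perl always stay in the test server's tree.
    return "$document_root$path" if $path =~ m{^/(manual|perl)(/|$)};

    # 4. Everything else with a leading slash comes from the live documents.
    return $path if $path =~ s{^/}{$other_root/};

    # 5. No leading slash (a proxy URL, for example): decline.
    return;
}

my $nothing_local = sub { 0 };   # pretend no local CGI scripts exist
for my $uri ("/cgi/search", "/perl/test.pl", "/index.html") {
    print "$uri => ", map_uri("/test/htdocs", $uri, $nothing_local), "\n";
}
```

Keeping the rules in a plain function like this also makes it straightforward to point a second test server at the same logic with a different document root.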
Now I could add the following lines to my configuration files:
PerlRequire /home/merlyn/lib/My/Trans.pm
PerlTransHandler My::Trans
and get a shadow area! Any files in ./htdocs or ./cgi would override the existing documents and CGI programs, and I could add Apache::Registry programs into ./perl, as well as serve the provided manual information directory from ./manual.
I'm encouraged about how easy it is to add functionality to my web server with mod_perl. If you give it a look, perhaps you'll draw the same conclusion. Until next time, Enjoy!
Listings
    =0=     #### LISTING ONE ####
    =1=     package My::SSILog;
    =2=
    =3=     ## usage: PerlLogHandler My::SSILog
    =4=
    =5=     use vars qw($VERSION);
    =6=
    =7=     use Apache::Constants qw(:common);
    =8=
    =9=     sub handler {
    =10=      my $r = shift;
    =11=      return OK if $r->content_type =~ /^image/; # don't log images
    =12=
    =13=      $r->chdir_file($r->filename);
    =14=      {
    =15=        local *LOG;
    =16=        if (open LOG, ">>.ssilog.txt") {
    =17=          flock LOG, 2;
    =18=          seek LOG, 0, 2;
    =19=          print LOG join (" ",
    =20=                          map "[$_]",
    =21=                          scalar localtime,
    =22=                          (map { $_ || "-" }
    =23=                           $r->uri,
    =24=                           $r->get_remote_host,
    =25=                           $r->header_in("referer"),
    =26=                           $r->header_in("user-agent"),
    =27=                          ),
    =28=                         ), "\n";
    =29=          close LOG;
    =30=        }
    =31=      }
    =32=      return OK;
    =33=    }
    =34=
    =35=    "true";

    =0=     #### LISTING TWO ####
    =1=     ## turn on the engine
    =2=     RewriteEngine on
    =3=     RewriteLogLevel 9
    =4=     RewriteLog logs/rewrite_log
    =5=
    =6=     # local cgi overrides other
    =7=     RewriteCond %{REQUEST_URI} ^/cgi/
    =8=     RewriteCond /home/merlyn/etc/httpd/htdocs%{REQUEST_FILENAME} -f
    =9=     RewriteRule ^ - [PT]
    =10=
    =11=    # other cgi
    =12=    RewriteRule ^/cgi/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
    =13=    RewriteRule ^/cgi-bin/(.*)$ /WWW/stonehenge/cgi-bin/$1 [L]
    =14=
    =15=    # local htdocs overrides other
    =16=    RewriteCond %{REQUEST_URI} ^/(manual|perl)/
    =17=    RewriteRule ^ - [PT]
    =18=
    =19=    # other htdocs
    =20=    RewriteRule ^/(.*)$ /WWW/stonehenge/htdocs/$1 [L]

    =0=     #### LISTING THREE ####
    =1=     ## install as
    =2=     ## PerlTransHandler My::Trans
    =3=
    =4=     package My::Trans;
    =5=
    =6=     use strict;
    =7=
    =8=     my $other = "/WWW/stonehenge";
    =9=     my $other_cgi = "$other/cgi-bin";
    =10=    my $other_root = "$other/htdocs";
    =11=
    =12=    sub handler {
    =13=      my $r = shift;
    =14=
    =15=      if ($r->is_initial_req) {
    =16=        $r->warn("request: ".$r->the_request);
    =17=      }
    =18=
    =19=      my $document_root = $r->document_root;
    =20=      my $uri = $r->uri;
    =21=
    =22=      local $_ = $uri;
    =23=
    =24=      ## local /cgi/
    =25=      if (m{^/cgi/} and -x "$document_root$_") {
    =26=        $r->warn("$uri => using local CGI at $document_root$_");
    =27=        $r->filename("$document_root$_");
    =28=        return 0;
    =29=      }
    =30=      ## old /cgi/ or /cgi-bin/
    =31=      if (s{^/(cgi|cgi-bin)/}{$other_cgi/}) {
    =32=        $r->warn("$uri => using remote CGI at $_");
    =33=        $r->filename($_);
    =34=        return 0;
    =35=      }
    =36=      ## local /manual/ or /perl/
    =37=      if (m{^/(manual|perl)(/|$)}) {
    =38=        $r->warn("$uri => using local file at $document_root$_");
    =39=        $r->filename("$document_root$_");
    =40=        return 0;
    =41=      }
    =42=      ## any old prior
    =43=      if (s{^/}{$other_root/}) {
    =44=        $r->warn("$uri => using remote file at $_");
    =45=        $r->filename($_);
    =46=        return 0;
    =47=      }
    =48=      $r->warn("$uri => huh?");
    =49=      return -1;
    =50=    }
    =51=
    =52=    1;