Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 66 (Jan 2005)
[suggested title: ``Introduction to mod_perl (part 3)'']
Last month, I took a much-needed break, giving up my space for matters related to Perl6. Thank you, guest writers, and thank you, editors!
In the previous two columns, I introduced mod_perl, covering
fundamental concepts and basic configuration directives, and started
describing the callback API objects.
This month, I'll finish my introduction to mod_perl
by talking about the rest of the API, showing some sample code
to use in the content and other phases, and then concluding
with some pointers to further information.
The Apache request object, often referred to as $r, can be obtained
as the first parameter passed in to a handler subroutine, or via the
request method of the Apache class. Calling methods against
this object generally triggers callbacks to the published Apache API.
For example, document_root can get (or even set) the ``document
root'' (the directory to which the root URL is initially mapped).
Although you might be tempted to compute a URL relative to this
directory, be aware that some directives such as Alias will further
modify this mapping.
Another interesting request method is dir_config
, which provides
access to the PerlSetVar
and PerlAddVar
values. For example,
my @items = $r->dir_config->get('SomeKey');
sets @items to the zero or more SomeKey (case insensitive)
values seen in the various configuration files appropriate to the
current request. This mechanism permits a simple configuration change
to modify the behavior of code in a syntax that is config-file
compatible.
Handlers can pass notes to each other during the execution of a
given request. For example, an error handler can set a note about the
reason for a failure, then perform an internal redirect to a different
request URL to display the response. The new URL handler can include
code to fetch this note as part of the message. Or, a handler can
stuff notes that will later be picked up by a custom log description.
Perl can get and set these values with the notes
method.
Although standard notes are limited to simple text strings, arbitrary
Perl references and data structures can be set and fetched with
pnotes
. An advantage of pnotes over standard global variables is
that these notes are guaranteed to be destroyed at the end of a
request cycle, making memory management simpler.
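As a rough illustration of the difference between the two, here is a minimal sketch of one handler stashing values and a later handler in the same request reading them back. The package name, subroutine names, note keys, and values are all made up for the example. The constants are spelled out with their Apache 1.3 values only so that the sketch compiles outside a server; in real mod_perl code they would be imported from Apache::Constants.

```perl
package My::Notes;    # hypothetical module name
use strict;
use warnings;

# Under mod_perl, write instead: use Apache::Constants qw(OK DECLINED);
# Spelled out here (Apache 1.3 values) so the sketch compiles anywhere.
use constant OK       => 0;
use constant DECLINED => -1;

# An early phase: remember why something went wrong.
sub record {
    my $r = shift;
    $r->notes('error_reason', 'quota exceeded');                   # plain string only
    $r->pnotes('error_detail', { user => 'fred', limit => 10 });   # any Perl data
    return DECLINED;
}

# A later phase of the same request: read the values back.
sub report {
    my $r      = shift;
    my $reason = $r->notes('error_reason');
    my $detail = $r->pnotes('error_detail');
    $r->log_error("failed: $reason (user $detail->{user}, limit $detail->{limit})");
    return OK;
}

1;
```

Because the hash reference lives in a pnote rather than in a global, there is nothing to clean up by hand when the request ends.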
Some of the request object's methods manage the handlers related to a request.
Because the list can be queried and set dynamically, a request could
have a modified execution path. The handler
method sets the handler
for the content phase, while set_handlers
, push_handlers
and
get_handlers
manage the remaining phases. One common technique is
to define a Perl trans handler that notices that a request must have
Perl treatment in one or more of the remaining phases, and then calls
set_handlers
or push_handlers
to affect those phases.
The log_error
method sends a message to the server error log. The
warn
method is similar, sending a message only if the Apache log
level is warn
or higher. Writing to STDERR
triggers the
equivalent of log_error
method calls, thanks to a tied filehandle.
Some of the methods apply to the connection object, which maps into
those parts of Apache API that relate to the current HTTP connection.
You can call the connection
method against the request object to
get the connection object, often referenced in the documentation as
$c
. In particular, calling remote_ip
against $c
gives the
current remote IP address (a quick operation). Calling remote_host
causes Apache to look up the hostname for the given IP address, using
reverse-to-forward validation for security purposes, and is thus
rather expensive and should be avoided if unnecessary.
One cool connection object method is aborted
, which returns true if
the HTTP connection has been disconnected, as reported by the
operating system. For example, a CPU-intensive calculation of longer
than a dozen seconds or so can check this value periodically to see if
there'll be nobody home to hear the result.
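Here is a minimal sketch of that periodic check, assuming the work can be cut into slices. The package name, the slice count, and the expensive_step routine are placeholders of my own, and the OK constant is spelled out with its Apache 1.3 value only so the sketch compiles outside a server (real code would import it from Apache::Constants).

```perl
package My::LongJob;    # hypothetical module name
use strict;
use warnings;

# Under mod_perl, write instead: use Apache::Constants qw(OK);
use constant OK => 0;

sub handler {
    my $r = shift;
    my $c = $r->connection;

    my $total = 0;
    for my $slice (1 .. 100) {
        # Give up quietly if the client has already gone away.
        return OK if $c->aborted;
        $total += expensive_step($slice);
    }

    $r->send_http_header('text/plain');
    $r->print("result: $total\n");
    return OK;
}

# Placeholder for one slice of the real calculation.
sub expensive_step {
    my $n = shift;
    return $n * $n;
}

1;
```

How often to check is a judgment call: once per slice is cheap here, but a tight inner loop would want to check far less frequently.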
Another part of the Apache API relates to the server itself,
referenced through the server object, obtained by calling server
against the request object. Through server object methods, we can
discover the server_admin
(email address), the server_hostname
(the hostname to which we are bound if multihosted), the port
number on that host, and whether or not we are a virtual server
(is_virtual
). Additionally, a space-separated list of canonical
names and aliases is returned from names
.
The Apache API also provides a set of library functions to handlers
for common operations. These functions can be imported as if they
were defined in the Apache::Util
package (not an object class).
For example, escape_html
replaces the HTML::Entities::encode
routine, operating much faster (perhaps by a factor of 100 to 1).
Similarly, escape_uri replaces URI::Escape::uri_escape, although
the savings aren't as significant. The inverses, unescape_html
and unescape_uri, are also provided.
Other utility routines include parsedate (parses common forms of the
dates used in HTTP headers and logfiles), ht_time (formats a time
string, similar to using strftime), size_string (converts a size
into a friendly value of bytes or K or M, and so on), and
validate_password (compares an entered password against a one-way
hash such as crypt, MD5, or SHA1).
The mod_perl
API makes getting to the arguments of a request
relatively painless. To get the parameters of a GET
or HEAD
request, we look at the args
method of the request object, which
returns the parameters in key/value pairs as a list. Similarly, the
POST
parameters can be fetched with the content
method in a list
context. For example,
    my %args = $r->args;
    my $name = $args{name}; # get param "name"
Beware, however, that ``select multiple'' parameters (which may have more than one value for a given key) are destroyed this way. To fix that, we have to work a bit harder:
    my @args = ($r->args, $r->content);
    my %params;
    while (my ($k, $v) = splice(@args, 0, 2)) {
        push @{$params{$k}}, $v;
    }
Now @{$params{name}} is all of the name params.
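To see what that loop builds, here is a minimal standalone sketch that runs outside Apache. The hard-coded @args list is an assumption standing in for what the args and content methods would hand back in list context, and the field names and values are invented.

```perl
use strict;
use warnings;

# Stand-in for ($r->args, $r->content): a flat list of key/value pairs.
# The keys and values are invented for illustration.
my @args = (name => 'fred', color => 'red', name => 'barney');

my %params;
while (my ($k, $v) = splice(@args, 0, 2)) {
    push @{ $params{$k} }, $v;    # keep every value seen for this key
}

# Show each key with all of its values, in key order.
for my $k (sort keys %params) {
    print "$k: @{ $params{$k} }\n";
}
```

The loop ends on its own because splice returns an empty list once @args is used up, which makes the list assignment false.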
In a scalar context, these methods return the original data (the query
string or the raw content), permitting easy regeneration of the
original request. Note that content
must be called only once per
request, because after that, the contents are gone. One quick use for
this is to simplify the code above, if you don't mind using a
CGI.pm
object:
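    use CGI;
    # in a scalar context, args and content return the raw strings;
    # join the non-empty ones with "&" to form a single query string
    my @raw = grep { defined($_) && length($_) } scalar($r->args), scalar($r->content);
    my $q = CGI->new(join "&", @raw);
    my @name = $q->param("name");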
Other handler phases include the TransHandler
(mapping URLs to
filesystem paths), AccessHandler
(validating the client host),
AuthHandler
(validating user credentials), AuthzHandler
(verifying that the validated user can perform the requested
operation), TypeHandler
(determining the MIME type for a given
URL), and LogHandler
(creating log entries).
For example, a sample TransHandler
that ``redirects'' the request
based on the day of week might look like:
    package My::TimeAdjust;
    use Apache::Constants qw(DECLINED);

    sub handler {
        my $r = shift;
        my $uri = $r->uri;
        my $dow = (localtime)[6]; # 0 = Sunday, 6 = Saturday
        $uri =~ s{/DOW/}{/$dow/};
        $r->uri($uri);
        return DECLINED;
    }

    1; # a module file must return a true value
We place this code somewhere within the @INC
path under
My/TimeAdjust.pm
, and add the handler to the configuration:
PerlTransHandler My::TimeAdjust
This single configuration directive causes all matching URLs to
trigger our TransHandler
, replacing any appearance of /DOW/
in
the URL with /3/
on Wednesday, so a fetch to /foo/bar/DOW/bletch
becomes /foo/bar/3/bletch
. Imagine trying to do that with
mod_rewrite
!
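If you'd like to check the substitution without waiting for Wednesday, here is a minimal sketch that runs outside Apache. The rewrite_dow helper is a name I made up for the example; it does the same substitution as the handler, but takes the day number as a parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitution the handler performs,
# with the day number passed in so it can be checked on any day.
sub rewrite_dow {
    my ($uri, $dow) = @_;
    $uri =~ s{/DOW/}{/$dow/};
    return $uri;
}

print rewrite_dow('/foo/bar/DOW/bletch', 3), "\n";                # as if Wednesday
print rewrite_dow('/foo/bar/DOW/bletch', (localtime)[6]), "\n";   # today
print rewrite_dow('/no/marker/here', 3), "\n";                    # left alone
```

URLs without the /DOW/ marker pass through untouched, which is why the handler can safely be installed for every request.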
For an example AccessHandler
, let's deny access from all
odd-numbered hosts:
    package My::Hosts;
    use Apache::Constants qw(DECLINED FORBIDDEN OK);

    sub handler {
        my $r = shift;
        my $host = $r->connection->remote_ip;
        return FORBIDDEN if $host =~ /[13579]$/; # address ends in an odd digit
        return DECLINED; # other auth may deny
    }

    1; # a module file must return a true value
Once again, we add this to the configuration with simply:
PerlAccessHandler My::Hosts
And now our odd hosts are no longer permitted. Note that access
checks are processed in order, so we return DECLINED if we have no
opinion, permitting other more traditional checks to apply.
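The odd-host test itself is easy to get backwards, so here is a minimal sketch that exercises it outside Apache. The is_odd_host helper and the sample addresses are made up for the example; the strings printed are just labels for what the access handler is meant to return in each case.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the dotted-quad address ends in an odd digit.
sub is_odd_host {
    my $ip = shift;
    return $ip =~ /[13579]$/ ? 1 : 0;
}

# Invented sample addresses, two odd and two even.
for my $ip (qw(192.168.1.7 192.168.1.8 10.0.0.21 10.0.0.40)) {
    printf "%-12s %s\n", $ip, is_odd_host($ip) ? 'FORBIDDEN' : 'DECLINED';
}
```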
As a custom LogHandler
, we might consider recording the wall clock
and CPU time used by our request. For this, we need to start and stop
a stopwatch. We'll start it with:
    sub TimeIt::Init::handler {
        shift->pnotes('times', [time, times]);
        return DECLINED;
    }
This init handler (as early as possible in the cycle) saves the
wall-clock time and the four values from times
(user and system
CPU time, child user and child system CPU time) into a pnote for later
retrieval. At the end of the request, we get the values, compute the
delta, and log the difference:
    sub TimeIt::Log::handler {
        my $r = shift;
        my @times = @{$r->pnotes('times')};
        @times = map { $_ - shift @times } time, times;
        $r->warn($r->uri, ": real/cpu times: @times");
        return DECLINED;
    }
We'd enable this with:
    PerlInitHandler TimeIt::Init
    PerlLogHandler TimeIt::Log
(See also Apache::TimeIt
in the CPAN, which does this in a more
manageable fashion.)
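The delta computation in the log handler is a bit dense, so here is a minimal sketch of the same idiom running outside Apache. The busy loop and its iteration count are arbitrary stand-ins for a request's real work, and the variable names are my own.

```perl
use strict;
use warnings;

# What the init handler stores: wall-clock time plus the four values from times.
my @saved = (time, times);

# Burn a little CPU so there is something to measure (arbitrary loop count).
my $x = 0;
$x += sqrt($_) for 1 .. 200_000;

# What the log handler computes: each new reading minus the matching saved
# one. The shift walks through @saved in step with the new list.
my @delta = map { $_ - shift @saved } time, times;

printf "real %d, user %.2f, sys %.2f, child user %.2f, child sys %.2f\n", @delta;
```

The new readings are all taken before the map block runs for the first time, so the five subtractions compare like with like.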
If you're writing Apache handlers such as the ones above, you can
test them with Apache::FakeRequest
, which can fake up the various
fields of the request object:
    use Apache::FakeRequest;
    use My::Module;

    my $request = Apache::FakeRequest->new('get_remote_host' => 'foobar.com');
    My::Module::handler($request);
Finally, if Apache and mod_perl are compiled correctly, you can even
use Perl code directly in your web pages via Server-Side Includes:
    <!--#perl sub="sub { print 'perl code here' }" -->
I've never done this in a real website, but you might find that this helps you in a pinch.
For further information about mod_perl
, check out the installed
manpages, such as Apache
, cgi_to_mod_perl
, mod_perl
,
mod_perl_traps
and mod_perl_tuning
.
For dead-tree supplements, check out the excellent Writing Apache
Modules with Perl and C, by Doug MacEachern and Lincoln Stein, which
is the definitive guide to mod_perl
by the guys who have been
around since the beginning.
Also noteworthy are The mod_perl Developer's Cookbook,
by Geoffrey Young, et al.; Apache: The Definitive Guide, by Ben
Laurie and Peter Laurie; and HTTP: The Definitive Guide, by David
Gourley, et al.
On the web, perl.apache.org
serves as the primary source of information on
mod_perl, including advocacy, FAQs and other links. The
www.modperlbook.com
site provides supplemental information from
Doug and Lincoln's book, and www.modperlcookbook.com
gives listings
and other information from Geoffrey's book.
Even in three columns, I've barely scratched the surface of how useful
and cool mod_perl
is for everyone. With big sites like
amazon.com
and ticketmaster.com
using the technology
continually, mod_perl
will be around for a long time to come.
Until next time, have fun, and enjoy!