Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 66 (Jan 2005)

[suggested title: ``Introduction to mod_perl (part 3)'']

Last month, I took a much-needed break, giving my space up for matters related to Perl 6. Thank you, guest writers, and thank you, editors!

In the previous two columns, I introduced mod_perl, covering its fundamental concepts and basic configuration directives, and started describing the objects of the callback API.

This month, I'll finish my introduction to mod_perl by talking about the rest of the API, showing some sample code to use in the content and other phases, and then concluding with some pointers to further information.

The Apache request object, often referred to as $r, is passed as the first parameter to a handler subroutine, and can also be obtained via the request method of the Apache class. Calling methods on this object generally invokes the corresponding routines of the published Apache API. For example, document_root can get (or even set) the ``document root'' (the directory to which the root URL is initially mapped). Although you might be tempted to compute a file's location by appending a URL path to this directory, be aware that directives such as Alias can alter this mapping.

Another interesting request method is dir_config, which provides access to the PerlSetVar and PerlAddVar values. For example,

  my @items = $r->dir_config->get('SomeKey');

sets @items to all (zero or more) values of SomeKey (the key is case-insensitive) found in the configuration sections that apply to the current request. This mechanism lets a simple edit to the configuration file change the behavior of your code, using ordinary config-file syntax.
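As a sketch (the /reports location and the values alpha and beta are hypothetical), the key might be configured like this in httpd.conf, with PerlSetVar establishing the first value and PerlAddVar appending another under the same key:

```apache
# Hypothetical location and values, for illustration only
<Location /reports>
  PerlSetVar SomeKey alpha
  PerlAddVar SomeKey beta
</Location>
```

For a request under /reports, the get call above should then leave @items holding both alpha and beta.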

Handlers can pass notes to each other during the execution of a given request. For example, an error handler can set a note about the reason for a failure, then perform an internal redirect to a different URL to display the response; the handler for that URL can then fetch the note and include it in the message. Or, a handler can stash notes that are later picked up by a custom log format. Perl can get and set these values with the notes method.

Although standard notes are limited to simple text strings, arbitrary Perl references and data structures can be set and fetched with pnotes. An advantage of pnotes over standard global variables is that these notes are guaranteed to be destroyed at the end of a request cycle, making memory management simpler.

Some of the request object's methods manage the handlers for a request. Because the handler lists can be queried and changed on the fly, a request can follow a modified execution path. The handler method sets the handler for the content phase, while set_handlers, push_handlers, and get_handlers manage the remaining phases. One common technique is to define a Perl trans handler that notices that a request needs Perl treatment in one or more later phases, and then calls set_handlers or push_handlers to install handlers for those phases.

The log_error method sends a message to the server error log. The warn method is similar, but logs the message only if the server's LogLevel is warn or a more verbose setting (notice, info, or debug). Writing to STDERR triggers the equivalent of log_error method calls, thanks to a tied filehandle.

Some methods apply to the connection object, which maps onto the parts of the Apache API that relate to the current HTTP connection. You can call the connection method against the request object to get the connection object, often referenced in the documentation as $c. In particular, calling remote_ip against $c gives the current remote IP address (a quick operation). Calling remote_host causes Apache to look up the hostname for that IP address, using reverse-to-forward validation for security purposes, which makes it rather expensive; avoid it unless you really need the name.

One cool connection object method is aborted, which returns true if the HTTP connection has been disconnected, as reported by the operating system. For example, a CPU-intensive calculation lasting more than a dozen seconds or so can check this value periodically to see whether anyone will still be there to receive the result.
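A minimal sketch of that periodic check, assuming the work can be split into small steps. The crunch subroutine and the Stub::Connection class are hypothetical: the stub only imitates an aborted method so the loop can run outside Apache, and under mod_perl you would pass $r->connection instead.

```perl
use strict;
use warnings;

# Hypothetical long-running job. $c is anything with an aborted method
# (under mod_perl, the object returned by $r->connection).
sub crunch {
  my ($c, $steps) = @_;
  my $done = 0;
  for my $step (1 .. $steps) {
    # Check every 100 steps rather than on every iteration.
    if ($step % 100 == 0 and $c->aborted) {
      return (undef, $done);    # client went away: stop early
    }
    $done++;                    # stand-in for one unit of real work
  }
  return ("finished", $done);
}

# Hypothetical stub standing in for the real connection object:
# it starts reporting "aborted" after a given number of checks.
{
  package Stub::Connection;
  sub new     { my ($class, $after) = @_; return bless { left => $after }, $class }
  sub aborted { my $self = shift; return --$self->{left} < 0 }
}
```

Checking only every hundredth step keeps the cost of the test small relative to the work itself.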

Another part of the Apache API relates to the server itself, accessed through the server object, which is obtained by calling server against the request object. Through server object methods, we can discover the server_admin (email address), the server_hostname (the hostname to which we are bound if multihosted), the port number, and whether or not we are a virtual server (is_virtual). Additionally, names returns a space-separated list of canonical names and aliases.

The Apache API also provides a library of utility functions for common operations. Handlers can import these from the Apache::Util package (a plain function library, not an object class) as if they were ordinary Perl subroutines.

For example, escape_html can replace the HTML::Entities::encode routine, running much faster (perhaps by a factor of 100 to 1). Similarly, escape_uri can replace URI::Escape::uri_escape, although the savings aren't as significant. The inverse operations, unescape_html and unescape_uri, are also provided.

Other utility routines include parsedate (parses the common date formats used in HTTP headers and logfiles), ht_time (formats a time string, much like strftime), size_string (converts a size into a friendly value in bytes, K, M, and so on), and validate_password (compares an entered password against a one-way hash such as crypt, MD5, or SHA1).

The mod_perl API makes getting to the arguments of a request relatively painless. To get the parameters of a GET or HEAD request, we call the args method of the request object, which returns the parameters as a list of key/value pairs. Similarly, the POST parameters can be fetched by calling the content method in a list context. For example,

  my %args = $r->args;
  my $name = $args{name}; # get param "name"

Beware, however, that ``select multiple'' parameters (which may have more than one value for a given key) lose all but their last value when the list is assigned to a hash this way. To keep every value, we have to work a bit harder:

  my @args = ($r->args, $r->content);
  my %params;
  # splice returns an empty list once @args is used up, ending the loop
  while (my ($k, $v) = splice @args, 0, 2) {
    push @{$params{$k}}, $v;
  }

Now @{$params{name}} holds every value of the name parameter, from both the query string and the POST body.
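To see the same hash-of-arrays shape built from raw data, here is a standalone sketch that decodes a query string (the form args returns in a scalar context). The parse_query name is hypothetical, and it handles only the + and %XX escapes of ordinary form data; it is an illustration, not a reimplementation of what mod_perl does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "a=1&b=2&a=3" into { a => [1, 3], b => [2] }.
sub parse_query {
  my ($query) = @_;
  my %params;
  for my $pair (split /&/, $query) {
    next unless length $pair;           # skip empty pairs like "a=1&&b=2"
    my ($k, $v) = split /=/, $pair, 2;
    for ($k, $v) {                      # $_ aliases $k, then $v
      next unless defined;
      tr/+/ /;                          # form encoding: + means space
      s/%([0-9A-Fa-f]{2})/chr hex $1/ge; # decode %XX escapes
    }
    push @{$params{$k}}, defined $v ? $v : '';
  }
  return \%params;
}

my $p = parse_query("name=fred&name=barney&color=dark+blue");
print "@{$p->{name}}\n";                # prints fred barney
```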

In a scalar context, these methods return the original data (the query string or the raw content), permitting easy regeneration of the original request. Note that content must be called only once per request, because after that, the contents are gone. One quick use for this is to simplify the code above, if you don't mind using a CGI.pm object:

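  use CGI;
  # join the raw strings with "&" so the last GET and first POST
  # parameters don't run together
  my $raw = join '&', grep { defined($_) && length($_) } scalar($r->args), scalar($r->content);
  my $q = CGI->new($raw);
  my @name = $q->param("name");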

Other handler phases include the TransHandler (mapping URLs to filesystem paths), AccessHandler (validating the client host), AuthenHandler (validating user credentials), AuthzHandler (verifying that the validated user may perform the requested operation), TypeHandler (determining the MIME type for a given URL), and LogHandler (creating log entries). Each is configured with a directive formed by prefixing Perl to the phase name, such as PerlTransHandler.

For example, a sample TransHandler that ``redirects'' the request based on the day of the week might look like:

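  package My::TimeAdjust;
  use strict;
  use Apache::Constants qw(DECLINED);
  sub handler {
    my $r = shift;
    my $uri = $r->uri;
    my $dow = (localtime)[6]; # 0 = sunday, 6 = saturday
    $uri =~ s{/DOW/}{/$dow/}; # first occurrence only
    $r->uri($uri);
    return DECLINED; # let Apache's normal translation map the new URI
  }
  1; # a module file must return true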

We save this code as My/TimeAdjust.pm in a directory on the @INC path, and add the handler to the configuration:

  PerlTransHandler My::TimeAdjust

This single configuration directive causes every request to pass through our TransHandler, which replaces the first appearance of /DOW/ in the URL with the current day number (/3/ on a Wednesday), so a fetch of /foo/bar/DOW/bletch becomes /foo/bar/3/bletch. Imagine trying to do that with mod_rewrite!
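Because the rewrite itself is ordinary Perl, it can be checked outside Apache. This sketch pulls the substitution into a hypothetical helper, adjust_uri, that accepts the day of the week as a parameter so the result is predictable in a test.

```perl
use strict;
use warnings;

# Hypothetical helper holding the rewrite from My::TimeAdjust.
# $dow defaults to today's day of week (0 = Sunday .. 6 = Saturday).
sub adjust_uri {
  my ($uri, $dow) = @_;
  $dow = (localtime)[6] unless defined $dow;
  $uri =~ s{/DOW/}{/$dow/};   # first occurrence only, as in the handler
  return $uri;
}

print adjust_uri("/foo/bar/DOW/bletch", 3), "\n";   # prints /foo/bar/3/bletch
```

The handler then only has to call this with $r->uri and store the result back with $r->uri($new).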

For an example AccessHandler, let's deny access from all odd-numbered hosts:

  package My::Hosts;
  use strict;
  use Apache::Constants qw(DECLINED FORBIDDEN);
  sub handler {
    my $r = shift;
    my $host = $r->connection->remote_ip;
    return FORBIDDEN if $host =~ /[13579]$/; # odd last digit: deny
    return DECLINED; # no opinion: other access checks may still deny
  }
  1;

Once again, we add this to the configuration with a single line:

  PerlAccessHandler My::Hosts

And now our odd hosts are no longer permitted. Note that access checks are processed in order, so we return DECLINED when we have no opinion, permitting other, more traditional checks to apply.
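The host test can likewise be exercised on its own. This sketch uses a hypothetical is_odd_host helper that applies the same last-digit rule to dotted-quad IPv4 strings like those remote_ip returns, and prints which status the handler would return.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the address ends in an odd digit,
# the rule My::Hosts uses to return FORBIDDEN.
sub is_odd_host {
  my ($ip) = @_;
  return $ip =~ /[13579]$/ ? 1 : 0;
}

for my $ip (qw(192.168.1.7 192.168.1.10 10.0.0.255)) {
  printf "%-14s %s\n", $ip, is_odd_host($ip) ? "FORBIDDEN" : "DECLINED";
}
```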

As a custom LogHandler, we might consider recording the wall clock and CPU time used by our request. For this, we need to start and stop a stopwatch. We'll start it with:

  use Apache::Constants qw(DECLINED);
  sub TimeIt::Init::handler {
    shift->pnotes('times', [time, times]);
    return DECLINED;
  }

This init handler runs as early as possible in the request cycle, saving the wall-clock time and the four values from times (user and system CPU time, followed by child user and child system CPU time) in a pnote for later retrieval. At the end of the request, we fetch those values, compute the deltas, and log the differences:

  sub TimeIt::Log::handler {
    my $r = shift;
    my @times = @{$r->pnotes('times')};
    @times = map { $_ - shift @times } time, times;
    $r->warn($r->uri, ": real/cpu times: @times");
    return DECLINED;
  }

We'd enable this with:

  PerlInitHandler TimeIt::Init
  PerlLogHandler TimeIt::Log

(See also Apache::TimeIt in the CPAN, which does this in a more manageable fashion.)
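If whole seconds are too coarse for the wall-clock figure (Perl's built-in time returns integer seconds), one hedged variation swaps in Time::HiRes::time from the core Time::HiRes module. The snapshot and elapsed subroutines below are hypothetical helpers that package the same [time, times] snapshot and map/shift subtraction used by the two handlers, so the arithmetic can be tried outside Apache.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Snapshot: fractional wall-clock seconds plus the four values from times.
sub snapshot { return [Time::HiRes::time(), times] }

# Subtract an earlier snapshot from a later one, element by element,
# using the same map/shift idiom as TimeIt::Log.
sub elapsed {
  my ($start, $end) = @_;
  my @old = @$start;
  return map { $_ - shift @old } @$end;
}

my $t0 = snapshot();
my $x = 0;
$x += sqrt($_) for 1 .. 200_000;   # some CPU work to measure
my @delta = elapsed($t0, snapshot());
printf "real %.3f user %.2f sys %.2f\n", @delta[0 .. 2];
```

Inside the handlers, the init phase would store snapshot() with pnotes, and the log phase would pass that stored array and a fresh snapshot() to elapsed.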

If you're writing Apache handlers such as the ones above, you can test them with Apache::FakeRequest, which can fake up the various fields of the request object:

  use Apache::FakeRequest;
  use My::Module;
  my $request = Apache::FakeRequest->new('get_remote_host'=> 'foobar.com');
  My::Module::handler($request);

Finally, if Apache and mod_perl are compiled with Server-Side Include support enabled, you can even call Perl code directly from your web pages:

  <!--#perl sub="sub { print 'perl code here' }" -->

I've never done this on a real website, but you might find that it helps you in a pinch.

For further information about mod_perl, check out the installed manpages, such as Apache, cgi_to_mod_perl, mod_perl, mod_perl_traps and mod_perl_tuning.

For dead-tree supplements, check out the excellent Writing Apache Modules with Perl and C, by Doug MacEachern and Lincoln Stein, which is the definitive guide to mod_perl by the guys who have been around since the beginning.

Also worth noting are: The mod_perl Developer's Cookbook, by Geoffrey Young, et al; Apache: The Definitive Guide, by Ben Laurie and Peter Laurie; and HTTP: The Definitive Guide, by David Gourley, et al.

On the web, perl.apache.org serves as the primary source of information on mod_perl, including advocacy, FAQs and other links. The www.modperlbook.com site provides supplemental information from Doug and Lincoln's book, and www.modperlcookbook.com gives listings and other information from Geoffrey's book.

Even in three columns, I've barely scratched the surface of how useful and cool mod_perl can be. With big sites like amazon.com and ticketmaster.com relying on it every day, mod_perl will be around for a long time to come. Until next time, have fun, and enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.