Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 65 (Nov 2004)

[suggested title: ``Introduction to mod_perl (part 2)'']

Last month, I introduced mod_perl, talking about fundamental concepts and some basic configuration directives. This month, I continue my introduction by showing some of the common modules and cool things you can do with mod_perl.

As mentioned last month, having persistent Perl code means that some steps of your application can be reused rather than repeated. One very easy optimization is keeping your database handles open between web hits, rather than reopening them on each new hit. The Apache::DBI module (found in the CPAN) helps with this goal by altering the way normal DBI connections are processed. Most users simply add:

  PerlModule Apache::DBI

to the configuration file, and it just magically works. The disconnect method of DBI is altered so that it doesn't really disconnect, and the connect method attempts to reuse an already existing handle opened with the same database parameters (including user and password). The downside is that every mod_perl Apache process will eventually get one or more persistent connections to the database server, which may affect a license count or a process limit.
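
As a rough sketch of what this looks like from the application side, a handler keeps calling DBI in the ordinary way. The module name, data source, user, password, and table below are all made up for illustration, and the sketch assumes Apache::DBI was loaded before DBI, as in the directive above:

  package My::Lookup;            # hypothetical module name
  use strict;
  use DBI;
  use Apache::Constants qw(OK);

  sub handler {
    my $r = shift;
    # Looks like a fresh connection, but with Apache::DBI loaded this
    # process reuses its cached handle when the parameters match.
    my $dbh = DBI->connect("dbi:mysql:test", "someuser", "somepass",
                           { RaiseError => 1 });
    my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM widgets");
    $dbh->disconnect;            # does not really disconnect under Apache::DBI
    $r->send_http_header("text/plain");
    $r->print("widgets: $count\n");
    return OK;
  }
  1;

The point is that the code stays portable: the same module runs unchanged without Apache::DBI, just with a real connect and disconnect on every hit.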

Another nice module is Apache::Template (also in the CPAN). This module turns your page deliveries into a full embedded templating system using Template Toolkit. The configuration is similar to Apache::Registry (shown last month):

  PerlModule Apache::Template

  <Location /tt2>
  SetHandler perl-script
  PerlHandler Apache::Template
  </Location>

Now, each file located within the directory mapped from the /tt2 URL will be treated as a template in the Template Toolkit language. The template will be compiled into Perl code and executed, and the result delivered to the web client. With additional directives, the results of those steps can be cached, creating a very powerful website with minimal overhead and maximum flexibility, comparable to a PHP-based or CF-based web configuration, but with a lot more powerful features available. Templates can also respond to CGI-style form parameters, making any page an interactive page!

There are many other mod_perl modules available as well. See the current list for yourself by entering ``apache::'' at the search box of search.cpan.org.

It's very simple to write your own handler as well, if you can't find something off the shelf to do what you want. Simply create a module, like My::Module (I tend to use My:: as a prefix for locally-written stuff). The module should have a routine named handler, which will be called from the embedded Perl interpreter at the appropriate phase. Then, install your module into mod_perl's @INC path.

To trigger your module, you add the appropriately scoped Perl handler directive to a configuration file. For example, to make your handler the content handler for all URLs beginning with /fred, simply add:

    <Location /fred>
    SetHandler perl-script
    PerlHandler My::Module
    </Location>

Content handlers return content back to the web browser, and get invoked after nearly all the other phases are complete. This step is normally where things like file content delivery or CGI scripts are triggered. However, you can do anything you want. For example:

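  package My::Content;
  use strict;
  use Apache::Constants qw(OK);
  sub handler {
    my $r = shift;                        # the request object
    $r->send_http_header("text/plain");   # status line and MIME type
    $r->print("Hello, world!\n");         # the content itself
    return OK;
  }
  1;  # a module file must end with a true value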

This simple content handler delivers the text/plain-tagged content of "Hello, world!\n" for any URL mapped to this handler. Of course, since we're running Perl code here, we could do almost anything, including generate dynamic content. Once loaded, a handler remains in memory, speeding up the process significantly. And, we can hold database connections (see Apache::DBI earlier), access Apache API callbacks (discussed shortly), and get to C code for additional libraries and optimizations.

The return value is important here. The OK value indicates that the handler ran successfully, and in a content handler, would also cause an ``ok'' status to be delivered as part of the HTTP transaction. However, we can return other values, such as NOT_FOUND, which will trigger the traditional ``404'' processing of the web server, including rolling over to an ErrorDocument if needed.

In all phases, we can also return DECLINED, which notifies Apache that this particular handler is not the proper handler at this phase, and that Apache should use the next handler as if this handler weren't here. I use DECLINED in the content phase, for example, if my content handler has been directed to deliver a directory instead of a file. By declining, Apache rolls over to the next handler, which can display a directory index if enabled, and I get my automatic directory listing.
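
To make the return values concrete, here is a small sketch of a content handler that uses all three. The package name is invented for this example, and the plain file tests on the filename are just one way to make the decision:

  package My::Plain;             # hypothetical module name
  use strict;
  use Apache::Constants qw(OK DECLINED NOT_FOUND);

  sub handler {
    my $r = shift;
    my $file = $r->filename;
    return DECLINED  if -d $file;       # let Apache deal with directories
    return NOT_FOUND unless -e $file;   # normal 404 processing
    $r->send_http_header("text/plain");
    $r->print("You asked for $file\n");
    return OK;
  }
  1;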

Declining is also useful in ``trans handlers'', which are in the early phase where Apache decides how a URL maps to a filepath. I can install a Perl trans handler that alters the URL or the filepath based on various parameters or status, then returns DECLINED, allowing the rest of the normal translation to take place with my altered values.
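
A trans handler along these lines might look like the following sketch. The package name and the /old/ and /new/ prefixes are assumptions for illustration. It would be installed with a line such as ``PerlTransHandler My::Trans'' in the configuration file:

  package My::Trans;             # hypothetical module name
  use strict;
  use Apache::Constants qw(DECLINED);

  sub handler {
    my $r = shift;
    my $uri = $r->uri;
    # Quietly map anything under /old/ to the same path under /new/.
    if ($uri =~ s{^/old/}{/new/}) {
      $r->uri($uri);
    }
    # Always decline, so the normal translation still runs,
    # using the altered URI if we changed it.
    return DECLINED;
  }
  1;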

From a handler, we can access various callbacks into the Apache API, including nearly any relevant thing that can be called from any handler written in C. These APIs are presented as objects of various types, known as the request object, server object, connection object and various utility classes.

The request object is used in nearly every handler, representing everything we need to know about what request came in, and giving us a place to provide (or alter) the response. A handler normally gets the request object as the first parameter, and traditionally, this value is placed in a variable called $r:

  sub handler {
    my $r = shift;
    ...
  }

If you didn't save the request object, you can still get at the same API by recreating the same object, using a call to a class method:

  sub handler {
    ...
    my $r = Apache->request;
    ...
  }

This is sometimes handy if you're in a subroutine being called from another part of a handler.
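
For instance, a helper subroutine (the name here is invented) can fetch the request object for itself instead of having it passed down through every call:

  sub requested_uri {
    my $r = Apache->request;     # the same request the handler was given
    return $r->uri;
  }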

The request object has many methods that can be called on it. For example, as_string creates a human-readable dump of all the headers and content of the request. I use this occasionally to see if I'm getting the bits that I expect.

Some of the other methods on the request object include: method (to tell if a request was a GET or POST), header_only (was this a HEAD request?), uri (to get or set the requested URI), and filename (to get or set the corresponding filepath). Those last two methods are very useful in trans handlers, allowing me to write mod_rewrite-like redirections in easy-to-use Perl, rather than obtuse and limited mod_rewrite syntax.

Other request methods less frequently used include: path_info (to get the part of the URL extended beyond the resource being accessed), args (to get at the query-string arguments), header_in (to get at arbitrary browser-set header items), and content (to get at the POST content).
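
As a quick way to see several of these methods at work, the following sketch of a content handler simply reports what it was asked. The package name is made up, and the choice of the User-Agent header is only an example:

  package My::Dump;              # hypothetical module name
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
    my $r = shift;
    $r->send_http_header("text/plain");
    return OK if $r->header_only;         # HEAD request: headers are enough
    my $args  = $r->args;                 # scalar context: raw query string
    my $agent = $r->header_in("User-Agent");
    $r->print("method:    ", $r->method, "\n");
    $r->print("uri:       ", $r->uri, "\n");
    $r->print("filename:  ", $r->filename, "\n");
    $r->print("path_info: ", $r->path_info, "\n");
    $r->print("args:      ", (defined $args ? $args : ""), "\n");
    $r->print("agent:     ", (defined $agent ? $agent : ""), "\n");
    return OK;
  }
  1;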

The request object also provides the response, generally in the content phase. Calling send_http_header triggers the beginnings of the response, including the appropriate HTTP status code and MIME type. The MIME type and status can be gotten and set with the content_type and status methods, respectively. Arbitrary headers can be added with header_out. Cache-control can be altered with the no_cache method.
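
Put together, the response side of a content handler might be sketched like this. The package name and the X-Example header are invented for the example:

  package My::Fresh;             # hypothetical module name
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
    my $r = shift;
    $r->content_type("text/html");            # set the MIME type
    $r->header_out("X-Example" => "demo");    # an arbitrary extra header
    $r->no_cache(1);                          # ask that this not be cached
    $r->send_http_header;                     # now the headers go out
    return OK if $r->header_only;
    $r->print("<h1>Fresh every time</h1>\n");
    return OK;
  }
  1;

Note the order: everything that affects the headers has to happen before send_http_header is called.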

Of course the point of the content handler is delivering content. Content is commonly sent in one of three ways. Arbitrary text can be sent with the print method, although the content is more commonly delivered by printing to the STDOUT filehandle, which is conveniently tied to do the same thing. (STDERR is opened on the error log: convenient for warning messages and other information.)

For speed and convenience, an open filehandle can also be used to deliver content, using send_fd. Using this method invokes Apache's internal delivery mechanism, delivering files as fast as if mod_perl weren't involved.
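
A minimal sketch of a file-delivering handler using send_fd follows. The package name is an assumption, as is the choice to label everything text/plain; the filehandle comes from Apache::File, which ships with mod_perl:

  package My::SendFile;          # hypothetical module name
  use strict;
  use Apache::File ();
  use Apache::Constants qw(OK DECLINED NOT_FOUND);

  sub handler {
    my $r = shift;
    return DECLINED if -d $r->filename;       # not our job
    my $fh = Apache::File->new($r->filename)
      or return NOT_FOUND;                    # could not open it
    $r->send_http_header("text/plain");
    $r->send_fd($fh);                         # Apache copies the file out
    close $fh;
    return OK;
  }
  1;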

The request object also has methods related to authentication: getting at the ``Basic Auth'' username (if any) and password, and triggering a ``Basic Auth'' failure if the authentication is not provided.
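
In the mod_perl 1.x API, the relevant calls are get_basic_auth_pw, note_basic_auth_failure, and the user method of the connection object. A sketch of an authentication handler using them is shown here. The package name and the hardwired username and password are stand-ins for a real lookup, and the sketch assumes the location is configured with ``PerlAuthenHandler My::Authen'' along with the usual AuthType Basic, AuthName, and require directives:

  package My::Authen;            # hypothetical module name
  use strict;
  use Apache::Constants qw(OK AUTH_REQUIRED);

  sub handler {
    my $r = shift;
    my ($status, $password) = $r->get_basic_auth_pw;
    return $status unless $status == OK;      # nothing usable was sent
    my $user = $r->connection->user;
    # Stand-in check: a real handler would consult a database or file.
    unless ($user eq "fred" and $password eq "flintstone") {
      $r->note_basic_auth_failure;            # make the browser prompt again
      return AUTH_REQUIRED;
    }
    return OK;
  }
  1;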

A handler can also request an external or internal redirect. For example, a trans handler or a content handler can determine that the browser should be sent to a new location (external redirect):

  $r->header_out("Location", "http://new.place");
  return REDIRECT;

Or, a handler can elect to restart processing with a new URL (internal redirect):

  $r->internal_redirect("/new/url");
  return OK;

When an internal redirect occurs (such as for ErrorDocument processing), the previous request object is accessible as well. The main, prev, next, and last methods on the request object allow a handler to walk up and down this chain of internal redirects.

One powerful feature of the request API is that we can ask Apache to act ``as if'' a particular URI (lookup_uri) or filename (lookup_file) was provided, then look at the status of the subrequest (like 404 or 200) to see if that subrequest would have been successful. The subrequest is processed up to (but not including) the content phase, so aliases, redirects, and authorization phases are all processed. Taking it a step further, we can also run the subrequest, delivering the content as part of our request. This is similar to a server-side-include, but from within Perl. (This is actually the API that the SSI mechanism uses.) Unfortunately, there's no way to capture the output: it's being sent down the pipe to the web client.
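
As a sketch of both uses, the following content handler checks whether a second URI would succeed, and if so, runs it as part of its own response. The package name and the /footer.html URI are invented for the example:

  package My::Include;           # hypothetical module name
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
    my $r = shift;
    $r->send_http_header("text/html");
    my $subr = $r->lookup_uri("/footer.html");  # hypothetical URI
    if ($subr->status == 200) {
      $subr->run;                # output goes straight to the client
    } else {
      $r->print("<p>(no footer available)</p>\n");
    }
    return OK;
  }
  1;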

The request object also contains methods related to the web client. We can ask for bytes_sent to find out the size of the current response (typically used in a log handler), and determine with proxyreq if this was a proxy request. We can also get hostname, request_time (wallclock time of the start of the request), and get_remote_host (the hostname of the web client, or its IP address if hostname lookups are turned off). Again, these are typically used in log handlers.
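
A tiny log handler built from these methods might be sketched as follows. The package name and the message format are made up. It would be installed with a line such as ``PerlLogHandler My::Log'', and it writes to the error log by way of warn:

  package My::Log;               # hypothetical module name
  use strict;
  use Apache::Constants qw(OK);

  sub handler {
    my $r = shift;
    my $when = localtime $r->request_time;    # scalar context: readable date
    warn sprintf "%s: %s asked for %s, got %d bytes\n",
      $when, $r->get_remote_host, $r->uri, $r->bytes_sent;
    return OK;
  }
  1;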

And, I've run out of space again, but have a lot more to say. In the next part of this article, I'll continue describing the request object API and some of the other objects as well. I'll also show more sample code, including some nifty little handler tricks. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
