Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Linux Magazine Column 65 (Nov 2004)
[suggested title: ``Introduction to mod_perl (part 2)'']
Last month, I introduced mod_perl, talking about fundamental concepts and some basic configuration directives. This month, I continue my introduction by showing some of the common modules and cool things you can do with mod_perl.
As mentioned last month, having persistent Perl code means that some steps of your application can be reused rather than repeated. One very easy optimization is keeping your database handles open between web hits, rather than reopening them on each new hit. The Apache::DBI module (found in the CPAN) helps with this by altering the way normal DBI connections are processed. For most users, you simply add:
    PerlModule Apache::DBI
to the configuration file, and it just magically works. The disconnect method of DBI is altered so that it doesn't really disconnect, and the connect method attempts to reuse an already existing handle opened with the same database parameters (including user and password). The downside is that every mod_perl Apache process will eventually get one or more persistent connections to the database server, which may affect a license count or a process limit.
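Load order matters here: the Apache::DBI documentation asks that it be loaded before DBI or any other module that uses DBI. What follows is a minimal startup-file sketch for mod_perl 1.x. The file path, DSN, user, and password are placeholders. The connect_on_init call is optional; it arranges for each Apache child to open its connection as the child starts. The sketch runs only inside a mod_perl-enabled Apache.

```perl
# startup.pl: pulled in from httpd.conf with a line such as
#   PerlRequire /usr/local/apache/conf/startup.pl   (path is a placeholder)
use strict;
use Apache::DBI ();   # must be loaded before DBI or anything that uses DBI
use DBI ();

# Optional: open a connection in each child as it starts, so the first
# hit in that child doesn't pay the connect cost.
# The DSN and credentials below are placeholders.
Apache::DBI->connect_on_init(
  "dbi:mysql:database=example", "webuser", "secret",
  { RaiseError => 1, AutoCommit => 1 },
);

1;
```

Handlers keep calling DBI->connect exactly as before. When the connect arguments are identical, they get back the cached handle instead of a new connection.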
Another nice module is Apache::Template (also in the CPAN). This module turns your page deliveries into a full embedded templating system using the Template Toolkit. The configuration is similar to Apache::Registry (shown last month):
    PerlModule Apache::Template
    <Location /tt2>
      SetHandler perl-script
      PerlHandler Apache::Template
    </Location>
Now, each file located within the directory mapped from the /tt2 URL will be treated as a template in the Template Toolkit language. The template will be compiled into Perl code and executed, and the result delivered to the web client. With additional directives, the results of those steps can be cached. The result is a very powerful website with minimal overhead and maximum flexibility, comparable to a PHP-based or ColdFusion-based web configuration, but with many more powerful features available. Templates can also respond to CGI-style form parameters, making any page an interactive page!
There are many other mod_perl modules available as well. See the current list for yourself by entering ``apache::'' into the search box at search.cpan.org.
It's very simple to write your own handler as well, if you can't find something off the shelf to do what you want. Simply create a module, like My::Module (I tend to use My:: as a prefix for locally-written stuff). The module should have a routine named handler, which will be called from the embedded Perl interpreter at the appropriate phase. Then, install your module somewhere in mod_perl's @INC path.
To trigger your module, you add the appropriately scoped Perl handler directive to a configuration file. For example, to make your handler the content handler for all URLs beginning with /fred, simply add:
    <Location /fred>
      SetHandler perl-script
      PerlHandler My::Module
    </Location>
Content handlers return content back to the web browser, and get invoked after nearly all the other phases are complete. This step is normally where things like file content delivery or CGI scripts are triggered. However, you can do anything you want. For example:
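    package My::Content;
    use strict;
    use Apache::Constants qw(OK);
    sub handler {
      my $r = shift;                        # the request object
      $r->send_http_header("text/plain");   # status line and MIME type
      $r->print("Hello, world!\n");         # the response body
      return OK;
    }
    1;   # a module file must end by returning a true value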
This simple content handler delivers the text/plain-tagged content Hello, world! (followed by a newline) for any URL mapped to this handler. Of course, since we're running Perl code here, we could do almost anything, including generating dynamic content. Once loaded, a handler remains in memory, speeding up the process significantly. And, we can hold database connections (see Apache::DBI earlier), access Apache API callbacks (discussed shortly), and get to C code for additional libraries and optimizations.
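To make the ``remains in memory'' point concrete, here is a sketch of a handler named My::Counter (a hypothetical name), written against the mod_perl 1.x API and untested. A file-scoped lexical variable survives from one hit to the next. Each Apache child therefore keeps its own running count, and separate children do not share it.

```perl
package My::Counter;
use strict;
use Apache::Constants qw(OK);

# Initialized once, when the module is compiled, and then kept for the
# life of this Apache child process.
my $hits = 0;

sub handler {
  my $r = shift;
  $hits++;
  $r->send_http_header("text/plain");
  return OK if $r->header_only;   # HEAD request: headers are enough
  $r->print("Child $$ has served $hits hit(s); the time here is ",
            scalar(localtime), "\n");
  return OK;
}

1;
```

Reload the page a few times. The count climbs, and the process ID and count change whenever a different child answers.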
The return value is important here. The OK value indicates that the handler ran successfully. In a content handler, it also causes a ``200 OK'' status to be delivered as part of the HTTP transaction. However, we can return other values, such as NOT_FOUND, which will trigger the traditional ``404'' processing of the web server, including rolling over to an ErrorDocument if needed.
In all phases, we can also return DECLINED, which notifies Apache that this particular handler is not the proper handler at this phase, and that Apache should use the next handler as if this handler weren't here. I use DECLINED in the content phase, for example, if my content handler has been directed to deliver a directory instead of a file. When my handler declines, Apache rolls over to the next handler, which can display a directory index if enabled, and I get my automatic directory listing.
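Here is a sketch of a content handler that uses all three return values discussed above. The name My::FileInfo is hypothetical, and the code targets the mod_perl 1.x API. It declines for directories, so Apache's own directory handling can take over. It returns NOT_FOUND for missing files. Otherwise, it reports the file's size and returns OK.

```perl
package My::FileInfo;
use strict;
use Apache::Constants qw(OK DECLINED NOT_FOUND);

sub handler {
  my $r = shift;
  my $file = $r->filename;

  # A directory: step aside so the next handler (for example, the
  # automatic directory index) gets a chance.
  return DECLINED if -d $file;

  # No such file: let Apache do its normal 404 processing,
  # including any ErrorDocument.
  return NOT_FOUND unless -f $file;

  $r->send_http_header("text/plain");
  return OK if $r->header_only;
  $r->print($r->uri, " is ", (-s $file || 0), " bytes\n");
  return OK;
}

1;
```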
Declining is also useful in ``trans handlers'', which run in the early phase where Apache decides how a URL maps to a filepath. I can install a Perl trans handler that alters the URL or the filepath based on various parameters or status, then returns DECLINED, allowing the rest of the normal translation to take place with my altered values.
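A trans handler along those lines might look like the following sketch (mod_perl 1.x API). The /~user to /users/user mapping is purely hypothetical. The URL has not been mapped to a file yet at this phase. For that reason, PerlTransHandler goes in the server (or virtual host) configuration rather than inside a <Location> block.

```perl
package My::Trans;
use strict;
use Apache::Constants qw(DECLINED);

# httpd.conf (server or virtual-host level, not inside <Location>):
#   PerlTransHandler My::Trans

sub handler {
  my $r = shift;
  my $uri = $r->uri;

  # Hypothetical rule: /~fred/page.html becomes /users/fred/page.html
  if ($uri =~ s{^/~(\w+)}{/users/$1}) {
    $r->uri($uri);
  }

  # Let Apache's normal translation map the (possibly altered) URI
  # to a file.
  return DECLINED;
}

1;
```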
From a handler, we can access various callbacks into the Apache API, including nearly any relevant thing that can be called from any handler written in C. These APIs are presented as objects of various types, known as the request object, server object, connection object and various utility classes.
The request object is used in nearly every handler, representing everything we need to know about what request came in, and giving us a place to provide (or alter) the response. A handler normally gets the request object as the first parameter, and traditionally, this value is placed in a variable called $r:
    sub handler {
      my $r = shift;
      ...
    }
If you didn't save the request object, you can still get at the same API by fetching the current request object again with a class method:
    sub handler {
      ...
      my $r = Apache->request;
      ...
    }
This is sometimes handy if you're in a subroutine being called from another part of a handler.
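For example, in the following sketch the greet helper isn't passed $r, so it asks for the current request itself. My::Greeter and greet are hypothetical names, and the code targets the mod_perl 1.x API.

```perl
package My::Greeter;
use strict;
use Apache ();
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  $r->send_http_header("text/plain");
  greet();   # note: $r is not passed along
  return OK;
}

# Hypothetical helper: recovers the current request via the class method.
sub greet {
  my $r = Apache->request;
  $r->print("Hello from ", $r->uri, "\n");
}

1;
```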
The request object has many methods that can be called on it. For example, as_string creates a human-readable dump of the request line and headers. I use this occasionally to see if I'm getting the bits that I expect.
Some of the other methods on the request object include: method (to tell if a request was a GET or POST), header_only (was this a HEAD request?), uri (to get or set the requested URI), and filename (to get or set the corresponding filepath). Those last two methods are very useful in trans handlers. They let me write mod_rewrite-like redirections in easy-to-use Perl, rather than in obtuse and limited mod_rewrite syntax.
Other request methods less frequently used include: path_info (to get the part of the URL extended beyond the resource being accessed), args (to get at the query-string arguments), header_in (to get at arbitrary browser-set header items), and content (to get at the POST content).
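A quick diagnostic content handler is a handy way to see these methods in action. What follows is a sketch against the mod_perl 1.x API. The package name My::Dump is hypothetical. The sketch assumes POST bodies are ordinary form-encoded data. It also assumes content is called at most once per request, because content reads the request body.

```perl
package My::Dump;
use strict;
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  warn $r->as_string;   # request line and headers, written to the error log

  $r->send_http_header("text/plain");
  return OK if $r->header_only;   # HEAD: no body wanted

  my %query = $r->args;                                  # query-string pairs
  my %form  = $r->method eq 'POST' ? $r->content : ();   # form-encoded POST pairs
  my $agent = $r->header_in('User-Agent');

  $r->print("method:    ", $r->method, "\n");
  $r->print("uri:       ", $r->uri, "\n");
  $r->print("filename:  ", $r->filename, "\n");
  $r->print("path_info: ", $r->path_info, "\n");
  $r->print("agent:     ", defined $agent ? $agent : "(none)", "\n");
  $r->print("query:     $_=$query{$_}\n") for sort keys %query;
  $r->print("form:      $_=$form{$_}\n")  for sort keys %form;
  return OK;
}

1;
```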
The request object also provides the response, generally in the content phase. Calling send_http_header triggers the beginnings of the response, including the appropriate HTTP status code and MIME type. The MIME type and status can be read and set with the content_type and status methods, respectively. Arbitrary headers can be added with header_out. Cache-control can be altered with the no_cache method.
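Putting those header methods together, a response might be set up as in the following sketch (mod_perl 1.x API). The package name My::Headers and the X-Generated-By header are made-up examples.

```perl
package My::Headers;
use strict;
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  $r->content_type("text/html");                     # MIME type for the response
  $r->header_out("X-Generated-By" => __PACKAGE__);   # an arbitrary extra header
  $r->no_cache(1);                                   # ask clients not to cache this
  $r->send_http_header;                              # status line and headers go out now
  return OK if $r->header_only;
  $r->print("<html><body><p>Generated at ", scalar(localtime),
            "</p></body></html>\n");
  return OK;
}

1;
```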
Of course the point of the content handler is delivering content. Content is commonly sent in one of three ways. Arbitrary text can be sent with the print method, although the content is more commonly delivered by printing to the STDOUT filehandle, which is conveniently tied to do the same thing. (STDERR is opened on the error log: convenient for warning messages and other information.) For speed and convenience, an open filehandle can also be used to deliver content, using send_fd. Using this method invokes Apache's internal delivery mechanism, delivering files as fast as if mod_perl weren't involved.
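Here is a sketch of delivering a file with send_fd (mod_perl 1.x API; My::Send is a hypothetical name). It also shows a warning going to the error log through STDERR.

```perl
package My::Send;
use strict;
use Apache::Constants qw(OK NOT_FOUND);

sub handler {
  my $r = shift;
  my $file = $r->filename;

  open my $fh, '<', $file or do {
    warn "My::Send: can't open $file: $!\n";   # STDERR ends up in the error log
    return NOT_FOUND;
  };

  # For a request mapped to a file, Apache has usually already set
  # content_type from the file's extension.
  $r->send_http_header;
  $r->send_fd($fh) unless $r->header_only;     # Apache copies the file itself
  close $fh;
  return OK;
}

1;
```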
The request object also has methods related to authentication, including getting at the ``Basic Auth'' username (if any), password, and triggering a ``Basic Auth'' failure if the authentication is not provided.
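As a sketch of those authentication methods (mod_perl 1.x API), here is a hypothetical Basic Auth handler. The hard-coded user table is only a stand-in for a real password check. The configuration lines in the comment are one plausible way to hook the handler up.

```perl
package My::Auth;
use strict;
use Apache::Constants qw(OK AUTH_REQUIRED);

# Hypothetical configuration:
#   <Location /private>
#     AuthName "Private Area"
#     AuthType Basic
#     PerlAuthenHandler My::Auth
#     require valid-user
#   </Location>

# Stand-in credential store, for illustration only.
my %password_for = (fred => 'flintstone');

sub handler {
  my $r = shift;

  # $status is OK only if the browser sent Basic Auth credentials.
  my ($status, $sent_pw) = $r->get_basic_auth_pw;
  return $status unless $status == OK;

  my $user = $r->connection->user;   # the ``Basic Auth'' username
  my $want = $password_for{$user};
  unless (defined $want and $sent_pw eq $want) {
    $r->note_basic_auth_failure;     # adds the header that makes the browser prompt again
    return AUTH_REQUIRED;
  }
  return OK;
}

1;
```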
A handler can also request an external or internal redirect. For example, a trans handler or a content handler can determine that the browser should be sent to a new location (external redirect):
    # REDIRECT comes from: use Apache::Constants qw(REDIRECT);
    $r->header_out("Location", "http://new.place");
    return REDIRECT;
Or, a handler can select to restart processing with a new URL (internal redirect):
    $r->internal_redirect("/new/url");
    return OK;
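When an internal redirect occurs (such as for ErrorDocument processing), the previous request object is accessible as well. The prev and next methods on the request object allow a handler to walk back and forth along this chain of internal redirects, and last jumps to the final request in the chain. Similarly, main returns the top-level request when the current handler is running inside a subrequest.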
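For example, a handler serving as an ErrorDocument can use prev to find out which URL actually failed. This is a hypothetical sketch against mod_perl 1.x. The /notfound location and the My::NotFound name are made up.

```perl
package My::NotFound;
use strict;
use Apache::Constants qw(OK);

# Hypothetical configuration:
#   ErrorDocument 404 /notfound
#   <Location /notfound>
#     SetHandler perl-script
#     PerlHandler My::NotFound
#   </Location>

sub handler {
  my $r = shift;

  # prev is the request that triggered this internal redirect,
  # or undef if someone asked for /notfound directly.
  my $prev    = $r->prev;
  my $missing = $prev ? $prev->uri : $r->uri;

  $r->send_http_header("text/plain");
  return OK if $r->header_only;
  $r->print("Sorry, nothing was found at $missing\n");
  return OK;
}

1;
```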
One powerful feature of the request API is that we can ask Apache to act ``as if'' a particular URI (lookup_uri) or filename (lookup_file) was provided. We can then look at the status of the subrequest (like 404 or 200) to see if that subrequest would have been successful. The subrequest is processed up to (but not including) the content phase, so aliases, redirects, and authorization phases are all processed. Taking it a step further, we can also run the subrequest (with its run method), delivering the content as part of our request. This is similar to a server-side include, but from within Perl. (This is actually the API that the SSI mechanism uses.) Unfortunately, there's no way to capture the output: it's being sent down the pipe to the web client.
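Here is a sketch of that idea (mod_perl 1.x API). The /banners/today.html URI and the My::Page name are hypothetical. The handler checks whether a banner page would be served successfully and, if so, runs it in the middle of its own output. The extra -f test on the subrequest's filename is a cautious assumption on my part, not something the API requires.

```perl
package My::Page;
use strict;
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  $r->send_http_header("text/html");
  return OK if $r->header_only;

  $r->print("<html><body>\n");

  # Ask Apache what would happen for this URI, stopping before the content phase.
  my $subr = $r->lookup_uri("/banners/today.html");
  if ($subr->status == 200 and -f $subr->filename) {
    $subr->run;   # the banner's output goes straight to the client
  }

  $r->print("<p>The rest of the page goes here.</p>\n</body></html>\n");
  return OK;
}

1;
```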
The request object also contains methods related to the web client. We can ask for bytes_sent to find out the size of the current response (typically used in a log handler), and determine with proxyreq if this was a proxy request. We can also get hostname, request_time (wallclock time of the start of the request), and get_remote_host (the client's hostname, or its IP address if hostname lookups are turned off). Again, these are typically used in log handlers.
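Here is a minimal log handler sketch using those methods (mod_perl 1.x API). The name My::Log is hypothetical, and the handler would be enabled with PerlLogHandler My::Log. Purely for illustration, it writes one line per request to the error log via warn.

```perl
package My::Log;
use strict;
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  my $started = localtime $r->request_time;   # request_time is in epoch seconds
  my $client  = $r->get_remote_host;          # hostname, or IP if lookups are off
  my $proxy   = $r->proxyreq ? " (proxy)" : "";
  warn sprintf "%s %s %s %s %d bytes%s\n",
    $started, $client, $r->hostname, $r->uri, $r->bytes_sent, $proxy;
  return OK;
}

1;
```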
And, I've run out of space again, but have a lot more to say. In the next part of this article, I'll continue describing the request object API and some of the other objects as well. I'll also show more sample code, including some nifty little handler tricks. Until next time, enjoy!