Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 53 (Nov 2003)
[suggested title: ``Checking your website's health, part 1'']
In last month's column, I talked about checking one aspect of a website, namely that the internal links all point to other useful pages, and that the external links are still valid.
But that's not the only thing that can go wrong with a website. If we're looking at continuous operation of an e-commerce site using the high-availability tricks described elsewhere, we must also ensure that search forms really operate, and that the pages we visit have reasonable content. This is especially true for dynamically generated web pages, particularly those that return an ``everything's OK'' 200 status even when the content of the page contains a Java traceback from a database connection.
So, to truly have high availability, we also have to be watching the associated programs and databases, and not just that the links all go somewhere reasonable.
The trick is to run a Perl program at regular intervals that connects
to our webserver and performs requests as if it were a visitor's
browser. While we can do this fairly directly with the LWP package
(or even with low-level socket programming), I find it timesaving to
use the rapidly-evolving WWW::Mechanize package, found in the CPAN.
This package lets me express web page fetches and form fillout in a
way that mimics my behavior with a browser, using LWP and HTML::Form
to handle the details.
With WWW::Mechanize, I get a virtual user agent that steers like a
browser. The next step is to figure out whether we're getting the
right responses from the web server. For that, I prefer the
Test::More module, which is installed with modern Perl versions (and
can be fetched from the CPAN for older Perl versions).
The Test::More module includes a number of tests that ultimately
display a series of ok and not ok messages on STDOUT. These messages
are normally interpreted by Test::Harness to give an overall
thumbs-up or thumbs-down to a test run (as when you are installing a
module or Perl itself), but the individual messages and program flow
are also directly useful.
For example, let's pretend we're in charge of search.cpan.org, and
responsible for its health. What would we want tested at regular
intervals to ensure that it is running satisfactorily?

A quick first test would be to make sure that the top-level page can
be fetched. Let's do this with WWW::Mechanize:
  use WWW::Mechanize;
  my $a = WWW::Mechanize->new;
  $a->get("http://search.cpan.org/");
Our virtual browser is now ``looking at'' the top-level page. But is
it really? We can check the status using some Test::More routines:
  use Test::More 'no_plan';
  ok($a->success, "fetched /");
The ok routine evaluates the boolean returned by the success method.
If the value is true, we get the output:

  ok 1 - fetched /
  1..1
The first line says that the first test passed OK, including our
comment for clear identification. The second line says that our
tests were numbered from 1 to 1. While the exact format of the lines
is dictated by Test::Harness (and described in that manpage), they're
also grokkable by humans. If the fetch had been unsuccessful, we'd
get something like:
  not ok 1 - fetched /
  #     Failed test (./healthcheck at line 6)
  1..1
  # Looks like you failed 1 tests of 1.
The hash-marked lines are Test::Harness comments. Only the not ok
and 1..1 lines are significant to the harness. But this didn't tell
us why we failed. If we want to know how the result differs, we can
use is rather than ok, which prints the offending value.
Since we expect the status to be 200, we can check for that directly:
is($a->status, 200, "fetched /");
Now when the page fetch fails (perhaps with a 404 error), we get a more detailed message:
  not ok 1 - fetched /
  #     Failed test (./healthcheck at line 6)
  #          got: '404'
  #     expected: '200'
  1..1
  # Looks like you failed 1 tests of 1.
Of course, a 404 error on the root page is probably a clue that nothing else is going to work either.
We should probably make sure that we ended up with a WWW::Mechanize
object from that new call as well. That's easy with the isa_ok
routine provided by Test::More:
isa_ok(my $a = WWW::Mechanize->new, "WWW::Mechanize");
and now we get:
  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  1..2
Note that we now have two tests, so the final display shows that our tests are numbered 1 through 2.
The default timeout for the user agent used by LWP is 180 seconds.
If part of being ``healthy'' is that our website responds much faster
than that, we can verify that by changing the timeout on our virtual
browser:
$a->timeout(10);
We might also set our user-agent string to something more recognizable for the access logs, or maybe to ensure that our tests aren't included in the official statistics:
$a->agent("search.cpan.org-healthcheck/0.01");
If we get a good page fetch, we probably want to make sure it has
the right content, and isn't some other error page sent with a 200
status. A quick check might be to verify the title of the page with
Test::More's like routine:
like($a->title, qr/The CPAN Search Site/, "/ title matches");
The first argument is the target string. The second argument is
typically specified as a regular expression object built with qr//,
although you can use a text string that starts and ends with a slash
as well, for compatibility with old Perls that don't have qr//. If
the target string matches, we get our next successful test. If it
fails, both the target string and the regular expression are
displayed, along with a failure for the test.
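Both argument forms can be seen in a self-contained snippet; the sample string here is just an illustration, not anything fetched from the site:

```perl
use strict;
use warnings;
use Test::More tests => 2;

my $title = "The CPAN Search Site";

# the usual form: a regex object built with qr//
like($title, qr/CPAN Search/, "qr// form matches");

# the fallback form for very old Perls: a plain string
# that starts and ends with a slash
like($title, "/CPAN Search/", "string form matches");
```

Run on its own, this prints the 1..2 plan followed by two ok lines, and exits successfully.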
Obviously, this test will fail if the title is changed, so if you change the website, you'll have to change the tests. If your website is managed in a change control system, you should also update, validate, and deploy this health check in the same manner as any other component of your website.
Let's see if the links on the front page are working correctly. We
can do that with follow_link. We'll look for the link that says FAQ
and see if it gets us to the FAQ:
ok($a->follow_link( text => 'FAQ' ), "follow FAQ link");
The follow_link method finds a link that has FAQ as its entire
text. We could also find a link based on the URL, or on a regular
expression match of either the text or the URL. If multiple links
match a particular requirement, we can also pick a link by its
ordinal position. If the link isn't found, we get a false return,
which fails the test. But if the link is found, we still need to
find out whether the page could be fetched:
is($a->status, 200, "fetched FAQ page");
And yet, this still might be a 200-status ``error'' page instead, so
we should ensure that the content is as expected. This time we'll
use like against the page content:
like($a->content, qr/Frequently Asked Questions/, "FAQ content matches");
Once we're satisfied that the link works, we want to go back to the
beginning page for some other tests. While we could simply get the
page again, let's just push our virtual back button:
$a->back;
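The other link selectors mentioned above (URL match, regular expression match, ordinal position) can be tried offline against a throwaway page, using find_link, follow_link's non-fetching sibling. The page content and temporary file are pure scaffolding for the illustration:

```perl
use strict;
use warnings;
use WWW::Mechanize;
use File::Temp qw(tempfile);

# a throwaway page with several links, so each selector form
# has something to find
my ($fh, $file) = tempfile(SUFFIX => ".html");
print $fh <<'HTML';
<html><body>
<a href="/faq.html">FAQ</a>
<a href="/help.html">Help</a>
<a href="/faq2.html">FAQ</a>
</body></html>
HTML
close $fh;

my $a = WWW::Mechanize->new;
$a->get("file://$file");

# exact link text, as in the article's follow_link call
my $by_text = $a->find_link( text => 'FAQ' );
# a regular expression match against the URL instead
my $by_url  = $a->find_link( url_regex => qr/help/ );
# ordinal position among multiple matching links
my $second  = $a->find_link( text => 'FAQ', n => 2 );

print $by_text->url, "\n";   # /faq.html
print $by_url->url, "\n";    # /help.html
print $second->url, "\n";    # /faq2.html
```

Each call returns a link object whose url method reports the href as it appeared in the page, so we can inspect a candidate link without actually fetching it.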
So far, our output looks like:
  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  ok 3 - / title matches
  ok 4 - follow FAQ link
  ok 5 - fetched FAQ page
  ok 6 - FAQ content matches
  1..6
Not bad. We know that our website is up, and that at least two pages have reasonable HTML.
What if the FAQ link can't be found? We'll end up with an erroneous error and an erroneous success:
  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  ok 3 - / title matches
  not ok 4 - follow FAQ link
  #     Failed test (./healthcheck at line 10)
  ok 5 - fetched FAQ page
  not ok 6 - FAQ content matches
  #     Failed test (./healthcheck at line 12)
  #                   '
  # <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  ... lots of text here ...
  # </html>
  # '
  #     doesn't match '(?-xism:Frequently Asked Questions)'
  1..6
  # Looks like you failed 2 tests of 6.
Test 4 is correctly reporting that we couldn't find the FAQ link. But test 5 succeeds! The problem is that we're testing the successful fetch of the previous page, so it's a false positive. And test 6 is really irrelevant, because we're checking the home page for the FAQ content, which wouldn't make sense, and so we're getting a false negative.
What we need to do is skip tests 5 and 6 if test 4 fails, and also
skip test 6 if test 5 fails. We can do this with Test::More's skip
mechanism:
  SKIP: {
    ok($a->follow_link( text => 'FAQ' ), "follow FAQ link")
      or skip "missing FAQ link", 2;
    SKIP: {
      is($a->status, 200, "fetched FAQ page")
        or skip "bad FAQ fetch", 1;
      like($a->content, qr/Frequently Asked Questions/, "FAQ content matches");
      $a->back;
    }
  }
The skip mechanism uses a block labeled SKIP to delimit the tests
to be skipped. Since the ok function returns a boolean success, we
can note a failed test and execute skip to skip the remaining tests
and exit the SKIP block. The first parameter to skip is the reason
for skipping, while the second parameter is the number of tests to
skip. We need to ensure the accuracy of that number, because we
don't want later tests to be renumbered if we skip some of these
tests.
If the FAQ link can't be found, we get output that looks like:
  not ok 4 - follow FAQ link
  #     Failed test (./healthcheck at line 10)
  ok 5 # skip missing FAQ link
  ok 6 # skip missing FAQ link
Note that the skipped tests appear to be ``ok'', although they've
been annotated with a comment. This comment is recognized by
Test::Harness so that it can say ``2 tests skipped''.
If the inner is fails, we will again skip, but only the one content
test instead. If we maintain this code to add more tests, we'll need
to update all of the skip numbers properly. Note that the back
button is pressed only when we've gone forward as well.
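The skip-count bookkeeping is easy to see in isolation. This toy script uses a plain variable as a stand-in for a real precondition (such as the follow_link above), and the skip-unless arrangement rather than the article's or-skip idiom, but the numbering behavior is the same:

```perl
use strict;
use warnings;
use Test::More tests => 3;

ok(1, "reached the front page");   # an earlier, unconditional test

my $found_link = 0;                # stand-in for a failed link lookup
SKIP: {
    skip "missing FAQ link", 2 unless $found_link;
    ok(1, "fetched FAQ page");     # skipped, but still numbered
    ok(1, "FAQ content matches");  # skipped, but still numbered
}
```

Tests 2 and 3 are reported as ok with a skip annotation, so the 1..3 plan still holds and the harness can total the results correctly.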
Now let's try filling out a form, by searching for a particular author. We'll start by selecting the first (and only) form on the page:
ok($a->form_number(1), "select query form");
Next, we'll look for Andy Lester's CPAN handle. (Andy is the
current maintainer of both Test::Harness and WWW::Mechanize.) To do
this, we need to know the form's field names, which we can get with
a ``view source'' on the web page:
$a->set_fields(query => "PETDANCE", mode => 'author');
When that's done, we can submit the form:
$a->submit;
At this point, we should be looking at Andy's detailed CPAN page. First, let's make sure it fetched OK:
is($a->status, 200, "query returned good for 'author'");
We can then see if Andy's name is mentioned somewhere on the page. This verifies that the CGI response is working, the search engine is working, and it's returning sensible data:
like($a->content, qr/Andy Lester/, "found Andy Lester");
And of course, we'll want to skip back when we're done, ready for another test from the home page:
$a->back;
But if we can't find the form, or we can't fetch the results page, we'd again be running tests and code that no longer make sense, so we'll want to wrap this up inside some nested skips as well:
  SKIP: {
    ok($a->form_number(1), "select query form")
      or skip "cannot select query form", 2;
    $a->set_fields(query => "PETDANCE", mode => 'author');
    $a->submit();
    SKIP: {
      is($a->status, 200, "query returned good for 'author'")
        or skip "missing author page", 1;
      like($a->content, qr/Andy Lester/, "found Andy Lester");
      $a->back;
    }
  }
And once again, we'll be skipping over any tests that would have given us false positives or false negatives.
In under three dozen lines of code, I now know that search.cpan.org
is up and running, generates reasonable pages with links to the FAQ,
and can execute searches for authors, returning reasonable data.
And while there is room for many more tests, I've run out of room to
talk about them here. Next month, I'll explore this subject further,
including how to notify someone only when something is breaking.
Until then, enjoy!