Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 53 (Nov 2003)

[suggested title: ``Checking your website's health, part 1'']

In last month's column, I talked about checking one aspect of a website, namely that the internal links all point to other useful pages, and that the external links are still valid.

But that's not the only thing that can go wrong with a website. If we're looking at continuous operation of an e-commerce site using the high-availability tricks described elsewhere, we also must ensure that search forms really operate, and that the pages we visit have reasonable content. This is especially true for dynamically-generated web pages, and especially those that generate an ``everything's OK'' 200 status when the content of the page contains a Java traceback from a database connection.

So, to truly have high availability, we also have to be watching the associated programs and databases, and not just that the links all go somewhere reasonable.

The trick is to run a Perl program at regular intervals that connects to our webserver and performs requests as if it was a visitor's browser. While we can do this fairly directly with the LWP package (or even low-level programming with sockets directly), I find it timesaving to use the rapidly-evolving WWW::Mechanize package, found in the CPAN. This package lets me express web page fetches and form fillout in a way that mimics my behavior with a browser, using LWP and HTML::Form to handle the details.

With WWW::Mechanize, I get a virtual user agent that steers like a browser. The next step is to figure out if we're getting the right responses from the web server. For that, I prefer the Test::More module that is installed with modern Perl versions (and can be fetched from the CPAN for older Perl versions).

The Test::More module includes a number of tests that ultimately display a series of ok and not ok messages on STDOUT. These messages are normally interpreted by Test::Harness to give an overall thumbs-up or thumbs-down to a test (as when you are installing a module or Perl itself), but the individual messages and program flow are also directly useful.
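To see Test::Harness's thumbs-up or thumbs-down interpretation of that output, we can run our script under the harness. One way (assuming a reasonably recent Test::Harness, which bundles the prove utility, and assuming the script is saved as ./healthcheck as in the examples below):

```shell
# Run the health check under Test::Harness, which summarizes the
# ok/not ok lines into an overall pass/fail report:
prove ./healthcheck
```

The harness output ends with a summary line telling us how many tests passed, failed, or were skipped.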

For example, let's pretend we're in charge of search.cpan.org, and responsible for its health. What would we want tested at regular intervals to ensure that it is running satisfactorily?

A quick first test would be to make sure that the top-level page can be fetched. Let's do this with WWW::Mechanize:

  use WWW::Mechanize;
  my $a = WWW::Mechanize->new;
  $a->get("http://search.cpan.org/");

Our virtual browser is now ``looking at'' the top level page. But is it really? We can check the status using some Test::More routines:

  use Test::More 'no_plan';
  ok($a->success, "fetched /");

The ok routine evaluates the boolean returned by the success method. If the value is true, we get the output:

  ok 1 - fetched /
  1..1

The first line says that the first test passed OK, including our comment for clear identification. The second line says that our tests were numbered from 1 to 1. While the exact format of the lines is dictated by Test::Harness (and described in that manpage), they're also grokkable by humans. If the fetch had been unsuccessful, we'd get something like:

  not ok 1 - fetched /
  #     Failed test (./healthcheck at line 6)
  1..1
  # Looks like you failed 1 tests of 1.

The hash-marked lines are Test::Harness comments. Only the not ok and 1..1 lines are significant to the harness. But this didn't tell us why we failed. If we want to know how the result differs, we can use is rather than ok, which prints the offending value. Since we expect the status to be 200, we can check for that directly:

  is($a->status, 200, "fetched /");

Now when the page fetch fails (like perhaps a 404 error), we get a more detailed message:

  not ok 1 - fetched /
  #     Failed test (./healthcheck at line 6)
  #          got: '404'
  #     expected: '200'
  1..1
  # Looks like you failed 1 tests of 1.

Of course, a 404 error on the root page is probably a clue that nothing else is going to work either.

We should probably make sure that we ended up with a WWW::Mechanize object on that new call as well. That's easy with the isa_ok routine provided by Test::More:

  isa_ok(my $a = WWW::Mechanize->new, "WWW::Mechanize");

and now we get:

  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  1..2

Note that we now have two tests, so the final display shows that our tests are numbered 1 through 2.

The default timeout for the user-agent used by LWP is 180 seconds. If part of being ``healthy'' is that our website responds much faster than that, we can verify that by changing the timeout on our virtual browser:

  $a->timeout(10);

We might also set our user-agent string to something more recognizable for the access logs, or maybe to ensure that our tests aren't included in the official statistics:

  $a->agent("search.cpan.org-healthcheck/0.01");

If we get a good page fetch, we probably want to make sure it has the right content, and isn't some other error page sent with a 200 status. A quick check might be to verify the title of the page with Test::More's like routine:

  like($a->title, qr/The CPAN Search Site/, "/ title matches");

The first argument is the target string. The second argument is typically specified using a regular expression literal object, although you can use a text string that starts and ends with a slash as well, for compatibility with old Perls that don't have qr//. If the target string matches, we get our next successful test. If it fails, both the target string and the regular expression are displayed, along with a failure for the test.
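For those older Perls, the string form of that same check might look like this (a sketch of the compatibility syntax just described, using the same hypothetical title pattern):

```perl
# Equivalent to the qr// version above, but using a plain string that
# begins and ends with a slash, for pre-qr// Perls:
like($a->title, '/The CPAN Search Site/', "/ title matches");
```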

Obviously, this test will fail if the title is changed, so if you change the website, you'll have to change the tests. If your website is managed in a change control system, you should also update, validate, and deploy this health check in the same manner as any other component of your website.

Let's see if the links on the front page are working correctly. We can do that with follow_link. We'll look for the link that says FAQ and see if it gets us to the FAQ:

  ok($a->follow_link( text => 'FAQ' ), "follow FAQ link");

The follow_link method finds a link that has FAQ as the entire text. We could also find a link based on the URL, or a regular expression match of either the text or the URL. If multiple links match a particular requirement, we can also pick links based on their ordinal position. If the link isn't found, we get a false return, which fails the test. But if the link is found, we still need to find out if the page could be fetched:

  is($a->status, 200, "fetched FAQ page");

And yet, this still might be a 200-status ``error'' page instead, so we should ensure that the content is as expected. This time we'll use like against the page content:

  like($a->content, qr/Frequently Asked Questions/, "FAQ content matches");
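As an aside, the other link-finding criteria mentioned above might be sketched like so (these calls are illustrative; consult the WWW::Mechanize manpage for the full list of criteria):

```perl
# Find the link by a regular expression on its text instead:
$a->follow_link( text_regex => qr/^FAQ$/ );

# Or by a regular expression on the link's URL:
$a->follow_link( url_regex => qr/faq/i );

# Or, when several links match, pick one by ordinal position
# (here, the second link whose text is exactly "FAQ"):
$a->follow_link( text => 'FAQ', n => 2 );
```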

Once we're satisfied that the link works, we want to go back to the beginning page for some other tests. While we could simply get the page again, let's just push our virtual back button:

  $a->back;

So far, our output looks like:

  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  ok 3 - / title matches
  ok 4 - follow FAQ link
  ok 5 - fetched FAQ page
  ok 6 - FAQ content matches
  1..6

Not bad. We know that our website is up, and that at least two pages have reasonable HTML.

What if the FAQ link can't be found? We'll end up with an erroneous success and an erroneous failure:

  ok 1 - The object isa WWW::Mechanize
  ok 2 - fetched /
  ok 3 - / title matches
  not ok 4 - follow FAQ link
  #     Failed test (./healthcheck at line 10)
  ok 5 - fetched FAQ page
  not ok 6 - FAQ content matches
  #     Failed test (./healthcheck at line 12)
  #                   '
  # <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  ... lots of text here ...
  # </html>
  # '
  #     doesn't match '(?-xism:Frequently Asked Questions)'
  1..6
  # Looks like you failed 2 tests of 6.

Test 4 is correctly reporting that we couldn't find the FAQ link. But test 5 succeeds! The problem is that we're testing the successful fetch of the previous page, so it's a false positive. And test 6 is really irrelevant, because we're checking the home page for the FAQ content, which wouldn't make sense, and so we're getting a false negative.

What we need to do is skip tests 5 and 6 if 4 fails. And also skip test 6 if 5 fails. We can do this with Test::More's skip mechanism:

  SKIP: {
    ok($a->follow_link( text => 'FAQ' ), "follow FAQ link")
      or skip "missing FAQ link", 2;
    SKIP: {
      is($a->status, 200, "fetched FAQ page")
        or skip "bad FAQ fetch", 1;
      like($a->content, qr/Frequently Asked Questions/,
          "FAQ content matches");
      $a->back;
    }
  }

The skip mechanism uses a block labeled with SKIP to delimit the tests to be skipped. Since the ok function returns a boolean success, we can note a failed test, and execute skip to skip the remaining tests and exit the SKIP block. The first parameter to skip is the reason for skipping, while the second parameter is the number of tests to skip. We need to ensure the accuracy of that number because we don't want later tests to be renumbered if we skip some of these tests.

If the FAQ link can't be found, we get output that looks like:

  not ok 4 - follow FAQ link
  #     Failed test (./healthcheck at line 10)
  ok 5 # skip missing FAQ link
  ok 6 # skip missing FAQ link

Note that the skipped tests appear to be ``ok'', although they've been annotated with a comment. This comment is recognized by Test::Harness so that it can say ``2 tests skipped''.

If the inner is fails, we again skip, but only the single content test. If we maintain this code to add more tests, we'll need to update all of the skip numbers properly. Note that the back button is pressed only when we've gone forward as well.

Now let's try filling out a form, by searching for a particular author. We'll start by selecting the first (and only) form on the page:

  ok($a->form_number(1), "select query form");

Next, we'll look for Andy Lester's CPAN handle. (Andy is the current maintainer of both Test::Harness and WWW::Mechanize.) To do this, we need to know the form's field names, which we can get with a ``view source'' on the web page:

  $a->set_fields(query => "PETDANCE", mode => 'author');

When that's done, we can submit the form:

  $a->submit;

At this point, we should be looking at Andy's detailed CPAN page. First, let's make sure it fetched OK:

  is($a->status, 200, "query returned good for 'author'");

We can then see if Andy's name is mentioned somewhere on the page. This verifies that the CGI response is working, the search engine is working, and it's returning sensible data:

  like($a->content, qr/Andy Lester/, "found Andy Lester");

And of course, we'll want to skip back when we're done, ready for another test from the home page:

  $a->back;

But if we can't find the form, or we can't fetch the page, we're executing too many tests and too much other code again, so we'll want to wrap this stuff up inside some nested skips as well:

  SKIP: {
    ok($a->form_number(1), "select query form")
      or skip "cannot select query form", 2;
    $a->set_fields(query => "PETDANCE", mode => 'author');
    $a->submit();
    SKIP: {
      is($a->status, 200, "query returned good for 'author'")
        or skip "missing author page", 1;
      like($a->content, qr/Andy Lester/, "found Andy Lester");
      $a->back;
    }
  }

And once again, we'll be skipping over any tests that would have given us false positives or false negatives.
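Assembled from the snippets above, the whole health check reads roughly like this (a sketch, not run against the live site here, with the strict pragma added as a matter of habit):

```perl
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Test::More 'no_plan';

# Set up the virtual browser, with a short timeout and an
# identifiable user-agent string for the access logs:
isa_ok(my $a = WWW::Mechanize->new, "WWW::Mechanize");
$a->timeout(10);
$a->agent("search.cpan.org-healthcheck/0.01");

# Fetch the home page and sanity-check its title:
$a->get("http://search.cpan.org/");
is($a->status, 200, "fetched /");
like($a->title, qr/The CPAN Search Site/, "/ title matches");

# Follow the FAQ link, skipping dependent tests on failure:
SKIP: {
  ok($a->follow_link( text => 'FAQ' ), "follow FAQ link")
    or skip "missing FAQ link", 2;
  SKIP: {
    is($a->status, 200, "fetched FAQ page")
      or skip "bad FAQ fetch", 1;
    like($a->content, qr/Frequently Asked Questions/,
        "FAQ content matches");
    $a->back;
  }
}

# Fill out and submit the author-search form:
SKIP: {
  ok($a->form_number(1), "select query form")
    or skip "cannot select query form", 2;
  $a->set_fields(query => "PETDANCE", mode => 'author');
  $a->submit;
  SKIP: {
    is($a->status, 200, "query returned good for 'author'")
      or skip "missing author page", 1;
    like($a->content, qr/Andy Lester/, "found Andy Lester");
    $a->back;
  }
}
```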

In under three dozen lines of code, I now know that search.cpan.org is up and running, generates reasonable pages that link to the FAQ, and can execute searches for authors, returning reasonable data. And while there is room for many more tests to be performed, I've run out of room to talk about any more of it here. Next month, I'll explore this subject further, including how to notify someone only when something is breaking. Until then, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.