Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Linux Magazine Column 14 (Jul 2000)

[Suggested title: Capturing those CGI errors as Email]

More and more web hosting services and ISPs are providing CGI space in addition to customer web pages, either as a free add-on, or an extra-cost service. And there are even a few free CGI servers out there on the net.

The problem with these services is that the (shared) web error log is often inaccessible, or at an unknown location. That's fine if your CGI program never commits an error, or if you are using the PSI::ESP module to determine the error text. But most of us will write ``blah blah or die blah'' in our CGI scripts, expecting to somehow be told what's wrong when it goes wrong.

Some have resorted to dumping the error message to the browser. In fact, during development, there's nothing wrong with adding:

  use CGI::Carp qw(fatalsToBrowser);

to your program, and doing the debugging right in your browser window. (For details, see the CGI::Carp documentation.)

But this is a huge security hole if left in production code. While surfing the world wild web, I often see error messages that reveal far too much information. I've seen program names, user IDs, languages used, pathnames to key files, and even the exact SQL query attempted dumped out in these errors. I have no right (or need) to know that, and a bad guy can use such valuable information to assist him in breaking into the system.

So, if we can't put it into the browser, and we can't get to the error log, where else can we put the errors? Why, in email of course!

All we need is a module (let's call it FatalsToEmail.pm) that we'll stick somewhere on the system (like /home/merlyn/lib), and pull in at the top of our CGI script, like so:

  use lib "/home/merlyn/lib";
  use FatalsToEmail qw(Address merlyn@stonehenge.com);

And then when the CGI script dies, the text of the error message gets sent to me, while the user is told that ``something went wrong''. Too cool? Yup, so read on.

The module source is in [listing one, below]. Line 1 sets up the package, important because we don't want any symbols to collide with the user of this module. Line 2 enables my favorite compiler restrictions, including requiring me to declare my variables, discouraging the use of symbolic references, and preventing barewords from being treated as quoted strings.

Lines 4 through 10 set up the four configuration variables for the module. The Address provides the email address to which the messages should go, here defaulting to webmaster at the mail host. Speaking of which, Mailhost sets up the mail delivery host. This doesn't have to be the final machine on which the mail ends up, but we'll need a friendly SMTP server somewhere that can handle mail from the script. The localhost default should be fine for most machines, except those that don't run a mailserver on the webserver.

The Cache and Seconds parameters interact to limit the amount of mail delivered. The default Cache value of undef gives the script the right to deliver a single separate piece of email for each fatal error. This is great for testing or for low-volume sites. But it'd be a potential ``denial of service'' attack for high-volume sites or malicious users.

So instead, we bunch up the rapidly appearing messages into a cache, guaranteed to be sent no more often than the indicated number of seconds. To get the bunching up, Cache must be set to a filename path that is writeable by the user ID executing the CGI program. A typical value might be something like /tmp/merlyn.weberrors.cache. (The actual caching strategy is defined below.) Several programs can share the same cache: the error messages within the cache are prefixed by the filename and line number from which they sprouted.

Lines 12 through 20 handle the configuration of the module. If the use line appears like:

  use FatalsToEmail qw(
    Address merlyn@stonehenge.com
    Cache /tmp/merlyn.weberrors.cache
  );

then we save Address and Cache to override the default values. The logic in line 15 ensures that someone can use cache or CACHE or even cAcHe for the identifier tag, and we'll still store it into the right hash slot.
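To see the key normalization on its own, here is a minimal stand-alone sketch of the same loop. The parse_options name and the /tmp/example.cache path are made up for illustration; the module itself does this work inside import.

```perl
use strict;
use warnings;

# Same defaults as the module's %config.
my %defaults = (
  Address  => "webmaster",
  Mailhost => "localhost",
  Cache    => undef,
  Seconds  => 60,
);

# parse_options is a hypothetical name; it mirrors the loop in import.
sub parse_options {
  my %config = %defaults;
  while (@_) {
    my $key = ucfirst lc shift;        # "CACHE" and "cAcHe" both become "Cache"
    die "missing argument to $key\n" unless @_;
    die "unknown argument $key\n" unless exists $config{$key};
    $config{$key} = shift;
  }
  return %config;
}

my %config = parse_options(qw(cAcHe /tmp/example.cache SECONDS 30));
for my $key (sort keys %config) {
  my $value = defined $config{$key} ? $config{$key} : "undef";
  print "$key => $value\n";
}
```

Lowercasing first and then uppercasing the first letter means every spelling of a tag lands on the one canonical hash key, and anything that isn't already a key is rejected.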

Line 22 establishes the subroutine in this module as the die handler. From here on out, we're the ones that will get called on a fatal error.

Lines 24 through 44 define this handler. The text message for the fatal error shows up in $message in line 25. Line 26 gets the current local time of day to label the message consistently. Line 27 extracts information about the filename and line number from which the error message was triggered.

Lines 29 through 31 prefix each line in the message with a unique identifier, consisting of the file name, line number, time of day, and process ID number. This is helpful to group error messages in cache-dumping email, as well as provide the necessary locators to fix the problem.
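The per-line prefixing can be tried in isolation. This is a small sketch with a made-up helper name (add_prefix) and a made-up prefix string:

```perl
use strict;
use warnings;

# add_prefix is a hypothetical helper, not part of the module.
sub add_prefix {
  my ($prefix, $message) = @_;
  # /m lets ^ match at the start of every line; /g repeats the substitution.
  $message =~ s/^/$prefix/mg;
  return $message;
}

print add_prefix("12345:demo.cgi:42: ", "first line\nsecond line\n");
```

Because every line carries the prefix, a multi-line error stays identifiable even when messages from several processes are interleaved in one cache file.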

Lines 33 to 39 dump a CGI response. Note that minimal information is provided.

If the CGI program has already sent an HTTP header, the header we print in line 34 will show up as content. There's nothing much I can do about that from this module, at least not in a CGI environment.

Then there's the cleanup. Line 41 triggers the email (or caching, if needed). And line 43 executes a die within the die handler. This step is needed so that Perl knows to finish aborting the program. The message shows up on STDERR, which will typically be the real web error log.
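As a rough sketch of the re-die pattern (not the module's actual handler), here is a stripped-down die handler that records the message and then dies again. The @seen array is a made-up stand-in for the email step. Perl disables the __DIE__ handler while it is running, so the inner die does not recurse. Perl also calls the handler for a die inside an eval, which is what lets this demonstration keep running.

```perl
use strict;
use warnings;

my @seen;   # hypothetical log, standing in for the mail-or-cache step

$SIG{__DIE__} = sub {
  my $message = shift;
  push @seen, $message;        # the real module would mail or cache here
  die "handled: $message";     # die again so Perl finishes aborting
};

# The eval is here only so the demonstration itself survives the die.
eval { die "database is down\n" };
print "eval saw: $@";
print "handler saw: $seen[0]";
```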

Lines 46 to 89 attempt to phone home with the error message. (Perhaps I should have called this subroutine ``e_t''?) Nearly everything is inside a large eval block so that any mistakes will still set up a graceful exit from this program. The message is captured and delivered in lines 86 to 88, including both the original message that wanted to be mailed, as well as the error that kept it from being mailed.

Lines 50 to 75 handle the cache, if needed, as determined by line 51. If we're caching, then line 52 attempts to open the cache file for both reading and writing. If that succeeds, it's time to operate.

Line 53 blocks the process until we're the only one using the file in this manner. We'll want to keep the time to a bare minimum from here until we close the filehandle, because we've just entered a zone where only one process at a time can be within.

Lines 55 to 62 handle the case where the cache is old enough for us to send. If the file modification time (``mtime'') is more than some number of seconds ago (determined by the Seconds configuration variable), then it's been a while since we wanted to send some email, and there may in fact be previous contents that we deferred. So lines 57 and 58 grab that. If there's more than 8K in the cache, we send only the first 8K and a warning. This keeps the mail message from becoming yet another denial of service attack: filling up our mail spool. At most, we'll get roughly 8K every 60 seconds (or whatever Seconds is configured as). The old cached messages are prepended to the current message in line 59.

Lines 60 and 61 remove the cached material, so that we have an empty file that has been modified just now, regardless of whether any prior content was in the file. This is important, because we want the next hit within the cache window to be deferred, repeatedly, until we have another idle period. And finally, line 62 closes the cache, also releasing the lock, since we now have the information we need.

Lines 64 to 67 handle the hits within the cache window, no more than Seconds number of seconds since the previous hit. In this case, we just seek to the end of the file, and dump the message there. We are guaranteed that the message will end in a newline, but if I weren't sure, I'd add a \n here somewhere, to ensure that each error message has lines that start with the prefix identifier computed earlier. The return in line 68 skips over all the email handling code below, since we won't be sending any mail on this trip.
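The two branches can be pulled out into one small function for experimenting. This is only a sketch under a few assumptions: the flush_or_defer name is made up, the cache file is assumed to exist already, and I've used lexical filehandles and the Fcntl constants where the listing uses a bareword handle and literal numbers.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);
use File::Temp qw(tempfile);

# Returns the text to mail now, or undef if the message was deferred.
# flush_or_defer is a hypothetical name; the cache file must already exist.
sub flush_or_defer {
  my ($path, $message, $seconds) = @_;
  open(my $fh, "+<", $path) or die "Cannot open $path: $!";
  flock($fh, LOCK_EX) or die "Cannot lock $path: $!";   # waits for exclusive access
  my $age = time - (stat($fh))[9];                      # seconds since last change
  if ($age > $seconds) {
    my $old = "";
    my $got = read($fh, $old, 8192);                    # at most 8K of old material
    defined $got or die "Cannot read $path: $!";
    $old .= "\n...[truncated]...\n" if $got >= 8192;
    seek($fh, 0, SEEK_SET);
    truncate($fh, 0) or die "Cannot truncate $path: $!";
    close($fh);                                         # also releases the lock
    return $old . $message;
  }
  seek($fh, 0, SEEK_END);
  print {$fh} $message;
  close($fh);
  return;
}

my ($tmp, $path) = tempfile(UNLINK => 1);   # stand-in for the real cache file
close $tmp;
my $to_mail = flush_or_defer($path, "demo.cgi:42: oops\n", 60);
print(defined($to_mail) ? "mail now:\n$to_mail" : "deferred into $path\n");
```

The temporary file was created a moment ago, so this first call lands inside the window and the message is deferred.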

Lines 71 to 73 create an initial empty cache file if it doesn't exist. We'll treat a non-existing file as if it was an empty old file, which means it still needs to have a timestamp updated to ``right now''. Note the use of the ``append'' open operation: since we don't have a lock, we may be trying to create a file when there's already someone else in the meanwhile that has created the file, gotten a lock, and started writing in the file (it could happen!), so the best we can do is ask the kernel to ``make the file if it doesn't exist, or be ready to append to it if it does''. Which works here just as we needed it.
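The create-if-missing step is easy to try by itself. In this sketch the touch_cache name and the weberrors.cache filename are made up, and a temporary directory stands in for wherever the real cache would live.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# touch_cache is a hypothetical helper name.
sub touch_cache {
  my ($path) = @_;
  # ">>" creates the file if it is missing and never truncates an existing
  # one, so it is harmless if another process created it a moment ago.
  open(my $fh, ">>", $path) or die "Cannot create $path: $!";
  close($fh) or die "Cannot close $path: $!";
  return;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/weberrors.cache";
touch_cache($path);
my $status = -e $path ? "exists" : "missing";
print "cache $status\n";
```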

Now for the fun part. Lines 77 to 84 send the email message. First, we try to suck in the Net::SMTP module in line 77. This may not be possible, because the CPAN module may not be installed (it's part of the libnet bundle from Graham Barr, not part of the core installation). Because the require directive might fail, it's inside an eval. If the require succeeds, the value of 1 is returned, skipping our inner die; otherwise, we die with a short message of our own. That die is caught by the outer eval. Wheeee.
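The ``eval a require, then return 1'' idiom generalizes to any optional module. Here's a minimal sketch using a made-up helper (have_module) and a deliberately bogus module name:

```perl
use strict;
use warnings;

# have_module is a hypothetical helper name.
sub have_module {
  my ($module) = @_;
  (my $file = "$module.pm") =~ s{::}{/}g;    # Net::SMTP becomes Net/SMTP.pm
  # The trailing 1 guarantees a true value when the require succeeds;
  # if require dies, eval returns undef instead.
  return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(File::Spec No::Such::Module::Here)) {
  my $status = have_module($module) ? "available" : "missing";
  printf "%-25s %s\n", $module, $status;
}
```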

Lines 78 and 79 set up a Net::SMTP connection object to the requested Mailhost. If there are any errors, I'm told the error will be in $@, not $!, so I include that here in the die message. Again, this die will be caught by the outer eval block.

Lines 80 and 81 tell the SMTP server what our sender name is, and what the recipient name is. The sender name will be used for error messages from the various mailers along the way, and the recipient name is the ultimate destination. Here, we're using the configured email address for both. This could get weird if the address is undeliverable: the final mail host will attempt to bounce the message back to the same address to which it is attempting delivery. Hmm. Not a good idea. But it's better than any alternative I could think of today.

Lines 82 and 83 provide a subject line and a body for the message. The subject line has nice bright shiny capital letters in it, including the name of the program that triggered the error for easy mail filtering by smart email readers. Note that most mail servers will also construct a Date and From and To header for us automatically, so I can lean on it to do the job.
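If the mail server at your site doesn't fill in those headers, one option is to build them yourself before handing the text to the data method. This is just a sketch: build_message is a made-up helper, the addresses are placeholders, and the module in the listing sends only a Subject header.

```perl
use strict;
use warnings;

# build_message is a hypothetical helper; it only assembles text.
sub build_message {
  my ($from, $to, $subject, $body) = @_;
  return join "",
    "From: $from\n",
    "To: $to\n",
    "Subject: $subject\n",
    "\n",                      # a blank line separates headers from body
    $body;
}

print build_message('webmaster@example.com', 'webmaster@example.com',
                    "CGI FATAL ERROR in demo.cgi", "something broke\n");
```

The resulting string could be passed as the single argument to the Net::SMTP data method in place of the two arguments used in line 82.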

And finally, line 84 tells the mail server we're done for this round, and shuts down the connection.

That's it. Put FatalsToEmail.pm some place accessible to your CGI script, add the appropriate use lib line to point at the directory, and you can start getting your errors via a timely email message instead of having to scruff through the old shared web error log. Until next time, enjoy!

Listings

        =1=     package FatalsToEmail;
        =2=     use strict;
        =3=     
        =4=     my %config =
        =5=       (
        =6=        Address => "webmaster",      # email address
        =7=        Mailhost => "localhost",     # mail server
        =8=        Cache => undef,              # undef means don't use
        =9=        Seconds => 60,
        =10=      );
        =11=    
        =12=    sub import {
        =13=      my $package = shift;
        =14=      while (@_) {
        =15=        my $key = ucfirst lc shift;
        =16=        die "missing argument to $key" unless @_;
        =17=        die "unknown argument $key" unless exists $config{$key};
        =18=        $config{$key} = shift;
        =19=      }
        =20=    }
        =21=    
        =22=    $SIG{__DIE__} = \&trapper;
        =23=    
        =24=    sub trapper {
        =25=      my $message = shift;
        =26=      my $time = localtime;
        =27=      my ($pack, $file, $line) = caller;
        =28=    
        =29=      my $prefix = $time;
        =30=      $prefix .= ":$$:$file:$line: ";
        =31=      $message =~ s/^/$prefix/mg;
        =32=    
        =33=      print STDOUT <<END;
        =34=    Content-Type: text/html
        =35=    
        =36=    <h1>Sorry!</h1>
        =37=    <p>An error has occurred; details have been logged.
        =38=    Please try your request again later.
        =39=    END
        =40=    
        =41=      send_mail($message);
        =42=      
        =43=      die "${prefix}died - email sent to $config{Address} via $config{Mailhost}\n";
        =44=    }
        =45=    
        =46=    sub send_mail {
        =47=      my $message = shift;
        =48=    
        =49=      eval {
        =50=        ## do I need to cache this?
        =51=        if (defined (my $cache = $config{Cache})) {
        =52=          if (open CACHE, "+<$cache") {
        =53=            flock CACHE, 2;
        =54=            ## it's mine, see if it's old enough
        =55=            if (time - (stat(CACHE))[9] > $config{Seconds}) {
        =56=              ## yes, suck any content, and zero the file
        =57=              my $buf;
        =58=              $buf .= "\n...[truncated]...\n" if read(CACHE, $buf, 8192) >= 8192;
        =59=              $message = $buf . $message;
        =60=              seek CACHE, 0, 0;
        =61=              truncate CACHE, 0;
        =62=              close CACHE;
        =63=            } else {
        =64=              ## no, so just drop the stuff at the end
        =65=              seek CACHE, 0, 2;
        =66=              print CACHE $message;
        =67=              close CACHE;
        =68=              return;
        =69=            }
        =70=          } else {
        =71=            ## it doesn't exist, so create an empty file for stamping, and email
        =72=            open CACHE, ">>$cache" or die "Cannot create $cache: $!";
        =73=            close CACHE;
        =74=          }
        =75=        }
        =76=    
        =77=        eval { require Net::SMTP; 1 } or die "no Net::SMTP";
        =78=        my $mail = Net::SMTP->new($config{Mailhost})
        =79=          or die "Net::SMTP->new returned $@";
        =80=        $mail->mail($config{Address}) or die "from: $@";
        =81=        $mail->to($config{Address}) or die "to: $@";
        =82=        $mail->data("Subject: CGI FATAL ERROR in $0\n\n", $message)
        =83=          or die "data: $@";
        =84=        $mail->quit or die "quit: $@";
        =85=      };
        =86=      if ($@) {
        =87=        die "$message(send_mail saw $@)\n";
        =88=      }
        =89=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
