Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Perl Journal Column 05 (Oct 2003)

[Suggested title: 'Safe CGI Death']

Recently, a discussion occurred on the Perl CGI beginners mailing list about using CGI::Carp, presumably in the fatalsToBrowser configuration. This core module is a very nice tool for developing Perl CGI applications, as it can be used to redirect errors, or at least make them safe to display in a web browser. For example, to make all of your fatal errors turn in to a message to your browser instead, you simply add:

        use CGI::Carp qw(fatalsToBrowser);

to the beginning of your program. Even syntax errors later in the same file will show up in the browser, which is especially nice if you're using a shared hosting environment and have difficult or no access to the web server's error logs.

But as great as such as setup is for development, it is exactly the wrong thing to do for a web site that will be accessed by the general public. An error message displayed to the browser of a random visitor is generally frowned upon (no matter how infrequent it might appear).

Worse yet, a visitor is unlikely to report the problem, or if they do, their report will likely be paraphrased or incomplete, making diagnosis difficult. (Can you tell that I've had experience working in help desks?)

But the biggest problem is the potential leak of security information. I've encountered many ill-designed websites that (upon the failure of some necessary component) showed me in intricate detail the pathnames being used, operating system details, values of internal program variables, and even the SQL and database table names for a query! This is better than dumpster-diving as a means of finding an intrusion, because the potential intruder can immediately vary their query and see how it affects the error messages.

But what else should we do? If we don't have any error trapping, we give the user the unpleasant ``500 error'' experience. If we trap the error, where do we write it?

One solution that has occurred to me is to log the error (even to the standard webserver error log), but also generate a virtual ``trouble ticket'' that will be shown to the user, with a message that is essentially ``something went wrong, we know about it, and here's our trouble ticket number if you want to reference this problem''. A simple $SIG{__DIE__} or eval handler will take care of catching the error, and then a bit of massaging of the error text will ensure a clean splatting into the error log.

But we can even take this one step further. Instead of simply telling the user that an error has occurred and been logged, let's give the user a text form to describe what they were trying to do. Using a hidden field, we can include the incident number into the form submission, to correlate their description of the problem with our known error messages, which we log into a separate file. Later, we can create trouble tickets by correlating the log files using the ticket numbers. We can also give the user the option of including a contact email address or phone number for further information, and log that as well.

The CGI script in [listing one, below] has been modified to trap its errors in such a fashion. Although the code is included explicitly in this script in the first BEGIN block, generally you'd probably want to factor that out into a site-wide module for consistency and ease of upgrading, starting every local CGI script with something like:

        use lib "/my/site/lib";
        use MySite::ErrorWrapper;

But for simplicity, let's look at the code in place.

Lines 1 through 3 start nearly every CGI program I write, turning on the normal compiler restrictions, taint checking, warnings, and unbuffering STDOUT.

Lines 5 through 44 define the BEGIN block that will set up our handler. Because it's in a BEGIN block, even syntax errors later in the file are handled properly.

Lines 6 through 43 define a die handler for all untrapped fatal errors. The text of the error message will be passed to this subroutine as its first parameter.

Line 7 creates an incident number. Here, I'm taking the current epoch time (integer seconds since 1-Jan-1970 midnight UTC) and the process ID, and concatenating them into a 12-character hex string. This is reasonably unique per host (unless you can cycle 30,000 processes in one second), but might collide if several web servers are working in a load-balancing situation. You might consider the File::CounterFile module from the CPAN for generating monotonically increasing identifiers, or perhaps a real database connection using DBI to get the next incident report number.

Line 9 takes the incoming error message text from the argument list; Line 12 logs this message to the web server error log, ensuring that every line of the error message is prefixed by the incident number in a relatively distinct pattern for later extraction and processing. For example, if die "one\ntwo\nthree\n" is triggered, the resulting error messages might look like:

  crash 3f71df085c4f: one
  crash 3f71df085c4f: two
  crash 3f71df085c4f: three

with the hex string naturally containing the timestamp and process ID.

Lines 15 to 41 present the user with a simple form, consisting of a text area for the description of the problem, a text field for the email address, and a submit button. In addition, a hidden text field includes the incident number, so that we may later correlate this submission with the trigger.

Line 42 causes the CGI program to successfully exit after having delivered the error form. This keeps the script from generating a ``500 error''.

The URL for the form has to point to a second script to capture the submitted information. A sample implementation for this script is given in [listing two, below]. But we'll get to that in a minute.

Following the BEGIN block, we put the rest of our program as we normally would. As a sample program for testing, I included some text that had a syntax error (line 49): the beginning of a regular expression without the matching end.

The error-reporting program in [listing two] should be as simple as possible. In this case, I'm pulling in CGI.pm to decode the parameters and generate the response HTML. Again, lines 1 through 3 are like the previous program, and line 5 pulls in CGI.pm.

Lines 7 to 14 use the CGI.pm shortcut methods to generate a simple response, using small soothing words.

Line 18 opens up an error log for appending. I put the error log here in /tmp for simplicity during my coding, but I'd probably move this to a more secure location for production, typically alongside the web error logs in the same directory.

Again, in the ``keep it simple'' motif, I wanted to store all the environment variables in the response, in case the user agent or source IP address or referer page might be useful. I'm doing this by using the CGI.pm's param function to pretend that additional parameters were passed as the form values. Each param is named like the Perl-style access to that environment variable. For example, a param name of $ENV{HOME} stores the value for the HOME variable.

Finally, in line 20, the contents of the form fields are dumped in CGI.pm's dump format into the logfile. By using this dump format, we'll be able to restore the data very cleanly, regardless of how many odd characters were included in the form submission or environment.

Note that it might be a good idea to flock this filehandle before writing to it, but as long as the writes are under 8K-bytes or so, we probably won't have any problems.

Once we've set up all of our CGI scripts with the BEGIN block from [listing one], and the common CGI crash-reporting script in [listing two], we're ready to watch for errors. As the errors occur, they'll be logged into the web server error log, and the user will be shown the simple form. If the user fills out and submits the form, the resulting form values (including our added %ENV values) are placed into the separate crash reporting log.

All we need to do now is extract the values from the crash reporting log, and correlate those with the web server's error log. Luckily, CGI.pm can do most of the hard work for us again. In [listing three, below], I have a short snippet of a program to extract the data.

Please note that this is not a CGI script, and will not be placed into the CGI directories of my web server, even though it contains use CGI in the listing in line 3. I'm pulling in the CGI module only to parse the CGI dump-format file. Line 5 defines the location of that file, which is opened in line 9.

Line 7 defines a top-level data structure that will hold all of the entries in the file, so that I can dump them in lines 20 and 21.

Lines 10 through 18 process through the file by having the CGI module construct new CGI objects from the file until the file is at end-of-file (tested in line 10).

Line 11 assigns the newly-constructed CGI object (pulled from the file in the CGI dump format) into the special $CGI::Q variable. This variable is the default CGI object for object-less CGI calls, so that I can just say param() later instead of $CGI::Q->param().

Line 12 creates a hash that will hold all of the params of this particular form submission.

Lines 13 to 16 go through each param, pulling out its values into @p in line 14. If there's fewer than two values (like 1 or 0), then $p[0] is used as the final value.

Line 17 pushes a reference to this param hash onto the @all data structure.

Lines 20 and 21 use Data::Dumper to show us what the various error reports would have looked like after parsing. Had this been an actual code, the incident values of each report would be keyed into some database correlated with incident values from the web server error log. But that's beyond the scope of what I wanted to show in these examples.

So, there you have it. Don't show error messages to your web users. Put them somewhere that you can get to them, but let the user annotate with their own comments to help you diagnose your problem. And as always, until next time, have the appropriate amount of fun!

Listings

        =0=     #### LISTING ONE ####
        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     BEGIN {
        =6=       $SIG{__DIE__} = sub {
        =7=         my $incident = unpack "H*", pack "NS", time, $$;
        =8=     
        =9=         my $message = shift;
        =10=    
        =11=        ## record reason for death into the error log
        =12=        warn join "", map "crash $incident: $_\n", split /\n+/, $message;
        =13=    
        =14=        ## now give the user a place to record their experience
        =15=        print <<END_OF_HTML;
        =16=    Content-type: text/html
        =17=    
        =18=    <hr>
        =19=    <h1>An error has occurred</h1>
        =20=    <p>An error has occurred.  Relevant details have already been logged for us,
        =21=    but if you want to tell us what you were doing at the time, it might help
        =22=    us determine the cause more accurately to fix the problem more easily.</p>
        =23=    <form action="/cgi/crash-reporter">
        =24=    <input type="hidden" name="incident" value="$incident">
        =25=    <table>
        =26=    <tr><th>
        =27=      Email address (optional):
        =28=    </th><td>
        =29=      <input type="text" name="email" value="foo\@example.com" size="60"/>
        =30=    </td></tr>
        =31=    <tr><th>
        =32=      Comments:
        =33=    </th><td><textarea name="comments" rows="10" cols="60">
        =34=    Describe your circumstances here.  What were you trying to do?
        =35=    </textarea></td></tr></table>
        =36=    <input type="submit" />
        =37=    </form>
        =38=    <p>Or, you can contact us at 1-800-555-5555 and reference incident
        =39=    number $incident.</p>
        =40=    <hr />
        =41=    END_OF_HTML
        =42=        exit 0;
        =43=      };
        =44=    }
        =45=    
        =46=    use CGI qw(:all);
        =47=    print start_header, start_html("Buggy Program");
        =48=    
        =49=    / bad syntax
        =50=    
        =51=    print end_html;
        =0=     #### LISTING TWO ####
        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use CGI qw(:all);
        =6=     
        =7=     print
        =8=       header,
        =9=       start_html("Incident Reporter"),
        =10=      h1("Thank you!"),
        =11=      p("Thank you for your incident report.",
        =12=        "The appropriate authorities have been notified.",
        =13=        "Remain calm."),
        =14=      end_html;
        =15=    
        =16=    ## save the parameters and environment
        =17=    
        =18=    open ERRLOG, ">>/tmp/crash-reporter.log";
        =19=    param("\$ENV{$_}", $ENV{$_}) for keys %ENV;
        =20=    $CGI::Q->save(\*ERRLOG);
        =0=     #### LISTING THREE ####
        =1=     #!/usr/bin/perl -w
        =2=     
        =3=     use CGI qw(param);
        =4=     
        =5=     my $FILE = "/tmp/crash-reporter.log";
        =6=     
        =7=     my @all;
        =8=     
        =9=     open FILE, $FILE or die;
        =10=    until (eof FILE) {
        =11=      $CGI::Q = CGI->new(\*FILE);
        =12=      my %p;
        =13=      for (param) {
        =14=        my @p = param($_);
        =15=        $p{$_} = @p < 2 ? $p[0] : \@p;
        =16=      }
        =17=      push @all, \%p;
        =18=    }
        =19=    
        =20=    use Data::Dumper;
        =21=    print Dumper \@all;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.