Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not as it appeared after the editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 76 (Dec 2005)

[Suggested title: ``Debugging Web Applications'']

I've spent a fair amount of time over the past decade debugging web applications, and even longer before that debugging programs in general. I'm occasionally asked about my strategies for debugging (especially because they seem to be a bit radical at first glance), so I thought I'd take a moment to share them with you.

First, ``debugging'' is literally ``removing bugs''. But we also use the term to mean making the program run better, or faster, or on more platforms. Debugging is important because people expect programs to work, to run quickly enough, to do everything they want, and to run wherever they choose to run them. People especially expect that if a program produces some answer, it's a correct answer.

I've noticed that I'm generally in a foul mood when I have to debug for an extended period of time. This leads me to the conclusion that debugging should be avoided if at all possible. But how?

The first rule of debugging is: Don't put bugs in. OK, that may sound flippant, but it's actually simpler than it sounds, at least if you adopt the right development strategies.

I've learned over the years to write in very small increments, running the code after writing a few more lines, so I'm constantly in a ``write-run-write-run'' cycle. This strategy ensures that I never out-code my ability to understand the program I've written so far. I've frequently seen beginners write 30 lines of code before their first execution, and then when it fails to produce their result, they can't tell what broke.

And that leads me to the second rule of debugging: If it worked before, but is broken now, it's likely to be the last few lines you added. If you're now facing a broken program, concentrate on those last few new lines. Talk through the code, reading it out loud (not just in your head); I'm always amazed at what I discover when I finally speak out loud. If that doesn't do it, grab a buddy and describe those code lines to them (this is why pair programming works well). If you don't have a buddy nearby, learn to use the online forums to discuss Perl code in real time. (There are lots of helpful people out there.)

A caution here though: the bug may have been in the last few lines, or it may be that those last few lines now reveal some bug earlier in the program. That's rare, but I've seen it happen, so keep that in mind.

If that still doesn't solve it, we then go to my third rule of debugging: Add ``print'' until it works. Generally, the problem is not in how people code algorithms, but in misunderstanding how the data looks. So, add enough diagnostic output to display the data just ahead of your broken code, and at each questionable step within it, and you should have that ``Aha!'' moment where you know exactly how to fix it.

In Perl, simple scalars can simply be printed, but for more complex data, get familiar with Data::Dumper:

  use Data::Dumper;
  print Dumper $complex_value;

I also frequently set the Indent value to ``1'', to keep deeply-nested data from crawling off the right side of my screen:

  $Data::Dumper::Indent = 1;

If I want the program to stop at this place instead of continuing on (with likely broken data), I simply change the print to die. Also, to dump more than one value, I can create an anonymous hash to provide the labels:

  use Data::Dumper;
  $Data::Dumper::Indent = 1;
  ...
  die Dumper { this => $this, that_array => \@that };

I know there are options to feed labels into Dumper, but I find it easier to ignore the provided $VAR1 = than to remember the entire syntax for that. Call me lazy.
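Putting those pieces together, here's a minimal sketch of the labeled dump (the data is made up for illustration, and I've used print instead of die so the example runs to completion):

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 1;   # keep deep structures from crawling off-screen

# Stand-in data for whatever the program really computes:
my $this = { name => 'fred', size => 3 };
my @that = (2, 4, 8);

# The anonymous hash supplies the labels; in a live debugging
# session this would be "die Dumper ..." to stop right here.
my $dump = Dumper { this => $this, that_array => \@that };
print $dump;
```

The hash keys show up as labels in the output, which is usually all the context I need.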

As a subrule of rule three, I've learned: never throw away these traces. That is, once the code is working, I comment out the die without deleting the line of code. Odds are, if it was broken this time, it might be broken in the future, so having the dumping code available there and already written is quite handy.

So where does the actual Perl debugger fit in with all this? It doesn't! Using these three rules, I accomplish nearly all of my debugging. I write in small increments, looking at recently added code if something breaks, and then add print until it works if necessary. And having this methodology down long before the web came along proved very handy, because there's really no place to type perl -d on the web.

I do use the debugger from time to time, but not for debugging programs. I use the debugger as an interactive execution engine, to quickly test the return values from modules I've written or I'm using in my program. For example, to test My::Module (a module I've written), I'll invoke:

  perl -MMy::Module -debug

which gets me to a debugger prompt with my module loaded. That may be surprising, so let me break it down. The -M loads my module, similar to saying use My::Module. The -d invokes the debugger, but on what program? That's given by the -e flag (bundled with -d). The text following -e is the program itself. In this case, it's the word bug as a bareword! Actually, any alphanumerics following the -de work, so I often use -deal or -dead.

Once I've typed this line, I'm sitting at an interactive prompt, with my selected module already loaded. I then make extensive use of the debugger's x command to execute an arbitrary Perl expression and display its results. For example, I might type:

  x $x = My::Module->new
  x $y = $x->do_something('hello')
  x map [$_, $_->somefunc, $_->otherfunc], @$y

And I've now invoked methods in my module, as well as performed operations on the results. The x output handles complex data structures, so I often create artificial structures with map to get a detailed analysis of the things being returned. But this is the entire extent to which I use the Perl debugger. I can't even remember the commands for single-stepping or setting breakpoints. (When I demonstrate the debugger in our Stonehenge training classes, I have to use the h help command to remember them!)

So, having gotten into the ``add print until it works'' mindset for debugging, how does that help me on the web? First, I can use warn in a CGI script to trace data values, as long as I'm watching the error log with a tail -f window somewhere. But this may not be available, or it may be that the program is broken enough that I'll simply get that dreaded 500 error instead.
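As a sketch of that warn-based tracing (the %form data here is a stand-in for real request parameters), the idea is that warn writes to STDERR, which the web server routes to its error log:

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 1;

# Stand-in for data that would really come from the request:
my %form = (name => 'fred', items => ['a', 'b']);

# Build the trace, then send it toward the error log via warn;
# watch it arrive with "tail -f" on the log in another window.
my $trace = "form data: " . Dumper(\%form);
warn $trace;
```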

Frequently, the 500 error comes from the script spitting out things that don't look like header lines when the web server is expecting a header. My favorite trick to work around that is to add:

  BEGIN { print "Content-type: text/plain\n\n" }

near the top of my script (generally within the first few lines). This ensures that as the script is being parsed, we'll already have generated a valid header. This won't help if the script also exits badly (like a syntax error or a later die), but I've often been amazed at how quickly this helps me see the way out of my mistake. This also ensures that the HTML is actually treated as text, so I don't have to keep saying ``view source'' to see what the problem might be -- another timesaver.

If my program is exiting badly, a nice strategy to redirect the messages away from the error log and toward the browser is to add:

  use CGI::Carp qw(fatalsToBrowser);

near the beginning. Most of the things that would have ended up in the error log are now sent to the browser: especially handy if running tail -f on the error log is not easy or possible. Syntax errors still cause a 500 error, but you can solve that by checking syntax before running the program (using perl -c, for example).

Do not leave this CGI::Carp setting in production, however, because you'll be leaking potentially sensitive information to any random person invoking your application. I've seen far too much information dumped at me at random from some very well-known sites.

The advantage of setting up ``fatalsToBrowser'' during debugging is that CGI death is now my friend, not my enemy. I can add die in any random place in the program, and when the die is executed, I get the value in the browser. I can use the inserted die operations as diagnostic probes, moving them around and hitting ``reload'' as necessary in my browser to repost the same information.

Another nice trick for debugging is to get Carp to turn death into stack backtraces. This is helpful when I'm not quite sure where my program is dying. I simply add:

  BEGIN {
    require Carp;
    $SIG{__DIE__} = \&Carp::confess;
  }

to my program (before the main part of the code), and now my death has a track record. Unfortunately, I can't combine this trick with the fatalsToBrowser trick, because they both stomp on $SIG{__DIE__}.
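Here's a self-contained sketch of the effect (the subroutine names are invented for illustration); the eval serves only to capture the backtrace so we can look at it, where a real CGI script would send it to the error log or browser:

```perl
use strict;
use warnings;

# Install the handler early, before the main code runs:
BEGIN {
  require Carp;
  $SIG{__DIE__} = \&Carp::confess;
}

# Two stand-in layers of calls, so the trace has something to show:
sub inner { die "something broke" }
sub outer { inner() }

# Trigger the death inside an eval just to capture the result:
my $trace = do { eval { outer() }; $@ };
print $trace;
```

The resulting message includes the original die text plus the full chain of calls that led to it.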

Most of the web work that I do involves Template Toolkit (which I'll abbreviate to TT here because I'm lazy). TT has exception handling similar to Perl's die/eval system, and can even catch errors thrown by Perl code being executed from TT code.

Using eval around the TT engine setup, I might get something like this for my top-level invocation:

  my %ENGINE_CONFIG = ( ... );
  my %GLOBALS = ( ... );
  my $template = "top_level.tt";
  eval {
    my $t = Template->new(%ENGINE_CONFIG);
    $t->process($template, \%GLOBALS, \my $o)
      or die $t->error;
    print $o;
  };

If there's any trouble setting up the TT engine, or any uncaught exception while executing the template, a Perl exception will be thrown, and we skip the print $o step. We can then analyze the value of $@ to see what happened:

  if (my $error = $@) {

Next, we check $error to see if it's a Template::Exception object, meaning that it was thrown by the TT engine, not because of some trouble happening at the Perl level outside of TT:

    if (eval { $error->isa("Template::Exception") }) {

For these objects, the type method identifies the error class. A special class called undef means that it was a Perl error, and the info method returns the real $@ value. Otherwise, we turn the exception into a string:

      if ($error->type eq "undef") {
        $error = $error->info;
      } else {
        $error = $error->as_string;
      }
    } # matches "if (eval" above

Next, we check $error and if it's a reference, we pretty-print it with Dumper automatically:

    if (ref $error) {
      require Data::Dumper;
      $error = Data::Dumper::Dumper $error;
    }

Here, we're using require to load Data::Dumper only if needed -- no sense loading it unless it's actually used. Finally, we dump the error to the user:

    print "Content-type: text/plain\n\n$error";
  } # matches "if ($error" above
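Collected into a single helper (show_error is a name I've invented here, not part of any module), the $@ analysis above reads:

```perl
use strict;
use warnings;

# The error analysis gathered into one hypothetical helper that
# normalizes whatever we caught into plain text for the browser.
sub show_error {
  my $error = shift;

  # Thrown by the TT engine itself?
  if (eval { $error->isa("Template::Exception") }) {
    if ($error->type eq "undef") {
      $error = $error->info;        # a Perl error: recover the real $@
    } else {
      $error = $error->as_string;   # a TT error: stringify it
    }
  }

  # Structured data gets pretty-printed automatically:
  if (ref $error) {
    require Data::Dumper;
    $error = Data::Dumper::Dumper($error);
  }

  return "Content-type: text/plain\n\n$error";
}

# For example, a structured $@ value comes out labeled and readable:
my $page = show_error({ a => 'this', b => 'that' });
print $page;
```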

Once I have this top-level code in place, I can see the results of exceptions thrown from the top Perl level, as well as from TT code, and from Perl things called from TT code. We can even use this with the Carp::confess hack to get a look at the calling stack as well.

Using inserted die steps, I can diagnose the data at any point in my application, with the result appearing in the browser. Within TT code, I can insert debugging deaths as well. For example, I can add

  [% THROW debug foo %]

to throw an error of class debug, using the value of foo as a modifier. If no higher TT code catches this, we'll end up in the top-level handler I just described. Unfortunately, this won't display structured data, but there's always a useful cheat:

  [% THROW undef { a => this b => that } %]

which pretends we hit a Perl-level error with this complex data for $@, resulting in a pretty-printed labeled arbitrary structure.

I hope this walk through my web debugging techniques has been helpful. Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.