Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Perl Journal Column 01 (May 2003)

[suggested title: 'Watching log files']

One of the most boring tasks I perform as a system administrator is watching a log file. It's like watching a pot of stew boil. Or ice melt.

But watching a log file is also a great way to get a quick view of the activity on the system. For example, watching an Apache access log is great to see how ``bursty'' the traffic can be. But unless you're staring at the tail -f on the log file constantly, or you can pick out the timestamps of each logged line and do the math in your head quickly, you can't really tell which lines were ``together'' in a particular burst.

So, one day recently I got the idea for a ``web tail'' program. A browser would fetch data from a custom web server, which would be watching the log file, noting the arrival time of each line. The display would show the last, say, 16 lines of log, but colorized in such a way that I could see the age of each line at a glance. That way, lines that arrived at similar times would have a similar color coding.

And then I stumbled across two pieces of the POE (Perl Object Environment) package that would make writing such a program easier. POE is a great collection of tools to manage events and ``threaded'' Perl code, good for servers, clients and event-based glue. POE is described in more detail at http://poe.perl.org.

And the result is in [listing one, below].

Line 1 is the standard hash-bang line for Perl on my system. Obviously, this will need to be changed depending on where Perl is installed.

Line 2 turns on the standard restrictions: symbolic references are disabled, implicit package variables are disabled (forcing most variables to be declared with my), and ``bare words'' are disabled. This is always a good thing for programs exceeding a dozen lines or so.

Lines 5 through 8 are the tweakable parts of this script. The $FILENAME gives the file to be followed. In this case, it's the Apache access log for my web-server front-end reverse proxy. As I was testing this script, I found that to be a nice source of bursty data.

The $TAILING variable controls how many lines of the logfile are visible. Too little, and we lose context. Too much, and we scroll too often.

The $URGENT and $OLD variables both define the number of seconds. Data that has arrived within $URGENT seconds is always pure white on the output. As it passes that threshold, it gets light green through dark green, maximizing the darkest pure green at $URGENT plus $OLD seconds. In this case, I'm letting it fade to dark green in about a minute. Again, the values will depend on the data being watched. I get about 20,000 hits a day on the web server, so these particular settings showed lots of variations between everything being all white, and all green. At a glance, I could quickly tell how busy the recent traffic was being.

Line 11 defines the rolling data buffer, holding the most recent $TAILING events. Items are pushed onto the end, and shifted off the front as needed. Each element of the array is a two-element array ref, holding the Unix epoch-based integer time value when the data was added, and the line itself (without a newline, apparently).

Line 12 pulls in the HTML-generating shortcuts from the core CGI module. Although we actually aren't in any kind of CGI mode here, I find the HTML shortcuts to be easier than typing a lot of angle brackets myself.

Line 13 brings in the POE pieces. Arguments passed to use POE act like I had said:

  use POE;
  use POE::Component::Server::HTTP;
  use POE::Wheel::FollowTail;

That's a convenient convention with POE's non-standard import list.

Lines 15 through 54 create the web-server side of the POE process, using the POE::Component::Server::HTTP module (often referenced as PoCo::HTTPD in brief). I lifted this example almost directly from the POE Cookbook, available in the Wiki at http://poe.perl.org.

Line 17 defines the port number of the web server. By default, the web server listens on all IP addresses at this port number. I picked 42042 as an easy number to remember. You'd definitely want to check to ensure that such a port was not already in use by another application.

Lines 18 to 53 define the only ContentHandler for this server. Any request to any URL below slash (therefore, any request) gets directed to this handler. The two parameters to the subroutine are the request and response objects. The handler is responsible for examining the request object, then updating the response object as needed, and then returning the appropriate HTTP status code. Because we are returning the same contents regardless of the request, no checking of the request is performed.

Line 22 sets the response to the HTTP OK value of 200. Line 23 sets the content type of the response to text/html, indicating a typical HTML response.

Lines 24 to 50 generate the content for the response, using the CGI.pm's HTML shortcuts. The join in line 25 turns the many pieces into one string. I wasn't sure if I needed to do that by looking at the manpage for HTTP::Message, but I knew it couldn't hurt.

Lines 26 to 30 generate the HTML head and title section, including the very important ``meta-refresh''. This refresh will cause most modern browsers to reload the data repeatedly every five seconds. The #bottom anchor will ensure that most browsers also scroll to the bottom anchor (defined in line 49). The background of the window is set to black, which seemed to work out best during testing.

Lines 31 to 48 generate the bulk of the output: a single table. Each row of the table is one of the $TAILING lines, and consists of two table cells. The left cell is the timestamp (in the localtime of the web server) in a nice cyan lettering, while the right cell is the log file line.

The log line is in Courier font, selected explicitly. During testing, I tried putting it inside a pre element, but that kept the line from wrapping, making it difficult to read very wide log lines.

The lines of the table come from a map, which processes the elements of @data (line 48), invoking the block defined in lines 32 to 47 for each element. Line 32 extracts the timestamp and the log line itself, by dereferencing the $_ (the current element of @data) as an arrayref.

Lines 33 to 42 figure out what color a line that was seen at $stamp time should be. First, the age of the line is computed in line 33. If the age is below $URGENT seconds (line 35), then the color is forced to white. Otherwise, the age is scaled by subtracting $URGENT (line 38) and maximized to be no more than $OLD seconds (line 39).

Line 40 computes a $c value that will be 0 for the oldest lines and 255 for the very newest. Line 41 uses this value twice to create a color that scales from #ffffff (white) down to #00ff00 (pure green) linearly. Thus, lines that arrived at identical times have identical values, and lines that arrived at differing times generally can be rapidly visually distinguished.

Once we have a color, we can construct the table row, starting in line 43. Lines 43 through 45 create the timestamp. The valign parameter is set to top so that the timestamp always appears at the start of the log line, even if the log line wraps. The font element is used to assign the color cyan to the timestamp. Line 45 uses a sprintf to create a zero-padded time value. The localtime operator is called in a list context for a list slice, and the hours, minutes, and seconds values are selected in the proper order for the sprintf. Nice trick.

Lines 46 and 47 create the table cell for the log line. The line is HTML-entitized using escapeHTML, ensuring that less-thans don't ruin my day. I can just imagine someone deliberately visiting a URL of <BLINK> just to mess up the rest of my display. But on a more serious note, we don't want to open ourselves up to a cross-site-scripting attack either. Line 46 uses another font element to define the proper color and font face for the text.

Line 49 adds an anchor to the bottom of the display, which will be made visible by most browsers because of the address of the meta-refresh. This ensures that we are looking at the most recent data (at the bottom), even if the data exceeds the window size.

Finally, the closing body and html tags are created with line 50.

Once the content of the response has been established, we return from the subroutine with an OK code again, thus passing back the proper HTTP status to the browser.

That wraps up the web-server side of the process. As a browser connects to port 42042, this code gets activated, the current @data gets filtered and formatted, and the response is returned to the browser. The meta-refresh causes the browser to refetch the same URL five seconds later, with the updated response sent to the browser again.

But how does @data change? Well, within the same process, we also have a POE::Session object that is running a POE wheel. This session is created in lines 57 to 75. A POE wheel defines a correlated set of states and events for a higher-level function. This particular wheel knows how to ``tail'' a file, generating events when new data is available.

Line 59 declares that our session will have ``inline states'', one of the most common ways to create a session. We could also have ``object'' states or ``package'' states, but this seems to work the easiest for this example.

When the session is first started, the _start state is called. We'll define this start handler as an inline coderef starting in line 61. The start handler creates a POE::Wheel::FollowTail wheel, and stores it into the session's ``heap'' area, ensuring that it stays alive as long as the session is alive. Because we never destroy this session, this essentially means ``forever''. The heap key of wheel is arbitrary: we could have used mongoose with no consequences.

The wheel takes a few parameters, defined in lines 65 through 67. The Filename parameter is the file to be tailed. Here, we get the name directly from the configuration parameter $FILENAME defined at the top of the program. Because we've provided an explicit filename, this wheel is smart enough to close and re-open the file when the file finally changes, such as when we roll the web logs (like I do nightly). That's a lot better than my tail -f sessions, which seem to freeze around 2am, until I remember that the logs have been rolled!

The InputEvent defines a callback event that will be triggered whenever new data is available from the file. It appears to be called once per line, although that wasn't clear in the documentation.

Finally, SeekBack defines how many bytes of the file are initially processed, by seeking back from the end of the file. Note that this is a byte count and not a line-number boundary, so we might end up in the middle of the line for the first entry.

As new lines arrive (or initially while processing the seekback tail of the file), the event handler got_line is called, defined starting in line 70. Line 71 simply takes the line from $_[ARG0], prepends the current timestamp, and pushes the resulting arrayref onto the end of @data. Line 72 ensures that the array never exceeds $TAILING elements by trimming elements from the beginning of the array when necessary.

Finally, since POE is an event loop manager, we need to put the POE kernel ``in charge'' of the program. This is done in line 77. In this case, the loop never exits, because we always have a web server session and a followtail session, so the program never returns from this call.

So there you have it... a proof-of-concept program to inspire you to write more interesting and cool things with POE, especially now that you've seen how easy it is to create a tiny web server. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl
        =2=     use strict;
        =3=     
        =4=     ### configuration items ###
        =5=     my $FILENAME = "/web/stonehenge-proxy/var/log/access_log";
        =6=     my $TAILING = 16;
        =7=     my $URGENT = 5;                 # this many seconds stays white
        =8=     my $OLD = 55;                   # this many seconds after urgent is dark
        =9=     ### end configuration items ###
        =10=    
        =11=    my @data;
        =12=    use CGI qw(:html);
        =13=    use POE qw(Component::Server::HTTP Wheel::FollowTail);
        =14=    
        =15=    POE::Component::Server::HTTP->new
        =16=      (
        =17=       Port => 42042,
        =18=       ContentHandler =>
        =19=       {"/" => sub {
        =20=          my ($request, $response) = @_;
        =21=          
        =22=          $response->code(RC_OK);
        =23=          $response->content_type("text/html");
        =24=          $response->content
        =25=            (join "",
        =26=             start_html(
        =27=                        -title => "web tail on $FILENAME",
        =28=                        -bgcolor => 'black',
        =29=                        -head => ["<meta http-equiv=refresh content='5;URL=#bottom'>"],
        =30=                        ),
        =31=             table (map {
        =32=               my ($stamp, $message) = @$_;
        =33=               my $age = time - $stamp;
        =34=               my $color;
        =35=               if ($age < $URGENT) {
        =36=                 $color = 'white';
        =37=               } else {
        =38=                 $age -= $URGENT;
        =39=                 $age = $OLD if $age > $OLD;
        =40=                 my $c = 255 * (1 - $age / $OLD);
        =41=                 $color = sprintf '#%02x%02x%02x', $c, 255, $c;
        =42=               }
        =43=               Tr(td({valign => 'top'},
        =44=                     font({color => 'cyan'},
        =45=                          sprintf "%02d:%02d:%02d", (localtime $stamp)[2,1,0])),
        =46=                  td(font({color => $color, face => 'courier'},
        =47=                          escapeHTML($message))));
        =48=                } @data),
        =49=             a({name => "bottom"}),
        =50=             end_html,
        =51=            );
        =52=          return RC_OK;
        =53=        }},
        =54=      );
        =55=    
        =56=    
        =57=    POE::Session->create
        =58=      (
        =59=       inline_states =>
        =60=       {
        =61=        _start => sub {
        =62=          $_[HEAP]->{wheel} =
        =63=            POE::Wheel::FollowTail->new
        =64=                (
        =65=                 Filename => $FILENAME,
        =66=                 InputEvent => 'got_line',
        =67=                 SeekBack => 8192,
        =68=                );
        =69=        },
        =70=        got_line => sub {
        =71=          push @data, [time, $_[ARG0]];
        =72=          splice @data, 0, -$TAILING if @data > $TAILING;
        =73=        },
        =74=       },
        =75=      );
        =76=    
        =77=    $poe_kernel->run();

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.