Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Perl Journal Column 09 (Feb 2004)

[Suggested title: ``Watching a logfile in an IRC channel'']

The other day, I was hanging out on IRC when Keven Lenzo (creator of the YAPC conference and current director of The Perl Foundation) popped in and asked if anyone had written a tool to dribble a slowly-growing file into an IRC channel. I jumped right in and said ``that can be done rather simply with POE'', and realized that I hadn't actually done anything quite like that yet.

In this column last year [May 2003], I had written a file watcher that provided a web interface, but not a file watcher that would write to IRC, and since I hadn't yet played much with the POE IRC component, I grabbed the ball and ran with it.

As an interesting part of the problem, I had to figure out how to deal with IRC's anti-flood provisions. Most IRC servers these days will boot off any IRC client that sends too much text at once, and the POE IRC component throttles messages by default, dribbling them out at a rate that appears to be about 3 messages every 10 seconds. But a logfile that is being watched might grow faster than that. It wouldn't make sense for my bot to just spew all the lines blindly, getting more and more behind as time went on, so I had to have my code figure out when I was being throttled. Luckily, since the code is all open-source, I was able to peer inside and figure out how to adapt. The result is in [listing one, below].

Lines 1 through 3 are my standard preamble, turning on warnings, enabling ``big program'' compiler restrictions, and disabling buffering on standard output.

Lines 5 through 22 delimit the ``user serviceable parts'' from the rest of the file. Line 7 is the IRC nickname to be used by this bot. Lines 8 through 12 define to which server the bot connects, and the bot's ``real name'', in a manner consistent with the POE IRC component's initialization parameters. Line 13 defines the channel that the bot will join, sending public messages to that channel. Line 14 defines an internal alias for the virtual IRC client created by the POE IRC Component.

Lines 16 to 20 define the files that the bot will be watching. Each key should be a short identifying string that will show up in the bot's output, and the value the corresponding file to be tailed. In this case, I'm watching my webserver's access and error log files.

Lines 24 and 25 are used in the throttling process, described later. They have to be defined outside the handler that uses them so that they will persist between handler invocations.

Line 27 pulls in POE, as well as the POE components for IRC and for following a file.

Line 29 creates the virtual IRC client, giving the session the name defined in $IRC_ALIAS.

Lines 31 through 119 define a second session. If the first session is like the IRC client, the second session being created here is like the virtual human running the client. As such, we'll have steps here to cause the IRC client to connect to the server, and then react as the IRC client notices various conditions. This session doesn't have a name. For the most part, the session's behavior is defined by a series of inline states, beginning in line 33.

Lines 33 to 36 define the actions taken by this session as it is starting up. The session pokes at the virtual IRC client to have it respond to all IRC events, and then it tries to connect the IRC client to its server.

If the connection is successful, eventually an irc_255 event is posted back to the virtual-human session (at the end of all the connecting messages), triggering the handler beginning in line 37. This handler tells the IRC client to join a particular channel (line 38), and then start up the heartbeat and listeners (lines 39 and 40). The details of the startup are given later.

Lines 42 through 49 cause null handlers to be installed for many common but uninteresting IRC events. This list started out empty, but then after running my bot for a while, I figured out more things that I should ignore (because they were making noise in my default handler), and added them here.

Similarly, line 50 keeps _child events from hitting the default handler, because I have nothing to do for that either.

Lines 51 to 57 define a _default event handler, mostly so that I could diagnose all the events being sent to my virtual human, including dumping the data in a nice way.

Up to this point, we have a fairly generic but stupid IRC bot that knows how to connect to a server and join a channel. The remaining events give the bot its personality. The my_add handler (defined starting in line 58), tells the bot to start watching a particular file. We'll do this by adding a watcher session using POE::Wheel::FollowTail.

First, lines 59 and 60 set up two closure variables: the key into the %FOLLOWS hash, and the session ID of the virtual human session. Lines 61 to 76 create the watcher session, watching the filename from the %FOLLOWS hash (in line 67), and handling events at each new noticed line via the got_line handler (line 68).

The got_line handler posts an event back to the virtual human session to the my_tailed handler, including the timestamp, short identifying word, and the line of text that was seen. As each new line is noticed by the POE wheel, we'll get another event in sequence, carefully timestamped and labeled appropriately.

Although this strategy makes one kind of logical sense (having a separate session for each wheel), at the cost of some complexity I could have also done the work entirely with one session. I usually go for the dirt-simple methods first, though, and it worked for me this way.

Now that we have line-events flowing in, it's time to handle them, using the handler starting in line 79. This handler is normally called with the timestamp (integer), the file, and the line of text, which are extracted in line 80. However, under circumstances described shortly, this handler may also be called with no parameters, leaving these three variables undef.

Because I wanted this virtual human to be aware of the built-in throttling of the virtual IRC client (to avoid the heavy-handed throttling of the real IRC server), I have to reach ``under the hood'' in lines 85 and 86. By staring at the source code for POE::Component::IRC, I was able to determine that the arrayref stored in its heap at send_queue contained one element per deferred IRC message. If the array is empty, then the next message I send will most likely be sent immediately to the IRC server. I cache this arrayref in the outer $SEND_QUEUE variable just to avoid that bizarre sequences of steps to find it each time.

Lines 89 to 111 implement the throttling-aware transmission. I chose to implement a bit of hysteresis in the design. When the queued elements reach three or more, I start discarding, and keep discarding until the queue is completely empty. With this strategy, the discarded lines tend to come in clumps, rather than throwing away every other line or every third line. I chose three as the threshold because it appears that it takes about 10 seconds to dump 3 average-sized messages.

The variable $SKIPPING keeps track of this state. If it's 0, we're just pushing thing through as they arrive. Otherwise, it's the number of messages that we've had to discard because we're skipping.

Lines 89 to 93 handle the ``skipping to non-skipping'' transition. If we're currently skipping, but the queue is now empty, then report how many messages were discarded by sending a message to the channel (lines 90 and 91) and then resetting the $SKIPPING variable.

Lines 96 to 106 execute only if we have a line to display, verified by a non-false $time value in line 96.

Lines 97 to 98 handle the transition from non-skipping to skipping if needed. If we're already skipping, or the queue has now exceeded three messages, then we discard the line and note that it has been discarded (line 98).

Otherwise, it's time to say something. The timestamp is expanded to the time components (seconds, minutes, hours, and so on) in line 100. Line 101 sends a public message to the channel, labeled with the timestamp in HH:MM:SS 12-hr format and the file from which the line comes. The fancy bit of math in line 103 converts 0-23 into 12,1-12,1-11 as needed.

Lines 109 to 111 cause this handler to be reinvoked automatically every half second if we're currently skipping. The purpose of this action is to display the discarded N messages message as quickly as possible, instead of waiting for more input to come from the file. Note that we reinvoke the handler, but pass no parameters, which will have $time be false and the middle portion of this handler to be skipped. Using $_[STATE] here is a shortcut meaning ``the same handler as the one we're in''.

Finally, lines 114 to 117 define the ``heartbeat'' task. Every 10 seconds, a line that looks like HH:MM:SS: heartbeat: beep is shown in the channel. I did this for testing before I had the file-tailing added, but I like how it keeps channel occupants aware that there's stuff coming in from the bot.

Once the POE event handlers and sessions are created, the final line of this program (line 121) tells POE to ``start the event loop'', and away we go. This program never exits, because there are always sessions that are alive, but a quick INT character (usually control-C), takes care of the appropriate termination. The bot rudely disconnects from the IRC server: if I had been more concerned, I could have set up a _signal handler to send a proper quit message to the IRC server with an explanation.

Hopefully, you've seen how easy it is to construct an IRC bot. I must caution you though that many people (including me) despise noisy or spooky bots, so have some common sense when you set these up. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     ## CONFIG
        =6=     
        =7=     my $NICK = 'weblogger';
        =8=     my $CONNECT =
        =9=       {Server => 'irc.some.borg',
        =10=       Nick => $NICK,
        =11=       Ircname => 'weblogger: see merlyn@stonehenge.com',
        =12=      };
        =13=    my $CHANNEL = '#weblogger';
        =14=    my $IRC_ALIAS = "irk";
        =15=    
        =16=    my %FOLLOWS =
        =17=      (
        =18=       ACCESS => "/var/log/access_log",
        =19=       ERROR => "/var/log/error_log",
        =20=      );
        =21=    
        =22=    ## END CONFIG
        =23=    
        =24=    my $SKIPPING = 0;               # if skipping, how many we've done
        =25=    my $SEND_QUEUE;                 # cache
        =26=    
        =27=    use POE qw(Component::IRC Wheel::FollowTail);
        =28=    
        =29=    POE::Component::IRC->new($IRC_ALIAS);
        =30=    
        =31=    POE::Session->create
        =32=      (inline_states =>
        =33=       {_start => sub {      
        =34=          $_[KERNEL]->post($IRC_ALIAS => register => 'all');
        =35=          $_[KERNEL]->post($IRC_ALIAS => connect => $CONNECT);
        =36=        },
        =37=        irc_255 => sub {            # server is done blabbing
        =38=          $_[KERNEL]->post($IRC_ALIAS => join => $CHANNEL);
        =39=          $_[KERNEL]->yield("heartbeat"); # start heartbeat
        =40=          $_[KERNEL]->yield("my_add", $_) for keys %FOLLOWS;
        =41=        },
        =42=        (map
        =43=         {
        =44=           ;"irc_$_" => sub { }}
        =45=         qw(join public
        =46=            connected snotice ctcp_action ping notice mode part quit
        =47=            001 002 003 004 005
        =48=            250 251 252 253 254 265 266
        =49=            332 333 353 366 372 375 376)),
        =50=        _child => sub {},
        =51=        _default => sub {
        =52=          printf "%s: session %s caught an unhandled %s event.\n",
        =53=            scalar localtime(), $_[SESSION]->ID, $_[ARG0];
        =54=          print "The $_[ARG0] event was given these parameters: ",
        =55=            join(" ", map({"ARRAY" eq ref $_ ? "[@$_]" : "$_"} @{$_[ARG1]})), "\n";
        =56=          0;                        # false for signals
        =57=        },
        =58=        my_add => sub {
        =59=          my $trailing = $_[ARG0];
        =60=          my $session = $_[SESSION];
        =61=          POE::Session->create
        =62=              (inline_states =>
        =63=               {_start => sub {
        =64=                  $_[HEAP]->{wheel} =
        =65=                    POE::Wheel::FollowTail->new
        =66=                        (
        =67=                         Filename => $FOLLOWS{$trailing},
        =68=                         InputEvent => 'got_line',
        =69=                        );
        =70=                },
        =71=                got_line => sub {
        =72=                  $_[KERNEL]->post($session => my_tailed =>
        =73=                                   time, $trailing, $_[ARG0]);
        =74=                },
        =75=               },
        =76=              );
        =77=        
        =78=        },
        =79=        my_tailed => sub {
        =80=          my ($time, $file, $line) = @_[ARG0..ARG2];
        =81=          ## $time will be undef on a probe, or a time value if a real line
        =82=    
        =83=          ## PoCo::IRC has throttling built in, but no external visibility
        =84=          ## so this is reaching "under the hood"
        =85=          $SEND_QUEUE ||=
        =86=            $_[KERNEL]->alias_resolve($IRC_ALIAS)->get_heap->{send_queue};
        =87=    
        =88=          ## handle "no need to keep skipping" transition
        =89=          if ($SKIPPING and @$SEND_QUEUE < 1) {
        =90=            $_[KERNEL]->post($IRC_ALIAS => privmsg => $CHANNEL =>
        =91=                             "[discarded $SKIPPING messages]");
        =92=            $SKIPPING = 0;
        =93=          }
        =94=    
        =95=          ## handle potential message display
        =96=          if ($time) {
        =97=            if ($SKIPPING or @$SEND_QUEUE > 3) { # 3 msgs per 10 seconds
        =98=              $SKIPPING++;
        =99=            } else {
        =100=             my @time = localtime $time;
        =101=             $_[KERNEL]->post($IRC_ALIAS => privmsg => $CHANNEL =>
        =102=                              sprintf "%02d:%02d:%02d: %s: %s",
        =103=                              ($time[2] + 11) % 12 + 1, $time[1], $time[0],
        =104=                              $file, $line);
        =105=           }
        =106=         }
        =107=   
        =108=         ## handle re-probe/flush if skipping
        =109=         if ($SKIPPING) {
        =110=           $_[KERNEL]->delay($_[STATE] => 0.5); # $time will be undef
        =111=         }
        =112=   
        =113=       },
        =114=       my_heartbeat => sub {
        =115=         $_[KERNEL]->yield(my_tailed => time, "heartbeat", "beep");
        =116=         $_[KERNEL]->delay($_[STATE] => 10);
        =117=       }
        =118=      },
        =119=     );
        =120=   
        =121=   POE::Kernel->run;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.