Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Linux Magazine Column 01 (Jun 1999)

[suggested title: Using things from the CPAN]

The Perl community is one of the most well-established demonstrations of the Open Software movement. As a result, many people that have benefitted from Perl's opennness have in turn contributed libraries and scripts back to the public for others to use. The collective contribution for the Perl community has been organized into the Comprehensive Perl Archive Network, known more commonly as the CPAN.

The CPAN is not a single machine, but actually over 100 machines (at last count) holding the same 750 megabytes (at last count) of cool stuff, available for your access via anonymous FTP or in a few cases HTTP. They all mirror off the master site in Finland at various intervals, usually not exceeding 4 to 8 hours, so it's usally safe to use the nearest CPAN archive for you. If you're in the US, visit http://www.cpan.org. If that site is down (which it occasionally is), or you're somewhere else in the world, use the URL of http://www.perl.com/CPAN/, which will hopefully send you to an up-and-running site that is close to you.

There, you'll find all sorts of useful things, including the latest release of Perl for Unix and other operating systems. But in particular, you'll find hundreds of pre-written and tested modules that handle many common tasks.

Let's look at two of those modules, and how we might use them in an interesting way together. The Net::IRC module provides support for Perl to handle the Internet Relay Chat protocol. With this module, you can write a bot -- a Perl program that connects to an IRC server as if it was an IRC client, and then interacts with that server in programmatic ways.

And, then there's one of my favorite interesting modules, Chatbot::Eliza, which describes itself as:

This module implements the classic Eliza algorithm. The original Eliza program was written by Joseph Weizenbaum and described in the Communications of the ACM in 1966. Eliza is a mock Rogerian psychotherapist. It prompts for user input, and uses a simple transformation algorithm to change user input into a follow-up question. The program is designed to give the appearance of understanding.

This program is a faithful implementation of the program described by Weizenbaum. It uses a simplified script language (devised by Charles Hayden). The content of the script is the same as Weizenbaum's.

So, a few days ago, I wondered, ``what would happen if I took the Net::IRC module and hooked it into Chatbot::Eliza, to make a bot that acts like a doctor?''. Well, it was easier than I thought, and a good example of how to reuse existing code from the CPAN.

Installing these modules are trivial, because there's a standard module that installs other modules directly, called CPAN.pm. To install the two modules needed in this program, just do the following:

        $ perl -MCPAN -eshell
        cpan> install Net::IRC Chatbot::Eliza
        [cpan installation stuff omitted]
        cpan> quit
        $

And that's all there is to it! This invocation will fetch the module source from the nearest CPAN archive, and unpack, configure, make, test, and install them. You have to be the same user as the user that initially installed Perl; if you're not, invoke perldoc CPAN to get further instructions.

So, let's take a look at this ``eliza-bot'' in [listing one, below].

Line 1 starts the program, giving the path to your installed Perl binary (here, /usr/bin/perl for me), and turning on warnings (a good idea during development, but should always be turned of in production code).

Line 2 enables the three common compiler restrictions. These restrictions disable the use of soft references (always a good idea), disable barewords being treated like strings (keeps random typos from being ignored), and requires all variables to be either explict package variables or predeclared with my to make them lexical. As all three of these restrictions make sense for programs longer than say, 10 lines, I usually enable them all like I've done here.

Line 3 increments the value of $|. If you do this at the beginning of the program, the STDOUT filehandle becomes unbuffered instead of buffered. Here, that's a good thing, because we want to see the result of each print operation as it's being generated.

Lines 5 and 6 pull in the Net::IRC and <Chatbot::Eliza> modules, as described earlier. These modules must be locatable in the directories specified in the compile-time @INC value. If you need to add directories to @INC, see the documentation for the lib module (by invoking perldoc lib).

Lines 8 and 9 define two configuration constants for this program, each with a true/false value. The $IRC_debug value is passed to the Net::IRC module as a debug selection. If false, Net::IRC stays relatively quiet. But if it's a true value, we get a fairly detailed tracing of all the steps and callbacks. I set this to 1 while I was figuring out which callbacks would be triggered at what times with what parameters. But it's completely too noisy during the real operation, so I've set it to 0 here. And $TRACE just selects whether summary messages of everything the bot is doing gets sent to standard output. These are pretty small, and can probably remain enabled during actual use, as I've done here.

Line 11 defines a new Net::IRC object, assigns it to $irc, and sets the debugging value for this object according to the value of $IRC_debug, as defined above. This object represents the supervisor of all connections to various IRC servers for this particular program.

Lines 12 through 15 create an individual IRC connection. In this case, it's the sole connection for this bot, connecting as nickname eliza000 to the IRC server at the fictional host name random.place.not. Of course, to be useful, you'll need to find an IRC server on the appropriate IRC network that supports the use of bots. (Most don't.) When we've come back from the newconn method, $conn holds a Net::IRC::Connection object -- our main interface for sending and receiving information.

The primary means of scripting a conversation with Net::IRC is to establish a series of event handlers. Each handler is associated with a particular event (transformed into a Net::IRC::Event object). When the event is seen (because of something the IRC server has sent), the corresponding handler is called, along with some parameters (the connection object, and the event object). So, if you want to know every time someone sends a public message to a channel the bot has joined, you'll set up an event for public.

Lines 16 through 44 set up the particular event handlers that we need for this bot. I'm using a foreach loop here (spelled f-o-r but pronounced foreach) to walk $_ through a list of arrayrefs. Each arrayref in turn contains two items: either a single word or another arrayref, followed by a coderef. The body of the foreach loop (in line 43) passes these arrayrefs as parameters to the add_global_handler method on the connection, resulting in installing a particular coderef as an event handler for one or more named events.

So, let's look at each of the event handlers. Each one of these subroutines will be called with two parameters: the connection object itself, and the event object.

The first one, in lines 16 through 18, handle the motd event, triggered when we get a single line of text as part of the IRC signon message. To get that line of text, we start with the arguments to the subroutine, and take the second element of that $_[1], which is the event object. Then, we call the args method on that object, yielding a list, of which the second element is the line of text. Lots of indirection, but at least it's in there somewhere. Finally, we dump that line out if we're tracing.

After the motd is complete, we'll get an endofmotd event, which is about the first time that we can start issuing commands and messages. So, I register a handler for that in lines 19 through 23. Line 20 grabs the connection object into a local $conn. This is likely to be the same value as the global $conn, but I'm following safe protocol here by taking it from the argument list. Line 21 prints out a trace to let me know I'm all the way in. Line 22 invokes the equivalent of the IRC command /join #doctors_office, letting us start receiving public messages for that channel. If the join failed, a later event will tell me (which I'm ignoring), so there's no error checking to be done right here.

Lines 24 through 31 handle a nickname collision. The heart of this handler is line 28, which uses Perl's magical autoincrement to change eliza000 into eliza001 and then eliza002 and so on, until we get one that isn't taken. This works only if the attempted nick is of a form that Perl can autoincrement, verified in line 27.

Lines 32 through 36 handle all private messages, such as when someone says /msg eliza000 hi. In this handler, we'll extract the nickname from which the message came, and the message itself, and pass those along with the original connection and event objects to a subroutine heard, defined below.

Similarly, lines 37 through 40 handle all private messages, when someone says a message in a channel that this bot has joined (in this case, #doctors_office). Here, I'll call the same heard subroutine, but pass along the channel name as the source, rather than the nick name.

Line 45 is the top level loop. This method invocation never returns, until there's no more IRC protcol happening (usually after a user gets booted off the server, or issues a quit command). So, once we've started, everything else has to be handled within the context of the various event handlers. Since we established those up above, they'll now be called as things come in.

Lines 47 through 74 define the heard subroutine. This subroutine has two local variables that persist throughout the life of the program, defined in lines 48 and 49. %docs is the hash mapping a particular username to a Chatbot::Eliza object (a ``doc''). And %talking_to keeps track of the nickname to which messages are being sent, so that a fairly natural-looking nickname prefix can be added when the doctor changes the way she is ``facing'' in the channel.

Line 52 extracts the four parameters being handed to the heard subroutine: the connection object, the event object, a nick or channel name from which the message came, and the message itself. And line 53 dumps that out if we're tracing.

Lines 54 through 57 establish a magic phrase to make the doctor go away nicely. If any private or public message contains go away, then we'll request a quit (with a cute line from Elton John's Rocketman song), and return before any further processing.

Otherwise, it's time to hand the line being spoken to a doctor. Now, the doctor memory of prior spoken lines needs to be maintained on a per-user basis, so the easiest way to do this is to take the user information, in line 58, and look it up in a hash of doctors (%docs), in line 59 to get a specific doctor into $doc. Now, if this is the first time the bot has seen a particular user, we'll need to create a doctor object, in lines 60 through 62. Here, we'll invoke the new method in Chatbot::Eliza to establish a new doctor.

Now comes the slightly embarassing part. While I was playing with this program, I found that the version of Eliza on the CPAN for the past six months (version 0.40) has had a bug in it that causes the memories for various bots to all be shared. And that caused some weird responses -- someone would ask about dogs, and another person would be told ``Earlier, you said something about dogs'', ruining some of the illusion. I've emailed the author with suggestions about a bugfix, but in the meanwhile, I've got to workaround it with line 61. Pay no attention to the man behind the curtain here... if you're using a version greater than 0.40, most likely you won't need to do this.

Once we have a doctor (either new, or returning), we'll tell her what this user said, in line 64, and get her reply. Line 65 grabs the nickname of the user that originally spoke the line, and lines 66 through 69 figure out if we're already talking to that person or not. If not, line 68 prefixes the first line of the response with the nickname followed by a comma, a common convention. Subsequent messages will no longer be prefixed, because the nickname in $nick will match the value of $talking_to{$from}, the most recently spoken-to nickname in a particular channel.

Finally, lines 70 through 72 speak all the response lines in sequence, using the subroutine say defined later.

Lines 76 through 84 handle the response. Initially, I just used privmsg responses for everything, but this violates the agreement that bots are never to use privmsgs to people, because it can trigger a return response that could get into an auto-response meltdown. (The exact rule is that bots are never to listen to notice messages, and are never to generate anything except notice messages, but the rules are somewhat relaxed for public channel messages.) So, I look at who or what I'm talking to, and if it's a channel, it's a privmsg (which is the normal way to send stuff to a channel), and if it's not, it must be a user, so I send a notice.

And there you have it. A simple demonstration of two neat things off the CPAN, put together to provide hours of enjoyment. I had one beta tester just spend an hour and a half talking to the doctor while I was writing this. I guess some people don't have anything better to do. Until next time, enjoy!

Listing One

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use Net::IRC;
        =6=     use Chatbot::Eliza;
        =7=     
        =8=     my $IRC_debug = 0;
        =9=     my $TRACE = 1;
        =10=    
        =11=    (my $irc = Net::IRC->new)->debug($IRC_debug);
        =12=    my $conn = $irc->newconn(
        =13=                             Nick => 'eliza000',
        =14=                             Server => 'random.place.not',
        =15=                             );
        =16=    for ([motd => sub {
        =17=            print "motd: ".($_[1]->args)[1], "\n" if $TRACE;
        =18=          }],
        =19=         [endofmotd => sub {
        =20=            my $conn = shift;
        =21=            print "we are IN!\n" if $TRACE;
        =22=            $conn->join("#doctors_office");
        =23=          }],
        =24=         [nicknameinuse => sub {
        =25=            my $conn = shift;
        =26=            my $nick = $conn->nick;
        =27=            $nick =~ /^[a-zA-Z]+[0-9]*$/ or die "can't fix collided nick";
        =28=            $nick++;
        =29=            print "nick collision, fixing to $nick\n" if $TRACE;
        =30=            $conn->nick($nick);
        =31=          }],
        =32=         [msg => sub {
        =33=            my ($conn, $event) = @_;
        =34=            my ($msg) = $event->args;
        =35=            heard($conn, $event, $event->nick, $msg);
        =36=          }],
        =37=         [public => sub {
        =38=            my ($conn, $event) = @_;
        =39=            my ($msg) = $event->args;
        =40=            heard($conn, $event, $event->to, $msg);
        =41=          }],
        =42=        ){
        =43=      $conn->add_global_handler(@$_);
        =44=    }
        =45=    $irc->start;
        =46=    
        =47=    BEGIN {
        =48=      my %docs;
        =49=      my %talking_to;
        =50=    
        =51=      sub heard {
        =52=        my ($conn,$event,$from,$said) = @_;
        =53=        print "heard $from say $said\n" if $TRACE;
        =54=        if ($said =~ /go away/) {
        =55=          $conn->quit("o/~ and all the science, I don't understand... it's just my job five days a week o/~");
        =56=          return;
        =57=        }
        =58=        my $userhost = $event->userhost;
        =59=        my $doc = $docs{$userhost} ||= do {
        =60=          my $bot = Chatbot::Eliza->new();
        =61=          $bot->{memory} = [];      # bug workaround
        =62=          $bot;
        =63=        };
        =64=        my @response = $doc->transform($said);
        =65=        my $nick = $event->nick;
        =66=        if (($talking_to{$from} || "") ne $nick) {
        =67=          $talking_to{$from} = $nick;
        =68=          $response[0] = "$nick, $response[0]";
        =69=        }
        =70=        for (@response) {
        =71=          say($conn, $from, $_);
        =72=        }
        =73=      }
        =74=    }
        =75=    
        =76=    sub say {
        =77=      my ($conn, $to, $what) = @_;
        =78=      print "telling $to $what\n" if $TRACE;
        =79=      if ($to =~ /^\#/) {           # a channel
        =80=        $conn->privmsg($to, $what);
        =81=      } else {                      # a person
        =82=        $conn->notice($to, $what);
        =83=      }
        =84=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.