Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Linux Magazine Column 85 (Sep 2006)

[Suggested title: ``Searching my columns with POE and IRC'']

As I write this month's piece, I'll be creating my 237th article for various magazines, including 85 monthly articles for this magazine alone (since its inception). I've covered topics throughout my magazine writing history with as little overlap as I can muster. Because of this diversity, it's hard to find a beginner or intermediate Perl topic that I haven't already spoken at least a bit about. (And this clearly makes me the leader for in-print authorship for Perl writings... at somewhere around 20 million by-lines total.)

And thanks to the generous publishers I've had over the years, all 237 (and counting) magazine articles are on-line on my website, ready to be examined for free. The hard part is getting the word out about this resource. Oh sure, I have a ``search this site'' box on my website, hoping that people will take advantage of relevant keywords. But IRC bots seem all the rage right now, and I thought it'd be nice to add yet another way to bring visibility to the columns.

I recently stumbled across the Yahoo::Search module to perform programmatic web searches using the decently sized and speedy Yahoo search engine. Now, I was previously familiar with Net::Google for Google searching, but I've switched to using Yahoo search for a few reasons.

One advantage of Yahoo search for my columns is that I can actually search for Class::DBI. Google's search for words containing colons has been broken for about a year now, making it hard to look for Perl modules referenced in my columns (or anywhere, for that matter).

Another big advantage of using Yahoo search rather than Google search from programs is that I don't need to go through a formal process to get a very private Google API key. Instead, I can just make something up! Thank you, Yahoo, for making it easier for us!

And finally, to a reasonable degree, the results from Yahoo APIs have recently had the ``no commercial use'' clause removed from the acceptable use policy. This means that I can legally use the information in the pursuit of fortune as well as fame. For example, I have a friend who has a real estate site, and finds the closest branch office to an address using Yahoo's geolocation service... for free. Nice. Wake up Google... Yahoo is slipping past you here.

So, I thought I'd show off the Yahoo search, promote my columns, demonstrate the latest changes to POE::Component::IRC for bot building, and illustrate POE::Session::Attribute for easy session authorship, all in one little ``mash-up'', as the kids like to say these days. And with that flourish, let me bring in my search_merlyn_text bot, in [listing one, below].

Line 1 points the script at the right Perl program. Line 2 enables the usual compiler restrictions.

Lines 6 through 8 give a bit of configuration information. It's probably not flexible enough for general use, but close enough for my initial testing. Line 6 gives the IRC server to which the bot connects. Line 7 is the IRC nick that the bot will use. Line 8 is a list of channels that the bot will attempt to join, to answer public queries.

Line 12 pulls in the POE module (found in the CPAN). The only symbol I'm using from this import is $poe_kernel, below.

Lines 14 to 130 create my bot, as if it were a separate file that I brought in with use MyBot. By placing the package inside a block, we limit any lexical variables (which would have been file-lexical variables) to that package definition. Because it's a BEGIN block, any runtime statements inside it are executed before the rest of the file is compiled. This is useful if you have any fake exports of the form:

   *main::some_exported = \&some_exported;

which exports some_exported to the main package, allowing it to be called and not treated as a bareword.
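
As a minimal sketch of that pattern (the package name MyGreeter and the subroutine greet are made up for illustration, and are not part of the bot), the glob assignment inside the BEGIN block has already run by the time the later call is compiled, so the call can be written without parentheses:

```perl
use strict;
use warnings;

BEGIN {
  package MyGreeter;            # hypothetical package, for illustration only

  # this lexical is visible only inside the BEGIN block
  my $greeting = "hello";

  sub greet {
    my ($name) = @_;
    $name = "world" unless defined $name;
    return "$greeting, $name";
  }

  # the fake export: executed at compile time because we're in BEGIN
  *main::greet = \&greet;
}

# main::greet already exists here, so no parens are needed
my $text = greet "Perl";
print "$text\n";
```

Without the BEGIN, the glob assignment would not happen until runtime, and the parenless call would fail to compile under use strict.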

Line 17 again imports POE, as well as POE::Component::IRC, including the state handler offsets like HEAP and ARG0.

Line 18 makes this a POE::Session subclass, but also scans each subroutine header for an attribute that can tag the subroutine automatically as a state for the session. I find this module to have been invaluable on recent projects, in the mindset of ``don't repeat yourself''. It always seemed silly to me that you'd have to say the name of everything twice, and POE::Session::Attribute puts it back into the ``once is enough'' mode.
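
To give a feel for how a subroutine attribute can do that tagging, here is a small pure-Perl sketch built on Perl's MODIFY_CODE_ATTRIBUTES hook. It is not POE::Session::Attribute's actual implementation: the TinySession and MyDemoBot packages, the State attribute, and state_names are all invented for this illustration.

```perl
use strict;
use warnings;

package TinySession;            # hypothetical base class, not a real module

our %TAGGED;                    # package name => code refs tagged ": State"

# Perl calls this hook at compile time for each sub declared with attributes.
sub MODIFY_CODE_ATTRIBUTES {
  my ($package, $code, @attrs) = @_;
  push @{ $TAGGED{$package} }, $code if grep { $_ eq 'State' } @attrs;
  return grep { $_ ne 'State' } @attrs;   # leftovers are reported as invalid
}

# Walk the package's symbol table to recover the names of the tagged subs.
sub state_names {
  my ($package) = @_;
  my %tagged = map { ($_ => 1) } @{ $TAGGED{$package} || [] };
  no strict 'refs';
  return sort grep {
    defined &{"${package}::$_"} and $tagged{ \&{"${package}::$_"} }
  } keys %{"${package}::"};
}

package MyDemoBot;
use parent -norequire, 'TinySession';     # must be in place at compile time

sub _start     : State { "starting" }
sub irc_public : State { "someone spoke" }
sub helper             { "not a state" }

package main;

my @states = MyDemoBot->state_names;
print "@states\n";
```

Each handler name is written exactly once, on the sub itself, and the untagged helper is left alone.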

Line 20 pulls in Yahoo::Search, using a generic application identifier of YahooDemo. If you use this module, you'll want to read the restrictions at http://developer.yahoo.com, including how to select and register your own application identifier. But really, you could just make up a unique name, and the Yahoo API accepts it. They are very unpicky, and therefore very friendly.

Line 22 defines a utility subroutine to convert the nick!user@host.example.com return value from some of the operations to just nick. Note that this is not a session state handler, because there are no attributes assigned.
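
Here is the same one-liner run on its own, with a made-up hostmask as input:

```perl
use strict;
use warnings;

# same body as line 22: capture everything before the first "!"
sub full_to_nick { (shift =~ /(.*?)!/)[0] }

# the hostmask below is invented for illustration
my $nick = full_to_nick('merlyn!~merlyn@host.example.com');
print "$nick\n";
```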

Lines 24 to 27 define the handler for the _start state, automatically generated by POE when the session begins. This is where I create the POE::Component::IRC object, and stuff it into the heap so that it stays alive as long as this session does. Creating the object also causes a register 'all' request to be sent, meaning that we'll be notified of every possible IRC event.

Once the IRC object is operational, the irc_registered event is triggered, bringing us to the subroutine in lines 29 through 34. As a matter of formula, for every event triggered by the IRC object, I grab the sender (line 30) and get the IRC object back (line 31). By doing so, I can use this same code with multiple IRC objects in one program and it does the right thing, sending the response back to the right object. Line 33 tells the IRC object to try to connect to our requested server with our requested nick.

Presuming the connection worked OK, we'll get a ``255'' event at some point in the near future, which basically says that the server is done saying all of its initial login stuff, and you're now ready to go. We trap this in lines 36 to 41. Line 40 has the bot join all of the interesting channels.

As the bot joins each channel, we get an irc_join event back, which is handled in lines 43 to 54. We also get this event if someone else joins the channel that we're in, and we're not interested in those events right now. But if we get into a channel, we announce our intentions in line 52.

From this point on, everything is merely a reaction to being spoken to in various ways. For example, if we get a public message in a channel we're in, we'll get an irc_public event, handled in lines 56 to 68. If we get a private message, we'll see an irc_msg event instead, handled in lines 70 to 78. In either case, we have to see if it's time for us to do a search, and if so, perform the search and return the results.

For the public event, line 60 extracts the speaker, the channels, and the text of the message. If the message starts with our name, followed by a colon or comma (line 64), then we'll presume the speaker is actually talking to us, and perform a search, returning the response in the same public channel. We do this by yielding to the search state in our own session, including a prefix that identifies the speaker who triggered our search, in case multiple searches are fired off in a short period of time.
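
The addressing test from line 64 can be tried outside of POE. In this sketch the regular expression is the one from the listing, while the wrapper addressed_query and the sample messages are hypothetical:

```perl
use strict;
use warnings;

my $NICK = "search_merlyn_text";

# Return the query if $message is addressed to the bot, else undef.
sub addressed_query {
  my ($message) = @_;
  return $message =~ /^\Q$NICK\E(?:,|:)\s*(.*)/ ? $1 : undef;
}

for my $line ("search_merlyn_text: Class::DBI",
              "search_merlyn_text, POE",
              "anyone seen merlyn?") {
  my $query = addressed_query($line);
  if (defined $query) {
    print "search for '$query'\n";
  } else {
    print "ignored\n";
  }
}
```

The \Q...\E quoting keeps any regex metacharacters in the nick from being treated as pattern syntax.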

For a private message, we know that it's going to be a search, so we can skip the matching. Also, we'll be addressing the person directly, so we can skip the prefix. Thus, the yield in line 77 is a bit shorter. Note that the nick is placed inside an array ref constructor so that we can use the same interface for both calls.

Now, we'll get to the search handling in a minute, but as an aside, I also decided to decode the emotes (such as ``/me is stepping out for a moment'') in lines 80 to 87. I'm not actually doing anything with them at the moment, but maybe a future version of the bot will pay attention to the emotes as well.

Finally, we get to the meat of the program: the searching in lines 89 to 108. Line 90 just grabs the passed-in parameters so that we can access them.

Line 93 creates an array to hold the responses. We're only going to use 3 responses at most.

Lines 95 and 96 perform the Yahoo search, using a site-restricted search for stonehenge.com, plus whatever text the user specified. Note that we also ask for 20 results. Why 20 results when we're only going to use 3? Because we can't directly ask Yahoo for ``just the magazine columns'', so we'll have to perform an additional filtering based on the returned URL. That filtering is performed in line 97.

If we make it to line 100, we're looking at a good URL. The URL and title are wrapped in brackets, and placed into @results. If we have our three results, we abort the loop in line 101.

If for any reason we can't find any results, lines 103 to 104 change the results to suggest that they inform me of their search, so I can write an article about that. (I'm always looking for new ideas!) (Please.) (I'm not kidding.)
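
The filter, the three-result cap, and the fallback can be exercised without calling Yahoo at all. This sketch swaps the Yahoo::Search result objects for plain hashes. The helper pick_columns and the example.com hits are made up for illustration, but the regular expression and the fallback text are the ones from the listing.

```perl
use strict;
use warnings;

# Keep at most $max hits whose URL looks like a column page.
sub pick_columns {
  my ($max, @hits) = @_;
  my @results;
  for my $hit (@hits) {
    next unless $hit->{url} =~ /col\d\w*\.html$/;
    push @results, sprintf "[%s %s]", $hit->{url}, $hit->{title};
    last if @results >= $max;
  }
  @results = "Nothing.  Please let Randal know of your great column idea!"
    unless @results;
  return @results;
}

# invented hits, standing in for search results
my @hits = (
  { url => "http://example.com/index.html",     title => "Home"  },
  { url => "http://example.com/mag/col01.html", title => "One"   },
  { url => "http://example.com/mag/col02.html", title => "Two"   },
  { url => "http://example.com/mag/col03.html", title => "Three" },
  { url => "http://example.com/mag/col04.html", title => "Four"  },
);

my @picked = pick_columns(3, @hits);
print "$_\n" for @picked;
```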

Line 107 puts the results back to the user for a private message, or the channel(s) for a public message, along with a prefix if needed.
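
The reply string itself is simple to check in isolation. In this sketch, format_reply is a hypothetical wrapper around the interpolation used in line 107, and the bracketed results are placeholders:

```perl
use strict;
use warnings;

# Build the text that line 107 sends: optional prefix, the query, the results.
sub format_reply {
  my ($search, $prefix, @results) = @_;
  $prefix ||= "";               # private messages arrive with no prefix
  return "${prefix}$search: @results";
}

my $public  = format_reply("POE", "fred: ", "[u1 t1]", "[u2 t2]");
my $private = format_reply("POE", undef,    "[u1 t1]");
print "$public\n$private\n";
```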

And that's the functionality of the bot. Lines 110 to 129 establish a _default handler for debugging purposes. Because we're registered to receive all IRC events, we'll be getting a lot of events that we don't care about. However, it's nice to know what they are so I can add some hook into them if I want. The code in lines 124 to 127 dumps out the event, including expanding any arrayrefs to their array values for easy parsing.

Initially, the exception list in lines 113 to 121 was empty. As I ran the program repeatedly, I saw events being triggered that were either interesting, or ignorable. For interesting events, I immediately created a handler above. For ignorable events, I added the event type to the exception list. And thus, my program grew a bit at a time, helping me understand what I needed to do.
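
The ignore-or-dump logic can also be tried without POE. In this sketch, describe_event is a hypothetical stand-in for the _default handler, the ignore list is shortened, and the sample event parameters are invented. The arrayref expansion is the same expression used in line 127.

```perl
use strict;
use warnings;

my %IGNORE = map { $_ => 1 } qw(irc_ping irc_mode irc_372);

# Return a one-line description of an event, or nothing if it's ignored.
sub describe_event {
  my ($event, $params) = @_;
  return if $IGNORE{$event};
  return "$event: "
    . join(" ", map { "ARRAY" eq ref $_ ? "[@$_]" : "$_" } @$params);
}

my $shown  = describe_event("irc_topic", ["fred", ["#a", "#b"], "hi"]);
my $hidden = describe_event("irc_ping", ["server"]);
print "$shown\n";
print "irc_ping was ignored\n" unless defined $hidden;
```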

All that's left in the program is to create a bot, and run it. Lines 132 and 134 do exactly that. Because there's no trigger for getting the bot to stop, this program will run until terminated from the command line.

After I submit this column, I'll probably tinker with this bot a bit more to get it to handle nick collisions, being kicked from a channel, being invited to a channel, and so on. But for now, in a dozen dozen lines of code, I've got a working bot that helps promote one of my many public contributions to the Perl community. Hope you find what you're looking for... until next time, enjoy!

LISTING

        =1=     #!/usr/bin/perl
        =2=     use strict;
        =3=     
        =4=     ## CONFIGURATION
        =5=     
        =6=     my $SERVER = "irc.perl.org";
        =7=     my $NICK = "search_merlyn_text";
        =8=     my @CHANNELS = qw(#search_merlyn_text #search_merlyn_text2);
        =9=     
        =10=    ## END CONFIGURATION
        =11=    
        =12=    use POE;
        =13=    
        =14=    BEGIN {
        =15=      package MyBot;
        =16=    
        =17=      use POE qw(Component::IRC);
        =18=      use base POE::Session::Attribute::;
        =19=    
        =20=      use Yahoo::Search AppId => 'YahooDemo';
        =21=    
        =22=      sub full_to_nick { (shift =~ /(.*?)!/)[0] }
        =23=    
        =24=      sub _start : Package {
        =25=        my ($heap) = @_[HEAP];
        =26=        my $irc = $heap->{irc} = POE::Component::IRC->spawn;
        =27=      };
        =28=    
        =29=      sub irc_registered : Package { # client is ready to connect
        =30=        my ($sender) = @_[SENDER];
        =31=        my $irc = $sender->get_heap;
        =32=        ## warn "trying to connect";
        =33=        $irc->yield(connect => {server => $SERVER, nick => $NICK});
        =34=      };
        =35=    
        =36=      sub irc_255 : Package {       # server is done blabbering
        =37=        my ($sender) = @_[SENDER];
        =38=        my $irc = $sender->get_heap;
        =39=        ## warn "trying to join @CHANNELS";
        =40=        $irc->yield(join => $_) for @CHANNELS;
        =41=      };
        =42=    
        =43=      sub irc_join : Package {      # server says we joined
        =44=        my ($sender) = @_[SENDER];
        =45=        my $irc = $sender->get_heap;
        =46=    
        =47=        my ($who, $channel) = @_[ARG0..ARG1];
        =48=        $who = full_to_nick($who);
        =49=    
        =50=        ## warn "$who joined $channel";
        =51=        if ($who eq $NICK) {
        =52=          $irc->yield(privmsg => $channel => "Hello! I search merlyn's columns at http://www.stonehenge.com/merlyn/columns.html!");
        =53=        }
        =54=      };
        =55=    
        =56=      sub irc_public : Package {    # public message in a channel
        =57=        my ($sender) = @_[SENDER];
        =58=        my $irc = $sender->get_heap;
        =59=    
        =60=        my ($who, $channels, $message) = @_[ARG0..ARG2];
        =61=        $who = full_to_nick($who);
        =62=        ## warn "$who said $message in @$channels";
        =63=    
        =64=        if ($message =~ /^\Q$NICK\E(?:,|:)\s*(.*)/) {
        =65=          my $search = $1;
        =66=          $_[KERNEL]->yield(search => $irc, $channels, $search, "$who: ");
        =67=        }
        =68=      };
        =69=    
        =70=      sub irc_msg : Package {       # private message to me
        =71=        my ($sender) = @_[SENDER];
        =72=        my $irc = $sender->get_heap;
        =73=    
        =74=        my ($who, $me, $message) = @_[ARG0..ARG2];
        =75=        $who = full_to_nick($who);
        =76=        ## warn "$who said $message to me";
        =77=        $_[KERNEL]->yield(search => $irc, [$who], $message);
        =78=      };
        =79=    
        =80=      sub irc_ctcp_action : Package {       # public emote
        =81=        my ($sender) = @_[SENDER];
        =82=        my $irc = $sender->get_heap;
        =83=    
        =84=        my ($who, $channels, $message) = @_[ARG0..ARG2];
        =85=        $who = full_to_nick($who);
        =86=        ## warn "$who *$message* in @$channels";
        =87=      };
        =88=    
        =89=      sub search : Package {        # time for us to search
        =90=        my ($irc, $channels, $search, $prefix) = @_[ARG0..ARG3];
        =91=        ## warn "@$channels wants to see $prefix [results of $search]";
        =92=    
        =93=        my @results;
        =94=    
        =95=        for my $result (Yahoo::Search->Results
        =96=                        (Doc => "site:stonehenge.com $search", Count => 20)) {
        =97=          next unless $result->Url =~ /col\d\w*\.html$/;
        =98=    
        =99=          ## return results as mediawiki-like links
        =100=         push @results, sprintf "[%s %s]", $result->Url, $result->Title;
        =101=         last if @results >= 3;
        =102=       }
        =103=       @results = "Nothing.  Please let Randal know of your great column idea!"
        =104=         unless @results;
        =105=   
        =106=       $prefix ||= "";
        =107=       $irc->yield(privmsg => $channels => "${prefix}$search: @results");
        =108=     };
        =109=   
        =110=     sub _default : Package {
        =111=       our $IGNORE_THESE ||= { map { $_ => 1 }
        =112=                               qw(_child
        =113=                                  irc_plugin_add irc_isupport
        =114=                                  irc_snotice irc_connected
        =115=                                  irc_mode irc_ping irc_part
        =116=                                  irc_001 irc_002 irc_003 irc_004 irc_005
        =117=                                  irc_250 irc_251 irc_252 irc_254 irc_255
        =118=                                  irc_265 irc_266
        =119=                                  irc_353
        =120=                                  irc_366
        =121=                                  irc_372 irc_375 irc_376
        =122=                                 ) };
        =123=       return if $IGNORE_THESE->{$_[ARG0]};
        =124=       printf "%s: session %s caught an unhandled %s event.\n",
        =125=         scalar localtime(), $_[SESSION]->ID, $_[ARG0];
        =126=       print "The $_[ARG0] event was given these parameters: ",
        =127=         join(" ", map({"ARRAY" eq ref $_ ? "[@$_]" : "$_"} @{$_[ARG1]})), "\n";
        =128=       0;                        # false for signals
        =129=     };
        =130=   }
        =131=   
        =132=   MyBot->spawn;
        =133=   
        =134=   $poe_kernel->run;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.