Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 85 (Sep 2006)
[Suggested title: ``Searching my columns with POE and IRC'']
As I write this month's piece, I'll be creating my 237th article for various magazines, including 85 monthly articles for this magazine alone (since its inception). I've covered topics throughout my magazine writing history with as little overlap as I can muster. Because of this diversity, it's hard to find a beginner or intermediate Perl topic that I haven't already spoken at least a bit about. (And this clearly makes me the leader for in-print authorship for Perl writings... at somewhere around 20 million by-lines total.)
And thanks to the generous publishers I've had over the years, all 237 (and counting) magazine articles are on-line on my website, ready to be examined for free. The hard part is getting the word out about this resource. Oh sure, I have a ``search this site'' box on my website, hoping that people will take advantage of relevant keywords. But IRC bots seem all the rage right now, and I thought it'd be nice to add yet another way to bring visibility to the columns.
I recently stumbled across the Yahoo::Search module to perform programmatic web searches using the decently sized and speedy Yahoo search engine. Now, I was previously familiar with Net::Google for Google searching, but I've switched to using Yahoo search for a few reasons.
One advantage of Yahoo search for my columns is that I can actually search for Class::DBI. Google's search for words containing colons has been broken for about a year now, making it hard to look for Perl modules referenced in my columns (or anywhere, for that matter).
Another big advantage of using Yahoo search rather than Google search from programs is that I don't need to go through a formal process to get a very private Google API key. Instead, I can just make something up! Thank you, Yahoo, for making it easier for us!
And finally, to a reasonable degree, the results from Yahoo APIs have recently had the ``no commercial use'' clause removed from the acceptable use policy. This means that I can legally use the information in the pursuit of fortune as well as fame. For example, I have a friend who has a real estate site, and finds the closest branch office to an address using Yahoo's geolocation service... for free. Nice. Wake up Google... Yahoo is slipping past you here.
So, I thought I'd show off the Yahoo search, promote my columns, demonstrate the latest changes to POE::Component::IRC for bot building, and illustrate POE::Session::Attribute for easy session authorship, all in one little ``mash-up'', as the kids like to say these days. And with that flourish, let me bring in my search_merlyn_text bot, in [listing one, below].
Line 1 points the script at the right Perl program. Line 2 enables the usual compiler restrictions.
Lines 6 through 8 give a bit of configuration information. It's probably not flexible enough for general use, but close enough for my initial testing. Line 6 gives the IRC server to which the bot connects. Line 7 is the IRC nick that the bot will use. Line 8 is a list of channels that the bot will attempt to join, to answer public queries.
Line 12 pulls in the POE module (found in the CPAN). The only symbol I'm using from this import is $poe_kernel, below.
Lines 14 to 130 create my bot, as if it were a separate file that I brought in with use MyBot. By placing the package inside a block, we limit any lexical variables (which would have been file-lexical variables) to that package definition. Because it is a BEGIN block, any runtime statements inside it are executed before the rest of the file is compiled. This is useful if you have any fake exports of the form:
*main::some_exported = \&some_exported;
which exports some_exported to the main package, allowing it to be called and not treated as a bareword.
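The BEGIN-block trick described above can be sketched in isolation. This is only an illustration using core Perl: the package name MyGreeter, the greet subroutine, and the $greeting variable are made up for the example and do not appear in the bot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package MyGreeter;    # hypothetical package, for illustration only

    # A lexical that is visible only inside this block.
    my $greeting = "Hello";

    sub greet { my $name = shift; return "$greeting, $name!" }

    # The "fake export": alias main::greet to MyGreeter::greet.
    # Because this statement runs inside BEGIN, main::greet exists
    # before the rest of the file is compiled.
    *main::greet = \&greet;
}

# The parser already knows greet is a subroutine at this point, so it
# can be called without parentheses under "use strict".
my $message = greet "world";
print "$message\n";
```

Without the BEGIN wrapper, the glob assignment would not run until after the whole file had been compiled, and the parenthesis-free call at the bottom would fail to compile.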
Line 17 again imports POE, as well as POE::Component::IRC, including the state handler offsets like HEAP and ARG0.
Line 18 makes this a POE::Session subclass, but also scans each subroutine header for an attribute that can tag the subroutine automatically as a state for the session. I find this module to have been invaluable on recent projects, in the mindset of ``don't repeat yourself''. It always seemed silly to me that you'd have to say the name of everything twice, and POE::Session::Attribute puts it back into the ``once is enough'' mode.
Line 20 pulls in Yahoo::Search, using a generic application identifier of YahooDemo. If you use this module, you'll want to read the restrictions at http://developer.yahoo.com, including how to select and register your own application identifier. But really, you could just make up a unique name, and the Yahoo API accepts it. They are very unpicky, and therefore very friendly.
Line 22 defines a utility subroutine to convert the nick!user@host.example.com return value from some of the operations to just nick. Note that this is not a session state handler, because there are no attributes assigned.
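As a quick check of what that helper does, here is a minimal standalone sketch. It uses the same regular expression as line 22, but names the argument explicitly instead of using shift, and the sample hostmask is invented.

```perl
use strict;
use warnings;

# Capture everything before the first "!" in a nick!user@host mask.
sub full_to_nick {
    my ($full) = @_;
    return ($full =~ /(.*?)!/)[0];
}

# Hypothetical hostmask, shaped like the example in the text.
my $nick = full_to_nick('merlyn!user@host.example.com');
print "$nick\n";
```

The non-greedy (.*?) stops at the first exclamation mark, so anything later in the user or host portion is ignored.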
Lines 24 to 27 define the handler for the _start state, automatically generated by POE when the session begins. This is where I create the POE::Component::IRC object, and stuff it into the heap so that it stays alive as long as this session does. Creating the object also causes a register 'all' request to be sent, meaning that we'll be notified of every possible IRC event.
Once the IRC object is operational, the irc_registered event is triggered, bringing us to the subroutine in lines 29 through 34. As a matter of formula, for every event triggered by the IRC object, I grab the sender (line 30) and get the IRC object back (line 31). By doing so, I can use this same code with multiple IRC objects in one program and it does the right thing, sending the response back to the right object. Line 33 tells the IRC object to try to connect to our requested server with our requested nick.
Presuming the connection worked OK, we'll get a ``255'' event at some point in the near future, which basically says that the server is done saying all of its initial login stuff, and you're now ready to go. We trap this in lines 36 to 41. Line 40 has the bot join all of the interesting channels.
As the bot joins each channel, we get an irc_join event back, which is handled in lines 43 to 54. We also get this event if someone else joins the channel that we're in, and we're not interested in those events right now. But if we get into a channel, we announce our intentions in line 52.
From this point on, everything is merely a reaction to being spoken to in various ways. For example, if we get a public message in a channel we're in, we'll get an irc_public event, handled in lines 56 to 68. If we get a private message, we'll see an irc_msg event instead, handled in lines 70 to 78. In either case, we have to see if it's time for us to do a search, and if so, perform the search and return the results.
For the public event, line 60 extracts the speaker, the channels, and the text of the message. If the message starts with our name, followed by a colon or comma (line 64), then we'll presume the speaker is actually talking to us, and perform a search, returning the response in the same public channel. We do this by yielding to the search state in our own session, including a prefix that identifies the speaker who triggered our search, in case multiple searches are fired off in a short period of time.
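The ``is the speaker talking to us?'' test can be tried out on its own. The sketch below assumes the same nick as the listing and wraps the match in a hypothetical helper, addressed_to_me, that is not part of the bot. The character class [,:] is equivalent to the (?:,|:) alternation on line 64.

```perl
use strict;
use warnings;

my $NICK = "search_merlyn_text";

# Hypothetical helper: return the search text if the message begins
# with our nick followed by a comma or colon, otherwise undef.
sub addressed_to_me {
    my ($message) = @_;
    return $message =~ /^\Q$NICK\E[,:]\s*(.*)/ ? $1 : undef;
}

for my $line ("search_merlyn_text: Class::DBI",
              "search_merlyn_text, POE sessions",
              "hello everyone") {
    my $search = addressed_to_me($line);
    print defined $search ? "search for '$search'\n" : "not for us\n";
}
```

The \Q...\E pair quotes the nick, so any regex metacharacters in a nick are matched literally rather than interpreted.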
For a private message, we know that it's going to be a search, so we can skip the matching. Also, we'll be addressing the person directly, so we can skip the prefix. Thus, the yield in line 77 is a bit shorter. Note that the nick is placed inside an array ref constructor so that we can use the same interface for both calls.
Now, we'll get to the search handling in a minute, but as an aside, I also decided to decode the emotes, such as /me is stepping out for a moment in lines 80 to 87. I'm not actually doing anything with them at the moment, but maybe a future version of the bot will pay attention to the emotes as well.
Finally, we get to the meat of the program: the searching in lines 89 to 108. Line 90 just grabs the passed-in parameters so that we can access them.
Line 93 creates an array to hold the responses. We're only going to use 3 responses at most.
Lines 95 and 96 perform the Yahoo search, using a site-restricted search for stonehenge.com, plus whatever text the user specified. Note that we also ask for 20 results. Why 20 results when we're only going to use 3? Because we can't directly ask Yahoo for ``just the magazine columns'', so we'll have to perform an additional filtering based on the returned URL. That filtering is performed in line 97.
If we make it to line 100, we're looking at a good URL. The URL and title are wrapped in brackets, and placed into @results. If we have our three results, we abort the loop in line 101.
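The over-fetch-then-filter loop can be exercised without a search engine. This sketch replaces the Yahoo::Search result objects with plain hashes. The top_columns helper, the example.com URLs, and the titles are all invented for illustration, and only the regular expression and the three-result cutoff come from the listing.

```perl
use strict;
use warnings;

# Hypothetical search hits standing in for Yahoo::Search results.
my @hits = (
    { url => "http://example.com/index.html",         title => "Home" },
    { url => "http://example.com/columns/col01.html", title => "Column 1" },
    { url => "http://example.com/about.html",         title => "About" },
    { url => "http://example.com/columns/col02.html", title => "Column 2" },
    { url => "http://example.com/columns/col03.html", title => "Column 3" },
    { url => "http://example.com/columns/col04.html", title => "Column 4" },
);

# Keep only URLs that look like columns, and stop after $max of them.
sub top_columns {
    my ($max, @candidates) = @_;
    my @results;
    for my $hit (@candidates) {
        next unless $hit->{url} =~ /col\d\w*\.html$/;
        push @results, sprintf "[%s %s]", $hit->{url}, $hit->{title};
        last if @results >= $max;
    }
    return @results;
}

my @results = top_columns(3, @hits);
print "$_\n" for @results;
```

Asking for more candidates than needed is what makes the client-side filter workable: the non-column pages are skipped, and the loop still has enough hits left to fill three slots.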
If for any reason we can't find any results, lines 103 to 104 change the results to suggest that they inform me of their search, so I can write an article about that. (I'm always looking for new ideas!) (Please.) (I'm not kidding.)
Line 107 sends the results back to the user for a private message, or to the channel(s) for a public message, along with a prefix if needed.
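The reply text itself is easy to try in isolation. The following sketch pulls the fallback message, the optional prefix, and the final interpolation into a hypothetical format_reply helper. The helper name and the sample arguments are assumptions, while the three statements inside mirror lines 103 to 107.

```perl
use strict;
use warnings;

# Hypothetical helper: build the one-line reply sent via privmsg.
sub format_reply {
    my ($search, $prefix, @results) = @_;
    @results = "Nothing. Please let Randal know of your great column idea!"
        unless @results;
    $prefix ||= "";    # private messages arrive without a prefix
    return "${prefix}$search: @results";
}

# Public query: the prefix names the speaker who asked.
print format_reply("POE", "alice: ", "[url1 Title One]", "[url2 Title Two]"), "\n";

# Private query with no hits: no prefix, fallback text.
print format_reply("obscure topic", undef), "\n";
```

Interpolating @results inside the double-quoted string joins the entries with a single space, which is why each result is wrapped in its own brackets.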
And that's the functionality of the bot. Lines 110 to 129 establish a _default handler for debugging purposes. Because we're registered to receive all IRC events, we'll be getting a lot of events that we don't care about. However, it's nice to know what they are so I can add some hook into them if I want. The code in lines 124 to 127 dumps out the event, including expanding any arrayrefs to their array values for easy parsing.
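The ignore-list-plus-dump idea works outside POE as well. This is a minimal sketch, assuming a hypothetical describe_event helper that takes an event name and an arrayref of parameters. The short ignore list is a sample taken from the listing's longer one, and the event arguments are invented.

```perl
use strict;
use warnings;

# A few of the event names the listing chooses to ignore.
my %IGNORE = map { $_ => 1 } qw(irc_ping irc_mode irc_372);

# Hypothetical helper: describe an unhandled event, or return nothing
# if the event is on the ignore list.
sub describe_event {
    my ($event, $args) = @_;
    return if $IGNORE{$event};
    # Expand one level of array references so nested lists are readable.
    my $params = join " ", map { ref($_) eq "ARRAY" ? "[@$_]" : "$_" } @$args;
    return "unhandled $event event: $params";
}

for my $pair (["irc_ping", ["server.example.org"]],
              ["irc_kick", ["#channel", ["alice", "bob"]]]) {
    my $text = describe_event(@$pair);
    print "$text\n" if defined $text;
}
```

Growing the ignore list one event at a time, as described next, amounts to adding names to the qw() list whenever a dumped event turns out to be uninteresting.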
Initially, the exception list in lines 113 to 121 was empty. As I ran the program repeatedly, I saw events being triggered that were either interesting, or ignorable. For interesting events, I immediately created a handler above. For ignorable events, I added the event type to the block list. And thus, my program was grown a bit at a time, helping me understand what I needed to do.
All that's left in the program is to create a bot, and run it. Lines 132 and 134 do exactly that. Because there's no trigger for getting the bot to stop, this program will run until terminated from the command line.
After I submit this column, I'll probably tinker with this bot a bit more to get it to handle nick collisions, being kicked from a channel, being invited to a channel, and so on. But for now, in a dozen dozen lines of code, I've got a working bot that helps promote one of my many public contributions to the Perl community. Hope you find what you're looking for... until next time, enjoy!
LISTING
=1=     #!/usr/bin/perl
=2=     use strict;
=3=
=4=     ## CONFIGURATION
=5=
=6=     my $SERVER = "irc.perl.org";
=7=     my $NICK = "search_merlyn_text";
=8=     my @CHANNELS = qw(#search_merlyn_text #search_merlyn_text2);
=9=
=10=    ## END CONFIGURATION
=11=
=12=    use POE;
=13=
=14=    BEGIN {
=15=      package MyBot;
=16=
=17=      use POE qw(Component::IRC);
=18=      use base POE::Session::Attribute::;
=19=
=20=      use Yahoo::Search AppId => 'YahooDemo';
=21=
=22=      sub full_to_nick { (shift =~ /(.*?)!/)[0] }
=23=
=24=      sub _start : Package {
=25=        my ($heap) = @_[HEAP];
=26=        my $irc = $heap->{irc} = POE::Component::IRC->spawn;
=27=      };
=28=
=29=      sub irc_registered : Package { # client is ready to connect
=30=        my ($sender) = @_[SENDER];
=31=        my $irc = $sender->get_heap;
=32=        ## warn "trying to connect";
=33=        $irc->yield(connect => {server => $SERVER, nick => $NICK});
=34=      };
=35=
=36=      sub irc_255 : Package { # server is done blabbering
=37=        my ($sender) = @_[SENDER];
=38=        my $irc = $sender->get_heap;
=39=        ## warn "trying to join @CHANNELS";
=40=        $irc->yield(join => $_) for @CHANNELS;
=41=      };
=42=
=43=      sub irc_join : Package { # server says we joined
=44=        my ($sender) = @_[SENDER];
=45=        my $irc = $sender->get_heap;
=46=
=47=        my ($who, $channel) = @_[ARG0..ARG1];
=48=        $who = full_to_nick($who);
=49=
=50=        ## warn "$who joined $channel";
=51=        if ($who eq $NICK) {
=52=          $irc->yield(privmsg => $channel => "Hello! I search merlyn's columns at http://www.stonehenge.com/merlyn/columns.html!");
=53=        }
=54=      };
=55=
=56=      sub irc_public : Package { # public message in a channel
=57=        my ($sender) = @_[SENDER];
=58=        my $irc = $sender->get_heap;
=59=
=60=        my ($who, $channels, $message) = @_[ARG0..ARG2];
=61=        $who = full_to_nick($who);
=62=        ## warn "$who said $message in @$channels";
=63=
=64=        if ($message =~ /^\Q$NICK\E(?:,|:)\s*(.*)/) {
=65=          my $search = $1;
=66=          $_[KERNEL]->yield(search => $irc, $channels, $search, "$who: ");
=67=        }
=68=      };
=69=
=70=      sub irc_msg : Package { # private message to me
=71=        my ($sender) = @_[SENDER];
=72=        my $irc = $sender->get_heap;
=73=
=74=        my ($who, $me, $message) = @_[ARG0..ARG2];
=75=        $who = full_to_nick($who);
=76=        ## warn "$who said $message to me";
=77=        $_[KERNEL]->yield(search => $irc, [$who], $message);
=78=      };
=79=
=80=      sub irc_ctcp_action : Package { # public emote
=81=        my ($sender) = @_[SENDER];
=82=        my $irc = $sender->get_heap;
=83=
=84=        my ($who, $channels, $message) = @_[ARG0..ARG2];
=85=        $who = full_to_nick($who);
=86=        ## warn "$who *$message* in @$channels";
=87=      };
=88=
=89=      sub search : Package { # time for us to search
=90=        my ($irc, $channels, $search, $prefix) = @_[ARG0..ARG3];
=91=        ## warn "@$channels wants to see $prefix [results of $search]";
=92=
=93=        my @results;
=94=
=95=        for my $result (Yahoo::Search->Results
=96=                        (Doc => "site:stonehenge.com $search", Count => 20)) {
=97=          next unless $result->Url =~ /col\d\w*\.html$/;
=98=
=99=          ## return results as mediawiki-like links
=100=         push @results, sprintf "[%s %s]", $result->Url, $result->Title;
=101=         last if @results >= 3;
=102=       }
=103=       @results = "Nothing. Please let Randal know of your great column idea!"
=104=         unless @results;
=105=
=106=       $prefix ||= "";
=107=       $irc->yield(privmsg => $channels => "${prefix}$search: @results");
=108=     };
=109=
=110=     sub _default : Package {
=111=       our $IGNORE_THESE ||= { map { $_ => 1 }
=112=                               qw(_child
=113=                                  irc_plugin_add irc_isupport
=114=                                  irc_snotice irc_connected
=115=                                  irc_mode irc_ping irc_part
=116=                                  irc_001 irc_002 irc_003 irc_004 irc_005
=117=                                  irc_250 irc_251 irc_252 irc_254 irc_255
=118=                                  irc_265 irc_266
=119=                                  irc_353
=120=                                  irc_366
=121=                                  irc_372 irc_375 irc_376
=122=                                 ) };
=123=       return if $IGNORE_THESE->{$_[ARG0]};
=124=       printf "%s: session %s caught an unhandled %s event.\n",
=125=         scalar localtime(), $_[SESSION]->ID, $_[ARG0];
=126=       print "The $_[ARG0] event was given these parameters: ",
=127=         join(" ", map({"ARRAY" eq ref $_ ? "[@$_]" : "$_"} @{$_[ARG1]})), "\n";
=128=       0;                        # false for signals
=129=     };
=130=   }
=131=
=132=   MyBot->spawn;
=133=
=134=   $poe_kernel->run;