Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 74 (Sep 2005)
[Suggested title: ``Babysitting an interactive CPAN update'']
A few years ago in this column [editor - Nov 2002, refer as you wish],
I created a tool to provide a ``mini CPAN mirror'' on my laptop, allowing
me to carry just the portions of the CPAN with the latest and greatest
version of each installable module. I'm happy to say that my mini-CPAN
mirroring program got quite a bit of attention, even being turned into
a CPAN module of its own, CPAN::Mini.
I've also been a happy user, mirroring the mini-CPAN as often as
hourly to my laptop. It's only a light touch on the source server, so
I don't feel bad doing that. Typically, I then bounce into a CPAN.pm
shell to find out which modules now need updating, by entering its r
command, which should give me a sensible list of names.
There are two problems with the r listing. The first problem is
that it's merely a listing. I have to either retype the out-of-date
packages as parameters of the install command, or cut-n-paste
very carefully, making sure to add spaces between the names. Ugh.
The second problem is that some modules are broken for update, meaning
that although I could install version ``1.67'' just fine, version ``1.68''
refuses to work on my box, for any number of reasons. This means that
after I've tried to install, it doesn't work, and it's still out of
date. But then I do the same exact thing in an hour, wasting my time
again.
Now, although there's a programmatic interface to all of the things
that the r and install commands are doing, I found it
easier to just think of the command-line as my API. What I needed was
a script on top of this API. This script could issue the r
command, note its output, and create the appropriate install
command, carefully omitting the recent past failures.
This kind of interactive-command babysitting is best handled by the
Expect module in the CPAN. I had not used this module before, so I
had to read the docs very carefully. This is ironic, because I wrote
the original chat2.pl to provide a similar function for Perl
version 3, and Expect was inspired by the chat2.pl package (as
even mentioned in the documentation).
The basic notion of Expect is that you have a filehandle open on a
process (or perhaps a socket or STDIN), and that you'll be giving
that process some length of time to generate a string that matches any
of one or more regular expressions that you provide. The process is a
bit expensive, because we don't have streaming regular expressions
yet, so what happens in practice is that as characters appear in
chunks on the handle, these are added to the end of a buffer, and the
entire buffer is checked against each of the regular expressions in
turn.
Once the buffer matches the regular expression, everything up to the end of what matched is removed from the buffer. By default, this also exits the particular watching step, but each regular expression can also have an associated action subroutine. This subroutine can perform various actions, and/or request that the expect operation be continued.
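The accumulate-and-match cycle described above can be imitated in a few lines of plain Perl. This is only a toy sketch of the idea, not how the Expect module is actually implemented: the helper name feed_chunk, the two patterns, and the sample chunks are all made up for illustration.

```perl
use strict;
use warnings;

# Toy model of the accumulate-and-match cycle (not Expect's real code).
my $buffer = '';

# feed_chunk() is a hypothetical helper: it appends a chunk to the buffer,
# tries each pattern in turn, and on a match removes everything up to the
# end of what matched. It returns the 1-based number of the pattern that
# matched, or 0 if nothing has matched yet.
sub feed_chunk {
    my ($chunk, @patterns) = @_;
    $buffer .= $chunk;
    for my $i (0 .. $#patterns) {
        if ($buffer =~ $patterns[$i]) {
            my $end = $+[0];               # offset just past the match
            substr($buffer, 0, $end, '');  # drop text through end of match
            return $i + 1;
        }
    }
    return 0;
}

my @patterns = (qr/login: /, qr/password: /);

# Output arrives in arbitrary chunks, so a pattern may span two chunks.
my @results = map { feed_chunk($_, @patterns) } ("Welcome\nlog", "in: extra");
print "results: @results, leftover: '$buffer'\n";
# prints: results: 0 1, leftover: 'extra'
```

The first chunk matches nothing, so the text just sits in the buffer. The second chunk completes the first pattern, and only the text after the match remains.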
Expect can also be used to watch multiple handles, triggering
various actions like sending the output from one handle to the input
of another. Using carefully constructed regular expressions, we can
get ``in the middle'' between a process and the terminal, for example,
intercepting various input or output streams. As a convenience, the
most common of these (run this command interactively, waiting for an
escape character from the terminal) is provided as a simple routine.
As I was designing this program, I remembered that some of the CPAN
installs are evil, in that they require interaction from the user. So
while the install command is running, any keyboard input is
automatically passed to the CPAN shell directly. The program doesn't
take back over until the CPAN shell prompt is once again seen. As an
added feature, if the output is idle for 15 seconds, the terminal bell
is rung, alerting me to my necessary task. Now I can truly just ``fire
and forget'', and wait for either the shell prompt, or a series of
bells.
So, let's get right to the program, given in [listing 1, below]. Lines 1 and 2 start nearly every program I write, enabling warnings and compile-time best practices.
Line 4 brings in the Expect module from the CPAN. Note that even
though Expect doesn't require IO::Stty, I highly recommend
installation of that module as well.
Lines 7 and 8 define two of the configuration constants. The $LOSERS
file contains packages that could not be installed on the previous run
of the program, and should be skipped on this run. The $BELL value is
how many seconds we'll wait without output during the install phase
before ringing the bell. And this repeats, so make sure you don't set
it too low!
Line 9 is the regular expression for the CPAN shell prompt, defined here because I use it repeatedly throughout the program.
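One effect of the trailing \z in the prompt pattern can be checked in plain Perl, without Expect at all. The two buffer strings below are invented for illustration.

```perl
use strict;
use warnings;

my $CPAN = qr/cpan> \z/;    # same pattern as line 9 of the listing

# Invented buffer contents for illustration.
my $at_prompt  = "Fetching index\ncpan> ";
my $mid_output = "type 'cpan> help' for hints\nstill working";

# \z anchors the match to the very end of the buffer, so prompt-like text
# in the middle of the output is not mistaken for a waiting prompt.
my $first  = $at_prompt  =~ $CPAN ? 1 : 0;
my $second = $mid_output =~ $CPAN ? 1 : 0;
print "at prompt: $first, mid output: $second\n";
# prints: at prompt: 1, mid output: 0
```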
Line 11 sets the terminal type to dumb so that the CPAN shell
doesn't get too smart, like invoking the readline interface or
underlining some of the output.
Lines 14 to 16 create the CPAN shell job as an Expect object. The
command to launch is given as the argument to the spawn method.
Setting restart_timeout_upon_receive means that our timeouts are
counted from the last output seen, not from the beginning of the
expect cycle.
Line 18 similarly creates an Expect object on Perl's standard input. This object is needed for the interaction during the install phase.
Lines 21 to 30 get us to a CPAN shell prompt, using an expect call
against the $cpan object. The 10 on line 22 signifies that
we'll wait at most 10 seconds for any of the patterns to match before
dropping out as a timeout (triggering the die in line 30).
Line 23 is an array reference around one of the possible triggers,
namely the matching of the CPAN prompt. If that's a match, all of the
characters up to and including that match are removed from the buffer,
and expect returns the value 1 in a scalar context, indicating
that the first trigger was hit (numbered starting at 1).
Lines 24 to 29 define another trigger. If the CPAN shell was
terminated abruptly (as when I accidentally close the window in which
the CPAN shell was running, which happens too frequently), the CPAN
shell will notice that there's a lockfile from another job, but the
process is no longer running, and ask me if I want to remove the lockfile.
The text of the regular expression in line 24 matches this case. The
second parameter is a coderef which will be called, passing in the
$cpan object as the first parameter as if it were a method call.
Inside the subroutine, I first clear out any remaining buffer items on
the match (normally, only the match and before-match parts are
cleared) (line 26), and then send a y to answer the prompt (line 27).
Because the child process is operating in cooked mode, I have to
send a return rather than a linefeed, because return is the key I
would hit on my keyboard to answer this prompt.
Finally, the subroutine exits with the constant exp_continue, which
conveniently returns the string exp_continue. This is a special
return value that tells the expect method to restart, rather than
exit (in this case, with the number 2 as the second possible match).
So we'll start looking for the CPAN prompt again.
Once we get the CPAN prompt, we'll ensure that the index is up to date
by sending the reload index command to the process (line 33), and we'll
wait for either the CPAN prompt or 20 seconds without any more output
(causing an abort), whichever comes first (line 34).
Line 37 fetches the out of date packages by calling the subroutine defined in lines 103 to 108, so let's look there for a second.
Line 104 sends our now often-referenced r command. Line 105 waits
for the banner at the top of the r report. This has the
side-effect of flushing all output up to and including the banner,
important for the next two steps.
Line 106 waits for the CPAN prompt. Line 107 extracts all the text
before the CPAN prompt using the before method, then splits that
into lines, then looks for package names at the beginning of each
line. The result is a list of all packages that are out of date,
which is returned from the subroutine in a list context. (In a scalar
context, map returns the count of items, not very useful here.)
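The parsing step on line 107 can be tried on its own, away from any CPAN shell. In this sketch the sample text, the module names, and the helper name parse_out_of_date are all invented; real r output may differ in its details.

```perl
use strict;
use warnings;

# Invented sample resembling what before() might hold once the banner has
# been consumed. The module names and paths are made up.
my $before = join "\r\n",
    "Some::Module    1.67    1.68  A/AU/AUTHOR/Some-Module-1.68.tar.gz",
    "Other::Thing    0.02    0.03  A/AU/AUTHOR/Other-Thing-0.03.tar.gz",
    "Some text without a version",
    "";

# parse_out_of_date() is a hypothetical wrapper around the expression from
# line 107 of the listing. Lines that do not match contribute nothing.
sub parse_out_of_date {
    my ($text) = @_;
    return map /^([\w:]+)\s+\d/, split /\r?\n/, $text;
}

my @packages = parse_out_of_date($before);   # list context: the names
my $count    = parse_out_of_date($before);   # scalar context: just a count
print "packages: @packages\ncount: $count\n";
```

Calling the same subroutine in scalar context shows why the list context matters: only the number of matching lines comes back.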
So, back to line 37, we have the list of modules that need updating. Now we have to subtract out the ones that didn't work so well on the previous try. So, lines 40 to 43 fetch those, and line 45 turns them into a hash for easy filtering. Line 46 rips the losers out of the currently out of date packages, so we can see what we'll really try to do.
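The subtraction in lines 45 and 46 is a common Perl idiom, shown here by itself with invented module names. The last step also shows why the cut-n-paste problem goes away: interpolating an array into a double-quoted string joins the names with spaces.

```perl
use strict;
use warnings;

# Invented module names for illustration.
my @packages = qw(Some::Module Other::Thing Third::One);
my @losers   = qw(Other::Thing);

# Same approach as lines 45 and 46 of the listing: build a lookup hash,
# then keep only the packages that are not in it.
my %losers = map { $_ => 1 } @losers;
my @to_do_packages = grep !$losers{$_}, @packages;

# Array interpolation inserts a space between elements by default.
my $command = "install @to_do_packages";
print "$command\n";
# prints: install Some::Module Third::One
```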
Lines 49 to 52 let me know that some of the outdated modules are going to be skipped, pointing me at a file I can edit if I want to retry them anyway.
If there are things to do, the big if starting in line 55 does
them.
First, lines 58 and 59 ensure that we're in ``follow'' mode, so that dependencies won't ask questions. (I normally leave my CPAN shell configured in ``ask'' mode so that it doesn't go off into the deep without me getting a chance to say no.)
Then, line 62 does the deed, asking the CPAN shell to install all of the out-of-date modules.
Lines 65 to 83 set up the installation phase. First, in line 65, we put the terminal into raw mode, so that characters are available to this program on a character-by-character basis. Echo is also turned off to prevent double echoing (the terminal running the CPAN shell is also echoing anyway).
Line 67 defines the timeout as equal to the $BELL length defined
above. Lines 69 to 73 define the timeout handler, using the special
timeout string as a pattern. On a timeout, we print control-G to
the terminal, and then continue the expect loop.
Line 74 says that if we see a CPAN prompt, we're done. This will also
cause the expect method to return 2, although we're not testing
that, because there's really no unexpected way out of this expect
loop.
Line 76 brings in ``other'' Expect objects to watch. The -i
parameter can be followed by either a single Expect object (here
$stdin), or an arrayref of Expect objects. The patterns below this
entry apply to this object (list) instead of the original object.
Multiple -i options can be included, allowing expect to watch
many different Expect objects with many different sets of patterns.
For the $stdin Expect object, watching our program's STDIN,
we're looking for only one pattern: any non-empty string (as given in
line 77). If this is seen, the match method returns the string
(line 79), which we then send to the CPAN process immediately. Again,
we return the exp_continue special value to indicate that the loop
should not exit (line 80).
Once we're done with the install phase, we need to see if we made any
headway. Line 86 invokes our r command again, and if anything is
still there, reports the problem (lines 87 to 90). Lines 92 to 94
update the losers file with these packages, possibly emptying the file
out if everything is now current.
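The rewind-and-truncate step in lines 92 to 94 can be tried in isolation. This sketch uses a temporary file in place of the real losers file and a lexical filehandle instead of the bareword LOSERS; the module names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for the losers file; the names are invented.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Old::Loser\nStale::Entry\n";
close $tmp_fh or die "Cannot close $path: $!";

# Open for both reading and writing without clobbering the contents,
# as the "+<" mode on line 40 of the listing does.
open my $fh, '+<', $path or die "Cannot open $path: $!";
my @old = split /\s+/, join '', <$fh>;

# Rewind and cut the file to zero length before writing the new list.
# An empty list simply leaves an empty file behind.
my @still_out_of_date = qw(Some::Module);
seek $fh, 0, 0 or die "Cannot seek in $path: $!";
truncate $fh, 0 or die "Cannot truncate $path: $!";
print {$fh} map "$_\n", @still_out_of_date;
close $fh or die "Cannot close $path: $!";

# Read the file back to confirm that the old entries are gone.
open my $check, '<', $path or die "Cannot reopen $path: $!";
chomp(my @new = <$check>);
close $check;
print "old: @old\nnew: @new\n";
```

Without the truncate, a new list shorter than the old one would leave stale bytes at the end of the file.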
Whether we had anything to install or not, lines 98 and 99 now shut down the CPAN shell process cleanly.
And that's all there is to it. The program captures the series of
steps that I was performing manually, reducing it to a simple program
invocation. Expect can be used for some very cool things, and
there are many examples on the net to be found. Also look for the
TCL-based expect examples, as the syntax is very similar,
although you'll have to understand both TCL and Perl to complete the
translation. Until next time, expect to enjoy!
Listing
=1=     #!/usr/bin/perl -w
=2=     use strict;
=3=
=4=     use Expect;
=5=
=6=     ## configuration and constants
=7=     my $LOSERS = (glob "~/.cpan-r-losers")[0];
=8=     my $BELL = 15; # timeout seconds to send bell to user
=9=     my $CPAN = qr/cpan> \z/; # cpan shell prompt
=10=
=11=    $ENV{TERM} = "dumb"; # keep CPAN.pm from being clever
=12=
=13=    ## set up Expect objects
=14=    my $cpan = Expect->new;
=15=    $cpan->restart_timeout_upon_receive(1);
=16=    $cpan->spawn('perl -MCPAN -eshell');
=17=
=18=    my $stdin = Expect->init(\*STDIN);
=19=
=20=    ## get to a CPAN shell prompt
=21=    $cpan->expect
=22=      (10,
=23=       [$CPAN],
=24=       [qr/another CPAN process.*not responding/s => sub {
=25=          my $self = shift;
=26=          $self->clear_accum;
=27=          $self->send("y\r");
=28=          exp_continue; # look for cpan> prompt now
=29=        }],
=30=      ) or die "didn't get cpan prompt";
=31=
=32=    ## make sure index is up to date
=33=    $cpan->send("reload index\r");
=34=    $cpan->expect(20, [$CPAN]) or die "missing prompt after reloading index";
=35=
=36=    ## find out what's old
=37=    my @packages = out_of_date_packages();
=38=
=39=    ## get previous losers, and subtract them from the out-of-date list
=40=    open LOSERS, "+<$LOSERS"
=41=      or open LOSERS, ">$LOSERS"
=42=      or die "Cannot create $LOSERS: $!";
=43=    my @losers = split /\s+/, join "", <LOSERS>;
=44=
=45=    my %losers = map { $_ => 1 } @losers;
=46=    my @to_do_packages = grep !$losers{$_}, @packages;
=47=
=48=    ## notify that we're not doing all of the out of date
=49=    if (@packages and @losers) {
=50=      print "\n### according to $LOSERS, we are skipping:\n",
=51=        map "### $_\n", @losers;
=52=    }
=53=
=54=    ## do we have anything to do?
=55=    if (@to_do_packages) {
=56=
=57=      ## incorporate dependencies automatically
=58=      $cpan->send("o conf prerequisites_policy follow\r");
=59=      $cpan->expect(5, [$CPAN]) or die "missing prompt after setting conf";
=60=
=61=      ## and do the work!
=62=      $cpan->send("install @to_do_packages\r");
=63=
=64=      ## babysit the result, allow the user to interact if needed
=65=      $stdin->stty(qw(raw -echo));
=66=      $cpan->expect
=67=        ($BELL,
=68=         ## cpan expecting...
=69=         [timeout => sub {
=70=            my $self = shift;
=71=            print "\cG"; # wake up, wake up, to a happy day!
=72=            exp_continue; # keep going
=73=          }],
=74=         [$CPAN], # exit if we see cpan prompt
=75=         ## stdin expecting...
=76=         -i => $stdin,
=77=         [qr/.+/s => sub {
=78=            my $self = shift;
=79=            $cpan->send($self->match);
=80=            exp_continue; # and keep going
=81=          }],
=82=        );
=83=      $stdin->stty(qw(sane));
=84=
=85=      ## Oops. Didn't get everything to work (it happens!)
=86=      my @still_out_of_date = out_of_date_packages();
=87=      if (@still_out_of_date) {
=88=        print "\n### still out of date (saving to $LOSERS):\n",
=89=          map "### $_\n", @still_out_of_date;
=90=      }
=91=      ## record the new losers list so we won't try that next time
=92=      seek LOSERS, 0, 0;
=93=      truncate LOSERS, 0;
=94=      print LOSERS map "$_\n", @still_out_of_date;
=95=    }
=96=
=97=    ## bye bye
=98=    $cpan->send("exit\r");
=99=    $cpan->soft_close;
=100=
=101=   ## return a list of out of date packages using CPAN's "r" command
=102=   ## presumes $cpan Expect object is at the CPAN prompt
=103=   sub out_of_date_packages {
=104=     $cpan->send("r\r");
=105=     $cpan->expect(60, [qr/Package namespace.*\n/]) or die "missing banner";
=106=     $cpan->expect(60, [$CPAN]) or die "missing CPAN prompt after 'r' output";
=107=     map /^([\w:]+)\s+\d/, split /\r?\n/, $cpan->before;
=108=   }