Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Linux Magazine Column 74 (Sep 2005)

[Suggested title: ``Babysitting an interactive CPAN update'']

A few years ago in this column [editor - Nov 2002, refer as you wish], I created a tool to provide a ``mini CPAN mirror'' on my laptop, allowing me to carry just the portions of the CPAN containing the latest and greatest version of each installable module. I'm happy to say that my mini-CPAN mirroring program got quite a bit of attention, even being turned into a CPAN module of its own, CPAN::Mini.

I've also been a happy user, mirroring the mini-CPAN to my laptop as often as hourly. It's only a light touch on the source server, so I don't feel bad doing that. Typically, I then bounce into a CPAN.pm shell and enter its r command to find out which modules now need updating, which gives me a list of names.

There are two problems with the r listing. The first is that it's merely a listing. I have to either retype the out-of-date package names as parameters to the install command, or cut-n-paste very carefully, making sure to add spaces between the names. Ugh. The second problem is that some modules are broken for update: although version ``1.67'' installed just fine, version ``1.68'' refuses to work on my box, for any number of reasons. So after I've tried to install it, the install fails and the module is still out of date. An hour later, I do the exact same thing, wasting my time again.

Now, although there's a programmatic interface to all of the things that the r command and install commands are doing, I found it easier to just think of the command-line as my API. What I needed was a script on top of this API. This script could issue the r command, note its output, and create the appropriate install command, carefully omitting the recent past failures.

This kind of interactive-command babysitting is best handled by the Expect module from the CPAN. I hadn't used this module before, so I had to read the docs very carefully. This is ironic, because I wrote the original chat2.pl to provide a similar function for Perl version 3, and Expect was inspired by the chat2.pl package (as its documentation even mentions).

The basic notion of Expect is that you have a filehandle open on a process (or perhaps a socket or STDIN), and that you'll be giving that process some length of time to generate a string that matches any of one or more regular expressions that you provide. The process is a bit expensive, because we don't have streaming regular expressions yet, so what happens in practice is that as characters appear in chunks on the handle, these are added to the end of a buffer, and the entire buffer is checked against each of the regular expressions in turn.
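
To make the accumulate-and-rescan idea concrete, here's a toy model in Perl. It is my own simplification, not Expect's internals: the ``chunks'' come from an array rather than a real filehandle, and scan_chunks is a hypothetical helper. Each chunk is appended to a buffer, and the whole buffer is re-checked against every pattern in order, returning the 1-based number of the first pattern that matches.

```perl
use strict;
use warnings;

# Toy model (not Expect's code): append each chunk of "output" to a
# buffer, then re-check the entire buffer against every pattern in turn.
sub scan_chunks {
  my ($chunks, @patterns) = @_;
  my $accum = "";
  for my $chunk (@$chunks) {
    $accum .= $chunk;                     # new output goes on the end
    for my $i (0 .. $#patterns) {
      my $re = $patterns[$i];
      if ($accum =~ /($re)/) {
        return ($i + 1, $1);              # 1-based index, matched text
      }
    }
  }
  return;                                 # input ran out with no match
}

# The prompt is split across two chunks, so it can only match once the
# second chunk has been appended to the buffer.
my ($which, $text) = scan_chunks(["Loading...\ncp", "an> "],
                                 qr/ERROR/, qr/cpan> \z/);
print "pattern $which matched '$text'\n";
```

Note that the prompt pattern can't match until the second chunk shows up. That's exactly why the real thing has to keep rescanning the growing buffer.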

Once one of the regular expressions matches the buffer, everything up to the end of the match is removed from the buffer. By default, this also ends that particular watching step, but each regular expression can also have an associated action subroutine. This subroutine can perform various actions, and/or request that the expect operation be continued.
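
Here's a second toy model, again my own and not Expect's code, showing the match-then-act cycle. A trigger is an arrayref of a pattern and an optional coderef. On a match, the matched text and everything before it are dropped from the buffer, and then the action runs. If the action returns the string exp_continue, scanning resumes; otherwise the 1-based trigger number is returned. The helper run_triggers is hypothetical; only the exp_continue string is borrowed from Expect.

```perl
use strict;
use warnings;

# Toy dispatcher modeled on (not taken from) Expect.
sub run_triggers {
  my ($accum_ref, @triggers) = @_;
  SCAN: while (1) {
    for my $i (0 .. $#triggers) {
      my ($re, $action) = @{ $triggers[$i] };
      next unless $$accum_ref =~ $re;
      substr($$accum_ref, 0, $+[0]) = "";   # drop before-match + match
      my $result = $action ? $action->() : "";
      next SCAN if defined $result and $result eq "exp_continue";
      return $i + 1;                         # 1-based trigger number
    }
    return;                                  # nothing buffered matches
  }
}

my $buffer = "Loading index\ncpan> ";
my $notes  = 0;
my $hit = run_triggers(\$buffer,
  [qr/Loading/ => sub { $notes++; "exp_continue" }],
  [qr/cpan> /],
);
print "trigger $hit ended the loop after $notes continue(s)\n";
```

The first trigger fires, eats ``Loading'', and asks to continue. The second trigger then ends the loop, and its number (2) becomes the return value.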

Expect can also be used to watch multiple handles, triggering various actions like sending the output from one handle to the input of another. Using carefully constructed regular expressions, we can get ``in the middle'' between a process and the terminal, for example, intercepting various input or output streams. As a convenience, the most common of these (run this command interactively, waiting for an escape character from the terminal) is provided as a simple routine.

As I was designing this program, I remembered that some of the CPAN installs are evil, in that they require interaction from the user. So while the install command is running, any keyboard input is automatically passed to the CPAN shell directly. The program doesn't take back over until the CPAN shell prompt is once again seen. As an added feature, if the output is idle for 15 seconds, the terminal bell is rung, alerting me to my necessary task. Now I can truly just ``fire and forget'', and wait for either the shell prompt, or a series of bells.

So, let's get right to the program, given in [listing 1, below]. Lines 1 and 2 start nearly every program I write, enabling warnings and compile-time best practices.

Line 4 brings in the Expect module from the CPAN. Note that even though Expect doesn't require IO::Stty, I highly recommend installing that module as well, because Expect's stty method (used in lines 65 and 83) needs it to change terminal modes.

Lines 7 and 8 define two of the configuration constants. The $LOSERS file contains packages that could not be installed on the previous run of the program, and should be skipped on this run. $BELL is the number of seconds of silence we'll tolerate during the install phase before ringing the bell. The bell rings again after each further $BELL seconds of silence, so make sure you don't set it too low!

Line 9 is the regular expression for the CPAN shell prompt, defined here because I use it repeatedly throughout the program.

Line 11 sets the terminal type to dumb so that the CPAN shell doesn't get too smart, like invoking the readline interface or underlining some of the output.

Lines 14 to 16 create the CPAN shell job as an Expect object. The command to launch is given as the argument to the spawn method. Setting restart_timeout_upon_receive means that our timeouts are counted from the last output seen, not from the beginning of the expect cycle.

Line 18 similarly creates an Expect object on Perl's standard input. This object is needed for the interaction during the install phase.

Lines 21 to 30 get us to a CPAN shell prompt, using an expect call against the $cpan object. The 10 on line 22 signifies that we'll wait at most 10 seconds for any of the patterns to match before dropping out as a timeout (triggering the die in line 30).

Line 23 is an array reference around one of the possible triggers, namely the matching of the CPAN prompt. If that's a match, all of the characters up to and including that match are removed from the buffer, and expect returns the value 1 in a scalar context, indicating that the first trigger was hit (numbered starting at 1).

Lines 24 to 29 define another trigger. If the CPAN shell was terminated abruptly (say, because I accidentally closed the window it was running in, which happens too frequently), CPAN.pm will notice a lockfile left behind by a process that is no longer running, and ask me if I want to remove the lockfile. The regular expression in line 24 matches the text of that question. The second element of the arrayref is a coderef, which will be called with the $cpan object as its first parameter, as if it were a method call.
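
The trailing /s on that pattern matters because the question can span more than one line, and without /s the .* won't cross a newline. Here's a quick demonstration against a made-up two-line message (the real CPAN.pm wording may differ):

```perl
use strict;
use warnings;

# Made-up stand-in for the lockfile warning; the real text may differ.
my $message = "There seems to be another CPAN process (pid 12345).\n"
            . "The process is not responding.\n";

my $with_s    = qr/another CPAN process.*not responding/s;  # . matches \n
my $without_s = qr/another CPAN process.*not responding/;   # . stops at \n

print "with /s: ",    ($message =~ $with_s    ? "match" : "no match"), "\n";
print "without /s: ", ($message =~ $without_s ? "match" : "no match"), "\n";
```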

Inside the subroutine, I first clear out the entire accumulated buffer (line 26), rather than just the match and the text before it, which is all that a match normally removes. Then I send a y to answer the prompt (line 27). Because the child's terminal is in cooked mode, I send a carriage return (\r), just as the Return key on my keyboard would, rather than a linefeed.

Finally, the subroutine returns exp_continue, a constant that conveniently evaluates to the string exp_continue. This special return value tells the expect method to keep watching, rather than exiting and returning 2 (the number of this second trigger). So we go back to looking for the CPAN prompt.

Once we get the CPAN prompt, we ensure that the index is up to date by sending the reload index command to the process (line 33). Then we wait for the CPAN prompt again, aborting if 20 seconds pass with no new output (line 34).

Line 37 fetches the out of date packages by calling the subroutine defined in lines 103 to 108, so let's look there for a second.

Line 104 sends our now often-referenced r command. Line 105 waits for the banner at the top of the r report. This has the side-effect of flushing all output up to and including the banner, important for the next two steps.

Line 106 waits for the CPAN prompt. Line 107 extracts all the text before the CPAN prompt using the before method, then splits that into lines, then looks for package names at the beginning of each line. The result is a list of all packages that are out of date, which is returned from the subroutine in a list context. (In a scalar context, map returns the count of items, not very useful here.)
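
Here's line 107's extraction pulled into a standalone sub and fed a made-up stand-in for what $cpan->before might hold (the real report's columns may differ). The sub packages_from_report is hypothetical; calling it in both contexts shows the list-versus-count behavior just described.

```perl
use strict;
use warnings;

# Line 107's expression, wrapped in a sub for testing.
sub packages_from_report {
  my ($text) = @_;
  return map /^([\w:]+)\s+\d/, split /\r?\n/, $text;
}

# Made-up report body (the banner line has already been consumed).
my $before =
    "Foo::Bar     1.67   1.68  A/AU/AUTHOR/Foo-Bar-1.68.tar.gz\r\n"
  . "Baz          0.01   0.02  A/AU/AUTHOR/Baz-0.02.tar.gz\r\n"
  . "\r\n";

my @packages = packages_from_report($before);   # list context: the names
my $count    = packages_from_report($before);   # scalar context: a count
print "@packages ($count)\n";
```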

So, back to line 37, we have the list of modules that need updating. Now we have to subtract out the ones that didn't work so well on the previous try. So, lines 40 to 43 fetch those, and line 45 turns them into a hash for easy filtering. Line 46 rips the losers out of the currently out of date packages, so we can see what we'll really try to do.
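
The filtering is the classic hash-lookup idiom. Here it is in isolation with hypothetical inputs. A loser that isn't currently out of date (Old::Loser below) simply has no effect.

```perl
use strict;
use warnings;

# Hypothetical inputs: the out-of-date list and the losers file contents.
my @packages      = qw(Foo::Bar Baz Quux);
my $file_contents = "Baz\nOld::Loser\n";

my @losers = split /\s+/, $file_contents;           # like line 43
my %losers = map { $_ => 1 } @losers;               # line 45: set of losers
my @to_do_packages = grep !$losers{$_}, @packages;  # line 46: keep non-losers

print "to do: @to_do_packages\n";
```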

Lines 49 to 52 let me know that some of the outdated modules are going to be skipped, pointing me at a file I can edit if I want to retry them anyway.

If there are things to do, the big if starting in line 55 does them.

First, lines 58 and 59 put the CPAN shell into ``follow'' mode for prerequisites, so that dependencies get installed without asking questions. (I normally leave my CPAN shell configured in ``ask'' mode so that it doesn't go off the deep end without giving me a chance to say no.)

Then, line 62 does the deed, asking the CPAN shell to install all of the out-of-date modules.
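
This one line also cures my cut-n-paste complaint from earlier. An array interpolated into a double-quoted string is joined with $", which defaults to a single space, so the names come out properly separated. A tiny check, using a hypothetical package list:

```perl
use strict;
use warnings;

my @to_do_packages = qw(Foo::Bar Quux);    # hypothetical list
# "@array" inside double quotes joins the elements with $" (a space).
my $command = "install @to_do_packages\r";
```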

Lines 65 to 83 handle the installation phase. First, in line 65, we put the terminal into raw mode, so that characters are available to this program one at a time, as they're typed. Echo is also turned off to prevent double echoing, because the pseudo-terminal running the CPAN shell already echoes what we send it.

Line 67 defines the timeout as equal to the $BELL length defined above. Lines 69 to 73 define the timeout handler, using the special timeout string as a pattern. On a timeout, we print control-G to the terminal, and then continue the expect loop.
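
To see when the bell actually rings, here's a small timing model (my own, not Expect's code). It assumes, as described above, that the clock restarts whenever output arrives (restart_timeout_upon_receive) and again after each bell (the exp_continue from the timeout handler). The helper bell_times is hypothetical: it takes the bell interval, an end time, and the arrival times of output, all in seconds.

```perl
use strict;
use warnings;

# Return the times at which the idle bell would ring.
sub bell_times {
  my ($bell, $end, @arrivals) = @_;
  my @rings;
  my $last = 0;                      # when the idle clock last restarted
  for my $t (@arrivals, $end) {
    while ($last + $bell <= $t) {    # a full $bell seconds of silence?
      $last += $bell;
      push @rings, $last;            # ring, then restart the clock
    }
    $last = $t;                      # output arrived: restart the clock
  }
  return @rings;
}

# Output at 3, 10 and 20 seconds, then silence until 60 seconds.
my @rings = bell_times(15, 60, 3, 10, 20);
print "bell at: @rings\n";
```

With steady output nothing rings at all. Once the install stalls waiting for me, the bell repeats every 15 seconds.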

Line 74 says that if we see a CPAN prompt, we're done. This will also cause the expect method to return 2, although we're not testing that, because there's really no unexpected way out of this expect loop.

Line 76 brings in ``other'' Expect objects to watch. The -i parameter can be followed by either a single Expect object (here $stdin), or an arrayref of Expect objects. The patterns below this entry apply to this object (list) instead of the original object. Multiple -i options can be included, allowing expect to watch many different Expect objects with many different sets of patterns.

For the $stdin Expect object, watching our program's STDIN, we're looking for only one pattern: any non-empty string (as given in line 77). If this is seen, the match method returns the string (line 79), which we then send to the CPAN process immediately. Again, we return the exp_continue special value to indicate that the loop should not exit (line 80).

Once we're done with the install phase, we need to see if we made any headway. Line 86 invokes our r command again, and if anything is still there, reports the problem (lines 87 to 90). Lines 92 to 94 update the losers file with these packages, possibly emptying the file out if everything is now current.
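
The losers file round trip (lines 40 to 43, then 92 to 94) can be exercised on its own against a throwaway file. This sketch swaps in a lexical filehandle and three-argument open in place of the bareword LOSERS, and its fallback opens with "+>" so the handle stays readable even when the file has to be created. The helper rewrite_losers is hypothetical: it returns the old list and stores the new one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub rewrite_losers {
  my ($file, @still_out_of_date) = @_;
  my $fh;
  open $fh, "+<", $file                   # existing file: read/write
    or open $fh, "+>", $file              # missing file: create read/write
    or die "Cannot create $file: $!";
  my @old = split /\s+/, join "", <$fh>;
  seek $fh, 0, 0;                         # rewind...
  truncate $fh, 0;                        # ...discard the old list...
  print {$fh} map "$_\n", @still_out_of_date;   # ...and write the new one
  close $fh or die "Cannot close $file: $!";
  return @old;
}

my $dir    = tempdir(CLEANUP => 1);
my $LOSERS = "$dir/cpan-r-losers";

my @first  = rewrite_losers($LOSERS, qw(Baz Quux)); # creates the file
my @second = rewrite_losers($LOSERS);               # reads it, then empties it
```

The second call shows the ``possibly emptying the file out'' case: with nothing still out of date, the file ends up with zero length.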

Whether we had anything to install or not, lines 98 and 99 now shut down the CPAN shell process cleanly.

And that's all there is to it. The program captures the series of steps that I was performing manually, reducing them to a simple program invocation. Expect can be used for some very cool things, and many examples can be found on the net. Also look for examples written for the original Tcl-based expect, since the syntax is very similar, although you'll have to understand both Tcl and Perl to complete the translation. Until next time, expect to enjoy!

Listing

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     
        =4=     use Expect;
        =5=     
        =6=     ## configuration and constants
        =7=     my $LOSERS = (glob "~/.cpan-r-losers")[0];
        =8=     my $BELL = 15;                  # timeout seconds to send bell to user
        =9=     my $CPAN = qr/cpan> \z/;        # cpan shell prompt
        =10=    
        =11=    $ENV{TERM} = "dumb";            # keep CPAN.pm from being clever
        =12=    
        =13=    ## set up Expect objects
        =14=    my $cpan = Expect->new;
        =15=    $cpan->restart_timeout_upon_receive(1);
        =16=    $cpan->spawn('perl -MCPAN -eshell');
        =17=    
        =18=    my $stdin = Expect->init(\*STDIN);
        =19=    
        =20=    ## get to a CPAN shell prompt
        =21=    $cpan->expect
        =22=      (10,
        =23=       [$CPAN],
        =24=       [qr/another CPAN process.*not responding/s => sub {
        =25=          my $self = shift;
        =26=          $self->clear_accum;
        =27=          $self->send("y\r");
        =28=          exp_continue;             # look for cpan> prompt now
        =29=        }],
        =30=      ) or die "didn't get cpan prompt";
        =31=    
        =32=    ## make sure index is up to date
        =33=    $cpan->send("reload index\r");
        =34=    $cpan->expect(20, [$CPAN]) or die "missing prompt after reloading index";
        =35=    
        =36=    ## find out what's old
        =37=    my @packages = out_of_date_packages();
        =38=    
        =39=    ## get previous losers, and subtract them from the out-of-date list
        =40=    open LOSERS, "+<$LOSERS"
        =41=      or open LOSERS, "+>$LOSERS"
        =42=      or die "Cannot create $LOSERS: $!";
        =43=    my @losers = split /\s+/, join "", <LOSERS>;
        =44=    
        =45=    my %losers = map { $_ => 1 } @losers;
        =46=    my @to_do_packages = grep !$losers{$_}, @packages;
        =47=    
        =48=    ## notify that we're not doing all of the out of date
        =49=    if (@packages and @losers) {
        =50=      print "\n### according to $LOSERS, we are skipping:\n",
        =51=        map "###  $_\n", @losers;
        =52=    }
        =53=    
        =54=    ## do we have anything to do?
        =55=    if (@to_do_packages) {
        =56=    
        =57=      ## incorporate dependencies automatically
        =58=      $cpan->send("o conf prerequisites_policy follow\r");
        =59=      $cpan->expect(5, [$CPAN]) or die "missing prompt after setting conf";
        =60=    
        =61=      ## and do the work!
        =62=      $cpan->send("install @to_do_packages\r");
        =63=    
        =64=      ## babysit the result, allow the user to interact if needed
        =65=      $stdin->stty(qw(raw -echo));
        =66=      $cpan->expect
        =67=        ($BELL,
        =68=         ## cpan expecting...
        =69=         [timeout => sub {
        =70=            my $self = shift;
        =71=            print "\cG";            # wake up, wake up, to a happy day!
        =72=            exp_continue;           # keep going
        =73=          }],
        =74=         [$CPAN],                   # exit if we see cpan prompt
        =75=         ## stdin expecting...
        =76=         -i => $stdin,
        =77=         [qr/.+/s => sub {
        =78=            my $self = shift;
        =79=            $cpan->send($self->match);
        =80=            exp_continue;           # and keep going
        =81=          }],
        =82=        );
        =83=      $stdin->stty(qw(sane));
        =84=    
        =85=      ## Oops.  Didn't get everything to work (it happens!)
        =86=      my @still_out_of_date = out_of_date_packages();
        =87=      if (@still_out_of_date) {
        =88=        print "\n### still out of date (saving to $LOSERS):\n",
        =89=          map "###  $_\n", @still_out_of_date;
        =90=      }
        =91=      ## record the new losers list so we won't try that next time
        =92=      seek LOSERS, 0, 0;
        =93=      truncate LOSERS, 0;
        =94=      print LOSERS map "$_\n", @still_out_of_date;
        =95=    }
        =96=    
        =97=    ## bye bye
        =98=    $cpan->send("exit\r");
        =99=    $cpan->soft_close;
        =100=   
        =101=   ## return a list of out of date packages using CPAN's "r" command
        =102=   ## presumes $cpan Expect object is at the CPAN prompt
        =103=   sub out_of_date_packages {
        =104=     $cpan->send("r\r");
        =105=     $cpan->expect(60, [qr/Package namespace.*\n/]) or die "missing banner";
        =106=     $cpan->expect(60, [$CPAN]) or die "missing CPAN prompt after 'r' output";
        =107=     map /^([\w:]+)\s+\d/, split /\r?\n/, $cpan->before;
        =108=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.