Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 28 (Aug 1998)

Last month's column showed my first use of the Parallel User Agent -- a way of sending out multiple requests in the same process to be satisfied concurrently. This time, let's look at another use of the same package: stress-testing a CGI script. If you haven't installed the Parallel User Agent, you'll find it in the CPAN, along with the LWP package, which you'll also need. (The CPAN can be found at http://cpan.perl.org/, amongst other places.)

As a practical test target, I decided to beat up on my Eliza bot script from two columns ago. In fact, this was a good practical test to ensure that multiple users would each get the right information, and that the script would work correctly no matter how fast the requests came in. I was hitting between 2 and 4 requests per second with this test harness on a moderate-speed SPARCstation, much faster than I could hit the submit button on my browser.

The program that connects to the Eliza script repeatedly is presented in [listing one, below].

Line 1 begins most of the Perl programs I write, giving the path to my Perl (in my home bin directory) and turning on warnings. Line 2 enables the compiler restrictions that are useful for all programs over ten lines long.

Line 4 disables the normal buffering of STDOUT. This allows me to see the output from the print operator without having to wait for input, or for the output buffer to be filled.

Line 6 locates the StressAgent module definition. Because I'm not installing this in the standard library places, I had to figure out how to tell Perl to locate this module. What I decided to do was put it alongside the source file for the program. The FindBin module determines the program location by looking at $0 and $ENV{PATH}, storing the directory in $Bin. I then use this directory in a use lib pragma to add it to the list of directories searched for other modules. Finally, I bring in the StressAgent definition. I've placed all three of these steps on the same program line because they really all relate to each other.
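As a minimal sketch of the FindBin-plus-use-lib step on its own, the following uses only core modules and simply reports what was found. It does not load StressAgent, so it runs anywhere; the printed messages are illustrative.

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # $Bin: directory containing the running script
use lib $Bin;           # add that directory to the module search path

print "script directory: $Bin\n";
print "module search path includes it: ",
  ((grep { $_ eq $Bin } @INC) ? "yes" : "no"), "\n";
```

Any module file dropped next to the script can then be loaded with an ordinary use line, which is the effect the third statement on line 6 relies on.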

Line 7 pulls in the HTTP::Request::Common module, used below to create the GET and POST requests. This module is found in the LWP library.

Line 9 is commented out. For debugging, I uncomment this line, and I'll start seeing all the transactions that LWP generates, including the headers of all requests and responses.

Lines 11 through 15 create the @YOW array. This array is full of silly sayings from Zippy the Pinhead that I'm stealing from the GNU Emacs distribution. The variable gets created as a result of executing the do block. Within this block, temporary values for @ARGV and $/ get set (the filename lands in @ARGV, leaving $/ undefined so that the whole file is read at once), and the array is then initialized by splitting the file's contents on NUL bytes (followed by optional whitespace). The path here is to my installation of the GNU Emacs files. Most likely, you'll need to come up with some similar file in a different location. The first entry is a header for the file, stripped off in line 15.
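If you don't have the Emacs file handy, the same slurp-and-split idiom can be tried against a small temporary file. This is only a sketch: the file contents and the @sayings name are made up for illustration, and File::Temp is used purely to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for yow.lines: a header, then NUL-separated sayings.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "header comment\n\0first saying\n\0second\nsaying\n\0third saying\n";
close $fh or die "close failed: $!";

my @sayings = do {
  # @ARGV takes the filename; $/ is left undef, so <> slurps the whole file.
  local (@ARGV, $/) = ($path);
  split /\0\s*/, <>;
};
shift @sayings;    # drop the header record

print scalar(@sayings), " sayings loaded\n";
```

Because both variables are localized, the original @ARGV and $/ come back as soon as the do block exits.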

Line 16 creates an empty hash called %ASKED to record the Zippy line that we've fed a particular doctor. Doctors are identified by the submission URL, so that'll be the key in this hash. The corresponding value will be the question.

Line 18 does the dirty work of creating a new StressAgent and launching it to fire off five requests in parallel continuously. The new class method takes a single parameter of a CODEREF to be called to create the initial requests, as well as following up each response with an additional request (to keep 5 requests active at all times). In our case, it's the &patient subroutine, defined below. The return value from new is a StressAgent instance, to which we are immediately sending the do_it method, here asking for 5 requests.

The &patient subroutine, defined in lines 20 through 37, really gives the stress tester its smarts. The StressAgent module will call this subroutine, initially with no arguments when first starting up, and later with three arguments consisting of the original request, the response we got, and an LWP::Parallel::UserAgent::Entry object, which we are not using here. Line 21 stores these three parameters into named lexical variables.

Line 22 notes the case where we're called with zero parameters. This occurs when the StressAgent is initially loading up the first round of requests. In this case, we simply want to set up a GET request to fetch the right address. The return value from this routine should be either a valid HTTP::Request object, or undef to indicate that nothing further should be done with this thread.
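The calling convention can be sketched without LWP at all. In the following, plain strings stand in for the HTTP::Request and HTTP::Response objects, and the next_step name and the "GET /start" and "POST /next" values are hypothetical; only the zero-argument versus three-argument shape and the undef-means-stop rule come from the description above.

```perl
use strict;
use warnings;

# Stand-in callback: plain strings take the place of HTTP::Request and
# HTTP::Response objects so the sketch runs without LWP.
sub next_step {
  my ($request, $response, $entry) = @_;
  return "GET /start" unless defined $request;   # priming call: no arguments
  return "POST /next" if $response eq "200";     # success: hand back a follow-up
  return;                                        # otherwise: let this slot end
}

my $first  = next_step();
my $second = next_step($first, "200", undef);
my $third  = next_step($second, "500", undef);

print defined $_ ? "$_\n" : "(nothing further)\n" for $first, $second, $third;
```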

Line 23 sees if the now-valid response was a 200-type response, meaning that we got a nice response. It's up to this subroutine now to parse the content of that response and see what to do next. For our example, we want to extract what the doctor said, print it, and then hand the doctor a new problem.

Line 24 was for debugging. By uncommenting it, the first round of responses merely gets dumped, and an undef return means that nothing further will be done with those responses. By looking at the response, I was able to write the rest of the routine picking out the parts I needed to generate a further request.

Line 25 extracts the content of the response as a list of lines. I noticed the interesting parts of the response were always on lines 3 and 4 (elements 2 and 3 of the array), so by breaking it into lines, I was able to find things more easily. Line 26 extracts the response URL, so that I know where to send the next Zippy line. Line 27 extracts the doctor's response, so I can see that things worked.
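To see the three extraction steps in isolation, here is a sketch run against a made-up response body. The HTML below is an assumption (the column does not show the real Eliza page), arranged so that the form tag and the paragraph fall on the third and fourth lines; the three regular expressions are the ones from the listing.

```perl
use strict;
use warnings;

# Hypothetical response body; the real Eliza page layout is not shown here.
my $body = join "\n",
  '<html><head><title>Eliza</title></head>',
  '<body>',
  '<form action="/cgi/eliza2/session42" method="post">',
  '<p>How do you do. Please state your problem.</p>',
  '<input name="message"></form></body></html>';

my @content  = $body =~ /(.*)\n?/g;                 # one element per line
my ($url)    = $content[2] =~ /action="(.*?)"/i;    # third line: form target
my ($prompt) = $content[3] =~ /<p>(.*?)<\/p>/i;     # fourth line: the reply

print "$url\n$prompt\n";
```

Fixed line positions like these are brittle, which is fine for a throwaway test harness aimed at a script whose output you control.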

Lines 28 through 30 dump out the previously saved Zippy line (if any), along with the doctor's response. The stuff is indented using a map operator on each line. Note that the previously saved line is saved in %ASKED, indexed by the doctor's unique URL.
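The indenting trick is easy to try on its own. In this sketch the question and reply strings are invented, and the lines are collected into a hypothetical @printed array before printing so the result can be inspected.

```perl
use strict;
use warnings;

my @printed;
# Hypothetical question, separator, and reply; the first spans two lines.
for ("Yow! Are we\nhaving fun yet?", " -> ", "Why do you ask?") {
  # split with no string argument works on $_; map indents each piece.
  push @printed, map "  $_\n", split /\n/;
}
print @printed;
```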

Lines 31 and 32 select a new bizarre statement for the doctor to try to figure out, and save it into the memory so we can pull it out when the doctor responds. The message is selected randomly from @YOW. Line 33 constructs a new HTTP::Request as a POST request to the indicated form-submission URL, with a single field filled in with the Zippy quote. Again, the name of the field was determined by looking at the debug output I generated in line 24.
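The pick-and-remember step looks like this when pulled out by itself. The three sayings and the example.invalid URL are placeholders; the selection expression is the one from line 31.

```perl
use strict;
use warnings;

my @YOW = ("saying one", "saying two", "saying three");   # stand-in quotes
my %ASKED;
my $url = "http://example.invalid/session/1";             # hypothetical key

# rand @YOW yields a fraction in [0, 3); the array subscript truncates it.
my $yow = $YOW[rand @YOW];
$ASKED{$url} = $yow;

print "asked: $ASKED{$url}\n";
```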

If the response was not a nice 200 response, then we'll make it down to line 35, and dump out the entire response. If you have a CGI script that can sometimes generate things other than 200 responses, you'll certainly be able to handle them by looking for the right codes before reaching this line. Returning undef in line 36 means that we won't process any further request here.

That does it for the main program. The StressAgent module is presented in [listing two, below].

Line 1 turns on the compiler restrictions. Line 3 establishes the package for this module. (For the use directive to work, it should be the same as the filename.) Line 4 brings in the Parallel UserAgent module, and lines 5 and 6 cause objects created by this module to also inherit from that module.

Lines 8 through 18 create the object constructor to make a new StressAgent object. The package name comes in first, shifted off in line 9. Initially, we call the superclass's new routine (in line 10), and then save the only parameter to this routine (a coderef) in line 11. Lines 12 and 13 create two instance variables for this object, storing them in a nested hash keyed by this package's name so that they can't collide with the superclass's own instance data. Lines 14 through 16 call instance methods to initialize some of the other parameters: setting a nice agent type, setting up any needed proxy parameters, and also setting the maximum number of parallel requests. Finally, line 17 returns the newly created StressAgent object.
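The package-keyed storage used in lines 12 and 13 can be sketched with a toy class pair. Base and Counter are hypothetical names standing in for LWP::Parallel::UserAgent and StressAgent, and the timeout field is invented just to show the superclass keeping its own data at the top level of the hash.

```perl
use strict;
use warnings;

package Base;      # hypothetical superclass
sub new { my $class = shift; return bless { timeout => 30 }, $class }

package Counter;   # hypothetical subclass
our @ISA = ('Base');

sub new {
  my $package = shift;
  my $self = $package->SUPER::new();
  $self->{+__PACKAGE__}{did} = 0;   # private slot, keyed by "Counter"
  return $self;
}

sub bump { my $self = shift; return ++$self->{+__PACKAGE__}{did} }

package main;

my $c = Counter->new;
$c->bump for 1 .. 3;
print join(",", sort keys %$c), "\n";    # top-level keys of the object
print $c->{Counter}{did}, "\n";          # the subclass's own counter
```

The leading + keeps Perl from treating __PACKAGE__ as a quoted bareword hash key, so the key really is the package name.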

Lines 20 through 24 define an internal instance method that calls the callback function (like &patient in our example) with either no parameters or three parameters. If the callback function returns undef, nothing further happens, but if it instead returns (hopefully) an HTTP::Request object, then we put it into the queue as another request to be executed in parallel.

Lines 26 through 35 handle each response as it comes in. The Parallel UserAgent will gather each request's response into a correct response object, and when it's complete will call this method. The parameters come in and get separated in line 28. For logging purposes, we dump out the time of day, the response code, response message, and an incrementing counter for each response. This gives me a quick way to compute the number of transactions handled over a given period of time by scrolling through the logfile. Each transaction handled increments the instance variable "did", which gets printed as the number of transactions completed so far.
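The logging line depends on localtime being evaluated in scalar context, which the concatenation operator provides. Here is a sketch of just that piece; the log_line name and the file-level $did counter are stand-ins for the method and instance variable in the module.

```perl
use strict;
use warnings;

my $did = 0;    # running count of completed transactions

sub log_line {
  my ($code, $message) = @_;
  # Concatenation puts localtime() in scalar context: one readable timestamp.
  return localtime() . ": " . $code . ", " . $message . " {" . ++$did . "}\n";
}

print log_line(200, "OK");
print log_line(500, "Internal Server Error");
```

Without the concatenation, print would call localtime in list context and emit the nine numeric fields run together.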

Line 32 calls the callback function, passing it the three parameters of this completed response. Line 33 discards the memory of this transaction as far as Parallel UserAgent is concerned. This means that we aren't wasting memory trying to hang on to old data. Line 34 returns the object, mostly as a matter of convention because nothing else makes sense.

Lines 37 through 45 define the top-level event loop. The do_it method is called with a number of parallel threads to maintain, and a number of seconds to wait for a response. These default to 10 and 15, respectively. The initial threads are created by calling the callback routine with no parameters. Line 44 is basically an infinite loop, because the Parallel UserAgent will cause all requests to be run to completion; however, each request triggers a new request!
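The self-sustaining nature of that loop can be simulated without any networking. In this sketch a plain array plays the role of the Parallel UserAgent's pending-request list, strings stand in for requests, and a hypothetical $limit makes the callback eventually return undef so the run ends; the real harness has no such cap.

```perl
use strict;
use warnings;

my @queue;          # pending "requests" (plain strings here)
my $made  = 0;
my $limit = 12;     # hypothetical cap so the sketch terminates

my $make_request = sub {
  return if $made >= $limit;    # undef: let this slot drain
  return "request " . ++$made;
};

sub maybe_register {
  my $new = $make_request->(@_);
  push @queue, $new if defined $new;
}

maybe_register() for 1 .. 5;    # prime five slots with no-argument calls

my $done = 0;
while (defined(my $req = shift @queue)) {
  $done++;                                  # pretend the request completed
  maybe_register($req, "200 OK", undef);    # each completion asks for a follow-up
}
print "$done requests completed\n";
```

Take away the cap and the queue never empties, which is why the single wait call on line 44 behaves like an infinite loop.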

Line 47 ends this module file, so that the implicit require inside a use directive will see a true value during the operation.

So, there's the program and the module. The module should be placed in a file named StressAgent.pm in the same directory as the script, unless you alter the use lib strategy above. Obviously, stress testing my web server's Eliza script won't be of much use to you, so you should start by figuring out what you want to do when each response comes back, and coding the response callback function appropriately.

This should give you a model for how to test your CGI scripts. You should use test harnesses like this with all your scripts, especially those that use shared resources. Eventually, the functionality of LWP::Parallel::UserAgent will be folded into the normal LWP library, making it easier to write code like this. Until then, at least we have a way of doing it. Enjoy!

Listings

        =0=     ##### listing one #####
        =1=     #!/home/merlyn/bin/perl -w
        =2=     use strict;
        =3=     
        =4=     $|++;
        =5=     
        =6=     use FindBin qw($Bin); use lib $Bin; use StressAgent;
        =7=     use HTTP::Request::Common;
        =8=     
        =9=     ## use LWP::Debug qw(+);                # DEBUG
        =10=    
        =11=    my @YOW = do {
        =12=      local (@ARGV, $/) = "/usr/local/share/emacs/20.2/etc/yow.lines";
        =13=      split /\0\s*/, <>;
        =14=      };
        =15=    shift @YOW;                     # first one is a comment
        =16=    my %ASKED = ();
        =17=    
        =18=    StressAgent->new(\&patient)->do_it(5);
        =19=    
        =20=    sub patient {
        =21=      my ($request,$response,$entry) = @_;
        =22=      return GET "http://www.stonehenge.com/cgi/eliza2" unless $request;
        =23=      if ($response->code == 200) {
        =24=        ## print $response->as_string; return; ## debug
        =25=        my @content = $response->content =~ /(.*)\n?/g;
        =26=        my ($url) = $content[2] =~ /action="(.*?)"/i;
        =27=        my ($prompt) = $content[3] =~ /<p>(.*?)<\/p>/i;
        =28=        for ($ASKED{$url} || "[initial question]", " -> ", $prompt) {
        =29=          print map "  $_\n", split /\n/;
        =30=        }
        =31=        my $yow = $YOW[rand @YOW];
        =32=        $ASKED{$url} = $yow;
        =33=        return POST $url, [message => $yow];
        =34=      }
        =35=      print $response->as_string;
        =36=      return;
        =37=    }
        =0=     ##### listing two #####
        =1=     use strict;
        =2=     
        =3=     package StressAgent;
        =4=     use LWP::Parallel::UserAgent;
        =5=     use vars qw(@ISA);
        =6=     @ISA = qw(LWP::Parallel::UserAgent);
        =7=     
        =8=     sub new {
        =9=       my $package = shift;
        =10=      my $self = $package->SUPER::new();
        =11=      my $make_request = shift;
        =12=      $self->{+__PACKAGE__}{make_request} = $make_request;
        =13=      $self->{+__PACKAGE__}{did} = 0;
        =14=      $self->agent(+__PACKAGE__ . " (".$self->agent.")");
        =15=      $self->env_proxy;
        =16=      $self->max_req(1000);         # never reached
        =17=      $self;
        =18=    }
        =19=    
        =20=    sub maybe_register {
        =21=      my $self = shift;
        =22=      my $new_request = $self->{+__PACKAGE__}{make_request}->(@_);
        =23=      $self->register($new_request) if defined $new_request;
        =24=    }
        =25=    
        =26=    sub on_return {
        =27=      my $self = shift;
        =28=      my ($request, $response, $entry) = @_;
        =29=      print
        =30=        localtime().": ", $response->code, ", ", $response->message,
        =31=        " {",++$self->{+__PACKAGE__}{did}, "}\n";
        =32=      $self->maybe_register(@_);
        =33=      $self->discard_entry($entry);
        =34=      $self;
        =35=    }
        =36=    
        =37=    sub do_it {
        =38=      my $self = shift;
        =39=      my $max = shift || 10;
        =40=      my $wait = shift || 15;
        =41=      for (1..$max) {
        =42=        $self->maybe_register();
        =43=      }
        =44=      $self->wait($wait);
        =45=    }
        =46=    
        =47=    1;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.