Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Web Techniques Column 28 (Aug 1998)
Last month's column showed my first use of the Parallel User Agent: a way of sending out multiple requests from the same process to be satisfied concurrently. This time, let's look at another use of the same package: stress-testing a CGI script. If you haven't installed the Parallel User Agent, you'll find it in the CPAN. You'll also need the LWP package, likewise found in the CPAN. (The CPAN can be found at http://cpan.perl.org/, amongst other places.)
As a practical test target, I decided to beat up on my Eliza bot script from two columns ago. In fact, this was a good way to ensure that multiple users would each get the right information, and that the script would work correctly no matter how fast the requests came in. With this test harness on a moderate-speed SPARCstation, I was hitting between 2 and 4 requests per second, much faster than I could hit the submit button in my browser.
The program that connects to the Eliza script repeatedly is presented in [listing one, below].
Line 1 begins most of the Perl programs I write, giving the path to my Perl (in my home bin directory) and turning on warnings. Line 2 enables the compiler restrictions that are useful for all programs over ten lines long.
Line 4 disables the normal buffering of STDOUT. This allows me to see the output from the print operator without having to wait for input, or for the output buffer to be filled.
Line 6 locates the StressAgent module definition. Because I'm not installing this in the standard library places, I had to figure out how to tell Perl to locate this module. What I decided to do was put it alongside the source file for the program. The FindBin module determines the program location by looking at $0 and $ENV{PATH}, storing the directory in $Bin. I then use this directory in a use lib pragma to add it to the list of directories searched for other modules. Finally, I bring in the StressAgent definition. I've placed all three of these steps on the same program line because they really all relate to each other.
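Here is a minimal sketch of the same idiom using only core modules. It does not load StressAgent.pm (that file isn't assumed to exist here); it just prints the directory FindBin found and shows that use lib has added it to @INC, which is what lets a later use find a module sitting next to the script.

```perl
use strict;
use warnings;

# FindBin works out the directory holding the running script and exports it as $Bin.
use FindBin qw($Bin);
# "use lib" prepends that directory to @INC at compile time, so a module file
# placed next to the script (StressAgent.pm in the column) becomes loadable.
use lib $Bin;

print "script directory: $Bin\n";
print "first \@INC entry: $INC[0]\n";
```

Because both FindBin and lib do their work at compile time, the order on the line matters: $Bin must be set before use lib reads it, and the directory must be in @INC before the final use of the module.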
Line 7 pulls in the HTTP::Request::Common module, used below to create the GET and POST requests. This module is found in the LWP library.
Line 9 is commented out. For debugging, I uncomment this line, and I'll start seeing all the transactions that LWP generates, including the headers of all requests and responses.
Lines 11 through 15 create the @YOW array. This array is full of silly sayings from Zippy the Pinhead that I'm stealing from the GNU Emacs distribution. The variable gets created as a result of executing the do block. Within this block, @ARGV and $/ are given temporary values: @ARGV gets the filename, and $/ is left undefined (the array absorbs the entire right-hand list), which puts the <> read into slurp mode. The array is then initialized by splitting the whole file on NUL bytes (each followed by optional whitespace). The path here is to my installation of the GNU Emacs files. Most likely, you'll need to come up with some similar file, probably in a different location. The first entry is a header for the file, stripped off in line 15.
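The slurp-and-split idiom can be tried on its own. This sketch writes a small temporary file in the assumed yow.lines layout (a header entry, then sayings, each terminated by a NUL byte and a newline); the sayings are placeholders, not the real file's contents.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for yow.lines: header entry first, then NUL-terminated sayings.
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "header comment\0\nYow!\0\nAre we having FUN yet?\0\n";
close $fh;

my @YOW = do {
    # @ARGV receives the filename; $/ stays undef because the array swallows
    # the whole right-hand list, so <> returns the entire file as one string.
    local (@ARGV, $/) = $file;
    split /\0\s*/, <>;
};
shift @YOW;    # drop the header entry, as line 15 does
print "$_\n" for @YOW;
```

The trailing empty field after the last NUL is discarded automatically, because split removes trailing empty fields by default.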
Line 16 creates an empty hash called %ASKED to record the Zippy line that we've fed a particular doctor. Doctors are identified by the submission URL, so that'll be the key in this hash. The corresponding value will be the question.
Line 18 does the dirty work of creating a new StressAgent and launching it to fire off five requests in parallel continuously. The new class method takes a single parameter, a coderef, which is called to create the initial requests as well as to follow up each response with an additional request (to keep 5 requests active at all times). In our case, it's the &patient subroutine, defined below. The return value from new is a StressAgent instance, to which we immediately send the do_it method, here asking for 5 requests.
The &patient subroutine, defined in lines 20 through 37, really gives the stress tester its smarts. The StressAgent module will call this subroutine initially with no arguments when first starting up, and later with 3 arguments consisting of the original request, the response we got, and a LWP::ParallelAgent::entry object, which we are not using here. Line 21 stores these three parameters into named lexical variables.
Line 22 notes the case where we're called with zero parameters. This occurs when the StressAgent is initially loading up the first round of requests. In this case, we simply want to set up a GET request to fetch the right address. The return value from this routine should be either a valid HTTP::Request object, or undef to indicate that nothing further should be done with this thread.
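To see the calling convention in isolation, here is a hypothetical stand-in for &patient. Plain strings play the role of HTTP::Request objects and hash refs play the role of responses, and a toy loop (not StressAgent) drives it: one call with no arguments, then calls with three arguments until the callback returns undef.

```perl
use strict;
use warnings;

my $start_url = "http://example.com/cgi/eliza2";   # placeholder URL

# Hypothetical callback with the same contract as &patient.
sub fake_patient {
    my ($request, $response, $entry) = @_;
    return "GET $start_url" unless $request;              # first call: no arguments
    return "POST $response->{next}" if $response->{code} == 200;
    return;                                               # undef: end this thread
}

# Toy driver standing in for StressAgent.
my @log;
my $req = fake_patient();
push @log, $req;
for my $resp ({ code => 200, next => "$start_url/abc" }, { code => 500 }) {
    $req = fake_patient($req, $resp, undef);              # later calls: three arguments
    last unless defined $req;
    push @log, $req;
}
print "$_\n" for @log;
```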
Line 23 checks whether the response we were handed has a 200 status code, meaning that we got a nice response. It's now up to this subroutine to parse the content of that response and decide what to do next. For our example, we want to extract what the doctor said, print it, and then hand the doctor a new problem.
Line 24 was for debugging. When it's uncommented, the first round of responses merely gets dumped, and an undef return means that nothing further will be done with those responses. By looking at the response, I was able to write the rest of the routine, picking out the parts I needed to generate a further request.
Line 25 extracts the content of the response as a list of lines. I noticed the interesting parts of the response were always on lines 3 and 4 (elements 2 and 3 of @content), so breaking the content into lines made things easier to find. Line 26 extracts the form's action URL, so that I know where to send the next Zippy line. Line 27 extracts the doctor's response, so I can see that things worked.
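The extraction in lines 25 through 27 can be exercised against a made-up page. The HTML below is hypothetical, shaped only to match what the column describes: the form's action URL on the third line and the doctor's reply in a paragraph on the fourth.

```perl
use strict;
use warnings;

# Hypothetical page in the layout the column describes.
my $html = join "\n",
    '<html><head><title>Eliza</title></head><body>',
    '<h1>Talk to the doctor</h1>',
    '<form method="POST" action="http://example.com/cgi/eliza2/abc123">',
    '<p>How do you do. Please tell me your problem.</p>',
    '<input name="message"></form></body></html>',
    '';

# Same idiom as listing one: capture each line without its newline.
my @content = $html =~ /(.*)\n?/g;
my ($url)    = $content[2] =~ /action="(.*?)"/i;   # third line
my ($prompt) = $content[3] =~ /<p>(.*?)<\/p>/i;     # fourth line
print "url: $url\nprompt: $prompt\n";
```

The pattern also yields one empty string at the very end of the list, which is harmless here because only elements 2 and 3 are used.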
Lines 28 through 30 dump out the previously saved Zippy line (or the placeholder "[initial question]" if there is none), along with the doctor's response. Each line of output is indented by a map operator. Note that the previously saved line is kept in %ASKED, indexed by the doctor's unique URL.
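A small sketch of the same printing loop, writing to an in-memory filehandle so the result can be inspected. The URLs and the two-line saying are hypothetical sample data.

```perl
use strict;
use warnings;

my $url    = "http://example.com/cgi/eliza2/abc123";   # placeholder doctor URL
my %ASKED  = ($url => "Yow!\nAre we having FUN yet?");
my $prompt = "Why do you say that?";

my $out = "";
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
# for aliases $_ to each item; split /\n/ breaks that item into lines, and
# map puts one leading space before each line.
for ($ASKED{$url} || "[initial question]", " -> ", $prompt) {
    print {$fh} map " $_\n", split /\n/;
}
close $fh;
print $out;

# A doctor we haven't asked anything yet falls back to the placeholder.
my $first = $ASKED{"http://example.com/cgi/eliza2/new"} || "[initial question]";
```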
Lines 31 and 32 select a new bizarre statement for the doctor to try to figure out, and save it into the memory so we can pull it out when the doctor responds. The message is selected randomly from @YOW. Line 33 constructs a new HTTP::Request as a POST request to the indicated form-submission URL, with a single field (message) filled in with the Zippy quote. Again, the name of the field was determined by looking at the debug output I generated in line 24.
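The random pick in line 31 is a common Perl idiom: rand @YOW puts the array in scalar context (its length), returning a fractional number below that length, and using it as a subscript truncates it to an integer. A sketch with placeholder sayings and a placeholder URL:

```perl
use strict;
use warnings;

my @YOW = ("Yow!", "Are we having FUN yet?");      # placeholder sayings
my %ASKED;
my $url = "http://example.com/cgi/eliza2/abc123";   # placeholder submission URL

# rand @YOW is in [0, 2); the subscript is truncated, so index 0 or 1.
my $yow = $YOW[rand @YOW];
$ASKED{$url} = $yow;   # remember what this doctor was asked
print "asked $url: $yow\n";
```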
If the response was not a nice 200 response, then we'll make it down to line 35, and dump out the entire response. If you have a CGI script that can sometimes generate responses other than 200, you can handle them by checking for the appropriate codes before reaching this line. Returning undef in line 36 means that we won't process any further requests here.
That does it for the main program. The StressAgent module is presented in [listing two, below].
Line 1 turns on the compiler restrictions. Line 3 establishes the package for this module. (For the use directive to work, it should be the same as the filename.) Line 4 brings in the Parallel UserAgent module, and lines 5 and 6 make StressAgent a subclass of it, so objects created by this module also inherit from that module.
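Since LWP::Parallel::UserAgent may not be installed, this sketch uses a hypothetical ParentAgent class in its place to show what lines 5 and 6 accomplish: use vars declares @ISA so it passes use strict, and listing the parent in @ISA makes method lookup fall through to it.

```perl
use strict;
use warnings;

# Hypothetical parent standing in for LWP::Parallel::UserAgent.
package ParentAgent;
sub new   { my $class = shift; return bless {}, $class }
sub agent { "ParentAgent/0.1" }

# Child class set up the way listing two does it.
package StressLike;
use vars qw(@ISA);
@ISA = qw(ParentAgent);

package main;
my $obj = StressLike->new;   # resolves to ParentAgent::new via @ISA
print ref($obj), " isa ParentAgent: ", ($obj->isa('ParentAgent') ? "yes" : "no"), "\n";
print "agent: ", $obj->agent, "\n";
```

On newer Perls, use parent -norequire, 'ParentAgent'; expresses the same relationship in one line.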
Lines 8 through 18 create the object constructor to make a new StressAgent object. The package name comes in first, shifted off in line 9. Initially, we call the superclass's new routine (in line 10), and then save the only remaining parameter to this routine (a coderef) in line 11. Lines 12 and 13 create two instance variables for this object, storing them in a sub-hash keyed by this package's name, which keeps them separate from the superclass's own fields. Lines 14 through 16 call instance methods to initialize some of the other parameters: setting a nice agent type, setting up any needed proxy parameters, and also setting the maximum number of parallel requests. Finally, line 17 returns the newly created StressAgent object.
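The +__PACKAGE__ spelling in lines 12 and 13 matters. Inside hash-subscript braces, a bare identifier is automatically quoted, so $self->{__PACKAGE__} would use the literal string "__PACKAGE__" as the key; the unary plus forces it to be evaluated as the current package name. This sketch replaces the superclass call with a plain bless (an assumption, since no parent class is loaded here):

```perl
use strict;
use warnings;

package StressLike;
sub new {
    my ($package, $make_request) = @_;
    my $self = bless {}, $package;   # stands in for $package->SUPER::new()
    # +__PACKAGE__ evaluates to "StressLike"; without the plus the key
    # would be the literal string "__PACKAGE__".
    $self->{+__PACKAGE__}{make_request} = $make_request;
    $self->{+__PACKAGE__}{did} = 0;
    return $self;
}

package main;
my $obj = StressLike->new(sub { "request" });
print join(", ", sort keys %$obj), "\n";
print "did = $obj->{StressLike}{did}\n";
```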
Lines 20 through 24 define an internal instance method that calls the callback function (like &patient in our example) with either no parameters or three parameters. If the callback function returns undef, nothing further happens, but if it instead returns (hopefully) an HTTP::Request object, then we put it into the queue as another request to be executed in parallel.
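The "register only if defined" logic can be checked with a hypothetical QueueAgent class whose register method just pushes onto an array, standing in for the Parallel UserAgent's real queue. The callback hands out two requests and then returns undef.

```perl
use strict;
use warnings;

# Hypothetical class; register() only records requests, it sends nothing.
package QueueAgent;
sub new {
    my ($package, $make_request) = @_;
    return bless { make_request => $make_request, queue => [] }, $package;
}
sub register { my ($self, $req) = @_; push @{ $self->{queue} }, $req }
# Same shape as listing two's maybe_register.
sub maybe_register {
    my $self = shift;
    my $new_request = $self->{make_request}->(@_);
    $self->register($new_request) if defined $new_request;
}

package main;
my $n = 0;
my $agent = QueueAgent->new(sub { ++$n <= 2 ? "request $n" : undef });
$agent->maybe_register for 1 .. 3;   # the third call returns undef, so nothing is queued
print "$_\n" for @{ $agent->{queue} };
```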
Lines 26 through 35 handle each response as it comes in. The Parallel UserAgent gathers each request's response into a proper response object, and once the response is complete, calls this method. The parameters come in and get separated in line 28. For logging purposes, we dump out the time of day, the response code, the response message, and an incrementing counter for each response. This gives me a quick way to compute the number of transactions handled over a given period of time by scrolling through the logfile. Each transaction handled increments the instance variable did, which gets printed as the number of transactions we did.
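The log line can be reproduced with a hypothetical FakeResponse class providing only the code and message accessors it needs. One detail worth noting: the concatenation operator puts localtime in scalar context, which yields a readable timestamp; passing localtime() directly as a list item to print would instead print its nine numeric fields run together.

```perl
use strict;
use warnings;

# Hypothetical response class with just the two accessors the log uses.
package FakeResponse;
sub new     { my ($class, %fields) = @_; return bless \%fields, $class }
sub code    { $_[0]{code} }
sub message { $_[0]{message} }

package main;
my $did = 0;
my @lines;
for my $response (FakeResponse->new(code => 200, message => 'OK'),
                  FakeResponse->new(code => 500, message => 'Internal Server Error')) {
    # Pre-increment, so the first response is numbered 1.
    push @lines, localtime() . ": " . $response->code . ", " . $response->message
                 . " {" . ++$did . "}\n";
}
print @lines;
```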
Line 32 calls the callback function, passing it the three parameters of this completed response. Line 33 discards the memory of this transaction as far as Parallel UserAgent is concerned. This means that we aren't wasting memory trying to hang on to old data. Line 34 returns the object, mostly as a matter of convention because nothing else makes sense.
Lines 37 through 45 define the top-level event loop. The do_it method is called with a number of parallel threads to maintain and a number of seconds to wait for a response. These default to 10 and 15, respectively. The initial threads are created in lines 41 through 43 by calling the callback routine with no parameters. Line 44 is basically an infinite loop: the Parallel UserAgent will run all requests to completion, but each completed request triggers a new request!
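A hypothetical sketch of just the argument handling and seeding part of do_it, with the parent's wait call left out, shows how the shift || default idiom behaves. Note that because 0 is false, an explicit 0 also falls back to the default.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 39 through 43; it returns what it
# would have used instead of calling wait().
sub do_it_sketch {
    my $max  = shift || 10;   # parallel "threads" to start
    my $wait = shift || 15;   # seconds, as the column describes
    my @seeded;
    push @seeded, "initial request $_" for 1 .. $max;
    return ($max, $wait, scalar @seeded);
}

my @defaults = do_it_sketch();       # 10, 15, 10
my @five     = do_it_sketch(5);      # 5, 15, 5
my @zero     = do_it_sketch(0, 0);   # 0 is false, so the defaults apply
print "@defaults\n@five\n@zero\n";
```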
Line 47 ends this module file, so that the implicit require inside a use directive will see a true value during the operation.
So, there's the program and the module. The module should be placed in a file named StressAgent.pm in the same directory as the script, unless you alter the use lib strategy above. Obviously, stress testing my web server's Eliza script won't be of much use to you, so you should start by figuring out what you want to do when each response comes back, and coding the response callback function appropriately.
This should give you a model for how to test your CGI scripts. You should use test harnesses like this with all your scripts, especially those that use shared resources. Eventually, the functionality of Parallel::UserAgent will be folded into the normal LWP library, making it easier to write code like this. Until then, at least we have a way of doing it. Enjoy!
Listings
=0= ##### listing one #####
=1= #!/home/merlyn/bin/perl -w
=2= use strict;
=3=
=4= $|++;
=5=
=6= use FindBin qw($Bin); use lib $Bin; use StressAgent;
=7= use HTTP::Request::Common;
=8=
=9= ## use LWP::Debug qw(+); # DEBUG
=10=
=11= my @YOW = do {
=12=   local (@ARGV, $/) = "/usr/local/share/emacs/20.2/etc/yow.lines";
=13=   split /\0\s*/, <>;
=14= };
=15= shift @YOW; # first one is a comment
=16= my %ASKED = ();
=17=
=18= StressAgent->new(\&patient)->do_it(5);
=19=
=20= sub patient {
=21=   my ($request,$response,$entry) = @_;
=22=   return GET "http://www.stonehenge.com/cgi/eliza2" unless $request;
=23=   if ($response->code == 200) {
=24=     ## print $response->as_string; return; ## debug
=25=     my @content = $response->content =~ /(.*)\n?/g;
=26=     my ($url) = $content[2] =~ /action="(.*?)"/i;
=27=     my ($prompt) = $content[3] =~ /<p>(.*?)<\/p>/i;
=28=     for ($ASKED{$url} || "[initial question]", " -> ", $prompt) {
=29=       print map " $_\n", split /\n/;
=30=     }
=31=     my $yow = $YOW[rand @YOW];
=32=     $ASKED{$url} = $yow;
=33=     return POST $url, [message => $yow];
=34=   }
=35=   print $response->as_string;
=36=   return;
=37= }

=0= ##### listing two #####
=1= use strict;
=2=
=3= package StressAgent;
=4= use LWP::Parallel::UserAgent;
=5= use vars qw(@ISA);
=6= @ISA = qw(LWP::Parallel::UserAgent);
=7=
=8= sub new {
=9=   my $package = shift;
=10=   my $self = $package->SUPER::new();
=11=   my $make_request = shift;
=12=   $self->{+__PACKAGE__}{make_request} = $make_request;
=13=   $self->{+__PACKAGE__}{did} = 0;
=14=   $self->agent(+__PACKAGE__ . " (".$self->agent.")");
=15=   $self->env_proxy;
=16=   $self->max_req(1000); # never reached
=17=   $self;
=18= }
=19=
=20= sub maybe_register {
=21=   my $self = shift;
=22=   my $new_request = $self->{+__PACKAGE__}{make_request}->(@_);
=23=   $self->register($new_request) if defined $new_request;
=24= }
=25=
=26= sub on_return {
=27=   my $self = shift;
=28=   my ($request, $response, $entry) = @_;
=29=   print
=30=     localtime().": ", $response->code, ", ", $response->message,
=31=     " {",++$self->{+__PACKAGE__}{did}, "}\n";
=32=   $self->maybe_register(@_);
=33=   $self->discard_entry($entry);
=34=   $self;
=35= }
=36=
=37= sub do_it {
=38=   my $self = shift;
=39=   my $max = shift || 10;
=40=   my $wait = shift || 15;
=41=   for (1..$max) {
=42=     $self->maybe_register();
=43=   }
=44=   $self->wait($wait);
=45= }
=46=
=47= 1;