Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 24 (April 1998)

Last month, we looked at how to maintain a stateful ``conversation'', using a child process per ``session''. While this is easy to do (thanks to the HTTP::Daemon module in LWP), it can get rather expensive if we're trying to maintain 15 or 20 conversations.

In this column, I'll show a different approach, still implementable in a small number of code lines, but with its own advantages and disadvantages. In particular, what we're gonna do this time is create a single mini-web-server that juggles all the conversations for a particular web page, handling their requests one at a time.

This approach has the advantage that it requires minimal resources on the server machine. However, it doesn't scale well to many hundreds of simultaneous conversations. For that, a solution like mod_perl embedded inside the Apache server is probably a better deal.

Another advantage is that a single process holds the collective state of all the conversations. Using this technology as a base, you could build a nice ``chat'' server, where not only are we maintaining individual conversation threads, but the threads interact with each other. (Perhaps a topic for a future column... stay tuned.)

As in the previous column, I'm using the ``Eliza'' module to demonstrate a non-trivial but easily viewable interaction that requires state. One nice thing about this module is that the computation time per form submission is rather small. Because the mini-web-server can handle only one submission at a time, all other clients will be blocked until the computation is complete. This is another limiting factor for this approach.

The ``one doctor, many faces'' approach is presented in [Listing One, below].

Lines 1 through 4 begin most of my Perl programs, and should be familiar by now if you've been reading any of them. We're turning on warnings and taint checks, unbuffering STDOUT, and enabling the best compiler restrictions.

Lines 5 through 8 bring in the modules we'll be using. The CGI module is standard as of Perl 5.004, while HTTP::Daemon and HTTP::Status are both part of the LWP library, found in the CPAN (at http://www.perl.com/CPAN). Chatbot::Eliza is also found in the CPAN, and provides us with the pseudo-doctor.

Lines 10 through 12 define the configuration constants for the solitary mini-web-server. The hostname shouldn't be needed, but I was getting errors without it. (Your mileage may vary.) And unlike the previous version, we'll now need to define a specific port for the server, since all sessions are sharing the same server. You must ensure that this port is used only by this server, so some coordination with other potential port-users is definitely in order.

The timeout value can be higher than in the previous version, because we're only talking about one process, not many. Make sure it's high enough to keep from forking too often, but low enough that we don't waste a lot of resources keeping an idle process alive.

Lines 14 through 17 attempt to bind a new web server to the designated port. If $d comes back valid, then no one was already listening on that port, and we must launch the mini-web-server. If $d is undef, then the port is taken, presumably by an existing mini-web-server launched in the past few minutes by another invocation of this same script, and we can simply send the browser there. I've turned warnings off inside the block, so that we don't get a spurious ``cannot bind'' error. We know that this will fail most of the time.
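The ``first one to bind wins'' trick in lines 14 through 17 doesn't really depend on HTTP::Daemon. Here's a minimal sketch of the same idea using only the core IO::Socket::INET module. The helper name try_claim_port is made up for this example, and it lets the operating system pick a free port (port 0) instead of hardwiring 42001, so the demo can't collide with anything already running.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: try to become the one and only listener on $port.
# Returns the listening socket, or undef if some other process holds the port.
sub try_claim_port {
    my ($host, $port) = @_;
    return IO::Socket::INET->new(
        LocalAddr => $host,
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
    );
}

# Port 0 asks the OS for any free port, so the demo can't collide with 42001.
my $first = try_claim_port('127.0.0.1', 0)
    or die "first bind unexpectedly failed: $@";
my $port = $first->sockport;

# A second claim on the same port should fail while $first is still open.
my $second = try_claim_port('127.0.0.1', $port);
print defined $second ? "second bind succeeded?!\n"
                      : "port $port is taken: become a client, not a server\n";
```

The second call comes back undef, which is exactly the case line 23 treats as ``somebody else is already the doctor.''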

Line 18 creates a session key, which will be used by the mini-web-server to recognize the same client connecting over time, and lines 19 and 20 build the URL that carries it. Again, you want enough stuff in here to make it hard to predict the next session ID, so that someone doesn't try to hijack the session.

Line 22 generates an external redirect for the client: the browser will automatically be forwarded to the new host and port address. Strangely enough, the browser may be attempting to connect to a port that doesn't really have a web server on it yet, but will in a moment. This works just fine, however, since the daemon created above is already bound and listening on that port: the operating system completes the browser's connection and holds it in a queue until the server gets around to calling accept.
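To watch that queueing happen, here's a small sketch with core IO::Socket::INET (again on an OS-chosen port, an assumption made just for the demo). A client connects and writes its request before the server has called accept, and nothing is lost.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Bound and listening, but nobody is calling accept() yet.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,        # any free port, chosen by the OS
    Proto     => 'tcp',
    Listen    => 5,        # backlog: room for a few waiting connections
) or die "cannot listen: $@";

# The "browser" connects and starts its request immediately.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "cannot connect: $@";
$client->autoflush(1);
print {$client} "GET /1.2.3 HTTP/1.0\r\n";

sleep 1;                   # meanwhile, the server is busy elsewhere

# Only now does the server pick up the queued connection.
my $conn = $server->accept or die "accept failed: $!";
my $line = <$conn>;
print "accepted late, still got: $line";
```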

Line 23 is a critical decision maker. If the bind to the designated port above succeeded, there was not a prior web server waiting on that port, and we must spawn the mini-web-server. If the bind failed, we must not spawn the web-server, or it would just be wasted resources. So, if $d is anything but undef, it's time to launch the server.

If we make it to line 25, we are still talking to the browser, even though we've sent out an external redirect. To ensure that the browser gets released, we must fork and let the parent close down the browser connection. After line 25, we'll have two processes running (but only momentarily).

If $pid is non-zero, we're the parent process, the one that wants to say ``goodbye'' to the browser connection, so we simply exit in line 26. If not, we're the child, and we need to close STDOUT in line 27, or else the lingering open filehandle will cause the web server that started us to keep thinking that we have more to say. This would be bad.
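The reason line 27 matters comes down to how pipes report end-of-file: a reader sees EOF only after every process holding the write end has closed it. Here's a self-contained sketch that plays both roles, with a pipe standing in for the connection between the web server and the CGI script. The real plumbing depends on your web server, so treat this as a model rather than a transcript.

```perl
use strict;
use warnings;

# A pipe stands in for the web server reading the script's STDOUT.
pipe(my $reader, my $writer) or die "Cannot pipe: $!";

defined(my $pid = fork) or die "Cannot fork: $!";
if ($pid == 0) {
    # Child: plays the lingering mini-web-server.
    close $reader;
    print {$writer} "Status: 302 Found\n\n";   # the redirect is already out
    close $writer;   # like line 27; without it, the reader below waits 2 more seconds
    sleep 2;         # ...while the server keeps running for a while
    exit 0;
}

# Parent: plays the web server waiting for the script to finish talking.
close $writer;       # drop our own copy of the write end, too
my $started = time;
my @lines   = <$reader>;    # returns only at EOF
my $waited  = time - $started;
waitpid($pid, 0);

print "read ", scalar(@lines), " lines, waited about ${waited}s for EOF\n";
```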

Lines 29 and 30 create the global variables for the mini-web-server. The %eliza hash will hold all the ``doctor'' objects, keyed by the session ID. The %when hash holds the time of the last transaction with a particular client, again keyed by session ID. (Another way to do this would have been a single hash with session ID keys and nested data structures to hold both the doctor object and time stamp, but it would have obfuscated the main algorithm, so I did it this way instead.)
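For comparison, here's roughly what the single-hash alternative might look like. It's a sketch only: a plain string stands in for the Chatbot::Eliza object, and touch and sweep are made-up names that take the time as an argument so the example doesn't have to wait five minutes.

```perl
use strict;
use warnings;

my $TIMEOUT = 300;
my %session;   # session ID => { doctor => $object, when => $epoch_seconds }

# Record activity; a string stands in for the Chatbot::Eliza object.
sub touch {
    my ($id, $now) = @_;
    $session{$id}{doctor} ||= "doctor for $id";   # autovivifies the inner hash
    $session{$id}{when} = $now;
}

# Expiring a session is now one delete instead of two.
sub sweep {
    my ($now) = @_;
    for my $id (keys %session) {
        delete $session{$id} if $session{$id}{when} <= $now - $TIMEOUT;
    }
}

touch("1.1.1", 1000);
touch("2.2.2", 1250);
sweep(1400);   # 1.1.1 has been idle 400 seconds, 2.2.2 only 150
print join(",", sort keys %session), "\n";
```

The payoff is a single delete per stale session; the cost is an extra level of braces on every access, which is the clutter the listing avoids.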

Lines 32 through 77 form the endless loop for the mini-web-server: a bare block that the redo in line 76 restarts over and over. The exit from this loop is defined by the alarm set in line 33. As long as we make it back to the top of the loop within $TIMEOUT seconds, the alarm is reset before it can fire, and we'll just keep looping around. If no connection arrives in that window, the alarm fires, and since we haven't installed a handler, SIGALRM's default action gives us a quick exit without much pain. (This technique was also used in last month's column.)
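The deadman timer is easy to watch in isolation. In the listing, an unhandled SIGALRM simply terminates the process; this sketch swaps in a handler that dies inside an eval, so the program can report what happened instead of vanishing. The sleeps stand in for accept blocking, and the two-second timeout is shortened purely for the demo.

```perl
use strict;
use warnings;

my $TIMEOUT = 2;        # shortened from 300 for the demo
my $served  = 0;

my $ok = eval {
    local $SIG{ALRM} = sub { die "idle timeout\n" };
    # Two "requests" arrive a second apart, then the clients go quiet.
    for my $gap (1, 1, 4) {
        alarm($TIMEOUT);   # (re-)set the deadman timer, as in line 33
        sleep $gap;        # stands in for $d->accept blocking
        $served++;
    }
    alarm 0;
    1;
};
alarm 0;                   # make sure no timer is left pending

print $ok ? "never timed out\n" : "served $served requests, then: $@";
```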

Line 34 is where we spend a lot of time waiting. We'll block here until a connection comes in from any of the many clients that were told to come here. Once a connection is made, $c holds the object representing that connection (an instance of a class derived from IO::Socket), and we can start figuring out what to do with that contact.

Line 35 takes this new connection and reads one complete HTTP request from it, parsed according to the protocol. Line 37 extracts the session ID from the requested URL by stripping the leading slash from its path. Remember that to the browser, we're simply another web server, and it thinks it's just requesting a normal web page. However, we're using the name of the ``web page'' as the session ID. Learn to think sideways about the protocols, and you'll be able to figure out new techniques like this, even though we're working entirely within the ``standard'' HTTP specifications.
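To see the ``URL as session ID'' trick without LWP in the way, here's a rough sketch that works on a raw HTTP request line. session_from_request_line is a hypothetical helper that does by hand what lines 37 and 39 do with the request object: strip the leading slash, then insist on the digits-dot-digits-dot-digits shape that line 18 produces.

```perl
use strict;
use warnings;

# Hypothetical helper: the hand-rolled equivalent of lines 37 and 39.
# Returns the session ID, or undef for anything that isn't an appointment.
sub session_from_request_line {
    my ($request_line) = @_;
    my ($path) = $request_line =~ m{^[A-Z]+\s+(\S+)\s+HTTP/\d+\.\d+\s*$}
        or return undef;
    (my $session = $path) =~ s{^/}{};          # "/1.2.3" becomes "1.2.3"
    return $session =~ /^\d+\.\d+\.\d+$/ ? $session : undef;
}

for my $request_line ("POST /888888888.4242.17 HTTP/1.0",
                      "GET /index.html HTTP/1.0") {
    my $session = session_from_request_line($request_line);
    print defined $session ? "session $session\n" : "no appointment\n";
}
```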

Lines 39 through 43 distinguish browsers that got here by a redirect from the top part of the script from random connections: any path that doesn't look like a session ID (three dot-separated runs of digits) gets a 403 Forbidden error, and we go back to waiting. This check is rather simple. For more robust rejections, you could have some hashed value that depends on a secret shared between the top half and bottom half of the script. But we're not too concerned about people hijacking this doctor, provided they don't get into someone else's current session.
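One way to do that hashed-value check, sketched here with an HMAC from the core Digest::SHA module: the top half appends a signature to the session key, and the bottom half accepts only keys whose signature matches. The secret and the helper names sign_session and verify_session are hypothetical, and adopting this would also mean loosening the line 39 pattern to accept the longer signed form (or calling verify_session there instead).

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical shared secret. Both halves of the script must agree on it,
# and it must never be sent to the browser.
my $SECRET = "replace this with something long and random";

# Top half: append a signature to the plain "time.pid.rand" key.
sub sign_session {
    my ($key) = @_;
    return "$key." . hmac_sha256_hex($key, $SECRET);
}

# Bottom half: return the plain key if the signature checks out, else undef.
# (Plain eq is not a constant-time comparison; good enough for a sketch.)
sub verify_session {
    my ($signed) = @_;
    my ($key, $mac) = $signed =~ /^(\d+\.\d+\.\d+)\.([0-9a-f]{64})$/
        or return undef;
    return hmac_sha256_hex($key, $SECRET) eq $mac ? $key : undef;
}

my $signed = sign_session(join ".", time, $$, int(rand 1000));
my $forged = "1.2.3." . substr($signed, -64);   # borrows someone else's signature

for my $candidate ($signed, $forged) {
    my $key = verify_session($candidate);
    print defined $key ? "accepted $key\n" : "rejected\n";
}
```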

In line 44 we send the HTTP status line and basic headers ourselves, because there's no web server in between to do it for us: we're operating essentially in ``NPH mode''. Line 45 extracts the POST content from the request, and uses it to build a CGI object. We're also forcing this object into the CGI module's global variable ($CGI::Q), so that the function-style calls that follow, like param, operate on this object.
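Line 45 leans on CGI.pm to decode the POST body. If you're curious what that involves, here's a bare-bones decoder for the application/x-www-form-urlencoded format a browser sends. parse_form_body is a hypothetical name, and unlike CGI.pm this sketch keeps only the last value of a repeated field and knows nothing about file uploads.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded body into a hash reference.
sub parse_form_body {
    my ($body) = @_;
    my %param;
    for my $pair (split /[&;]/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name and length $name;
        $value = "" unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                 # "+" means space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX escapes
        }
        $param{$name} = $value;    # a repeated name keeps its last value
    }
    return \%param;
}

my $param = parse_form_body("message=I+feel+sad%21");
print "message is: $param->{message}\n";
```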

Lines 46 through 53 compute the response. First, line 46 selects a default response in the absence of input, and line 47 attempts to scrounge up that input (if any). If there was some input, we need to talk to the doctor about it, beginning in line 49. First the message parameter is cleared out, so the text field isn't pre-filled with what the patient just said, and then line 50 creates a new doctor for this session ID if necessary. Line 51 then processes the user input to get the doctor's response. Line 52 notes that we have had an interaction with this session, by recording the wall-clock time as a Unix timestamp into the %when hash.
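Here's a sketch of the same per-session logic pulled out into a function, so the ``one doctor per session, created on first use'' behavior is visible. EchoDoctor is a hypothetical stand-in for Chatbot::Eliza (it only needs new and transform, and it counts visits so you can see each session's private state), and the clearing of the form field in line 49 is left out because there's no form here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Chatbot::Eliza.
package EchoDoctor;
sub new { my ($class) = @_; return bless { visits => 0 }, $class }
sub transform {
    my ($self, $said) = @_;
    $self->{visits}++;
    return "Visit $self->{visits}: why do you say \"$said\"?";
}

package main;

my (%eliza, %when);

# Mirrors lines 46-53: default greeting, lazily created doctor, timestamp.
sub respond {
    my ($session, $message) = @_;
    my $eliza_says = "How do you do?  Please tell me your problem.";
    if ($message) {
        $eliza{$session} ||= EchoDoctor->new;
        $eliza_says = $eliza{$session}->transform($message);
        $when{$session} = time;
    }
    return $eliza_says;
}

print respond("1.1.1", ""), "\n";
print respond("1.1.1", "I feel tired"), "\n";
print respond("2.2.2", "I feel fine"), "\n";
print respond("1.1.1", "still tired"), "\n";
```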

Lines 54 through 69 generate a standard form response, the same as an ordinary CGI script would. However, we must print the response to $c (as in last month's version), because STDOUT was closed back in line 27. Line 70 closes down the connection, and the browser is thus free to display the message.

Now, unlike last month's program, we'll need to do some cleaning up afterwards. Once we've done a transaction, it's time to see if any of the doctors can be excused, because their respective patients have gone away. This is handled in lines 71 through 75. For each of the hash items, we'll blow away the %eliza and %when elements when the most recent transaction for that item has not been within the $TIMEOUT window.

To see why this is necessary, imagine a new patient coming along every 4 minutes, speaking once or twice, and then leaving, for an entire day. The first patient will cause a new ``clinic'' to be created. But by the end of the day, we've still got the same single mini-web-server, because the alarm has never fired. Without this cleanup, we'd have records on 360 or so patients that'll never be back.
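The four-minutes-a-patient scenario is easy to simulate without waiting a whole day. This sketch reuses the cleanup logic from lines 71 through 75 but passes the current time in explicitly, so the clock can be faked; strings stand in for the Chatbot::Eliza objects.

```perl
use strict;
use warnings;

my $TIMEOUT = 300;
my (%eliza, %when);

# Lines 71-75, with "now" passed in so the clock can be faked.
sub sweep {
    my ($now) = @_;
    for (keys %when) {
        next if $when{$_} > $now - $TIMEOUT;
        delete $eliza{$_};
        delete $when{$_};
    }
}

my $patients = 24 * 60 / 4;            # one every 4 minutes, all day
my $peak = 0;
for my $n (0 .. $patients - 1) {
    my $now = $n * 4 * 60;
    my $id  = "patient$n";
    $eliza{$id} ||= "doctor for $id";  # stand-in for a Chatbot::Eliza object
    $when{$id}  = $now;
    sweep($now);
    my $held = keys %when;
    $peak = $held if $held > $peak;
}
print "$patients patients seen, at most $peak records held at once\n";
```

With the sweep in place, no more than two records are ever held; comment out the call to sweep and the peak climbs to 360.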

So, that's how to do it with just one additional process. Stick it into your cgi-bin directory, and have fun talking to the doctor. (And review last month's column for comparison!) Enjoy!

Listing One

        =1=     #!/home/merlyn/bin/perl -Tw
        =2=     
        =3=     $|++;
        =4=     use strict;
        =5=     use CGI ":all";
        =6=     use HTTP::Daemon;
        =7=     use HTTP::Status;
        =8=     use Chatbot::Eliza;
        =9=     
        =10=    my $HOST = "www.stonehenge.com"; # where are we?
        =11=    my $PORT = 42001;               # at what port
        =12=    my $TIMEOUT = 300;              # number of seconds until this doc dies
        =13=    
        =14=    my $d = do {
        =15=      local($^W) = 0;
        =16=      new HTTP::Daemon (LocalAddr => $HOST, LocalPort => $PORT)
        =17=    };
        =18=    my $unique = join ".", time, $$, int(rand 1000);
        =19=    my $url_prefix = "http://$HOST:$PORT";
        =20=    my $url = "$url_prefix/$unique";
        =21=    
        =22=    print redirect($url);
        =23=    exit 0 unless defined $d;       # do we need to become the server?
        =24=    
        =25=    defined(my $pid = fork) or die "Cannot fork: $!";
        =26=    exit 0 if $pid;                 # I am the parent
        =27=    close(STDOUT);
        =28=    
        =29=    my %eliza;                      # Chatbot::Eliza objects, keyed on session
        =30=    my %when;                       # most recent activity time, keyed on session
        =31=    
        =32=    {
        =33=      alarm($TIMEOUT);              # (re-)set the deadman timer
        =34=      my $c = $d->accept;           # $c is a connection
        =35=      my $r = $c->get_request;      # $r is a request
        =36=    
        =37=      (my $session = $r->url->epath) =~ s{^/}{};
        =38=      
        =39=      unless ($session =~ /^\d+\.\d+\.\d+$/) {
        =40=        $c->send_error(RC_FORBIDDEN, "I don't think we've made an appointment!");
        =41=        close $c;
        =42=        redo;
        =43=      }
        =44=      $c->send_basic_header;
        =45=      $CGI::Q = new CGI $r->content;
        =46=      my $eliza_says = "How do you do?  Please tell me your problem.";
        =47=      my $message = param("message") || "";
        =48=      if ($message) {
        =49=        param("message","");
        =50=        $eliza{$session} ||= new Chatbot::Eliza;
        =51=        $eliza_says = $eliza{$session}->transform($message);
        =52=        $when{$session} = time;
        =53=      }
        =54=      print $c
        =55=        header,
        =56=        start_html("The doctor is in!"),
        =57=        h1("The doctor is in!"),
        =58=        p("This script is from a future",
        =59=          a({-HREF => "http://www.stonehenge.com/merlyn/WebTechniques/"},
        =60=            cite("Web Techniques")," Perl column.")),
        =61=        hr,
        =62=        start_form("POST", "$url_prefix/$session"),
        =63=        $eliza_says && p($eliza_says),
        =64=        p, textfield(-name => "message", -size => 60),
        =65=        p, submit("What do you say, doc?"),
        =66=        p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
        =67=        end_form,
        =68=        hr,
        =69=        end_html;
        =70=      close $c;
        =71=      for (keys %when) {
        =72=        next if $when{$_} > time - $TIMEOUT;
        =73=        delete $eliza{$_};
        =74=        delete $when{$_};
        =75=      }
        =76=      redo;
        =77=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.