Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 23 (March 1998)

One of the interesting challenges in a connectionless protocol like the Web is maintaining ``session'' or ``state'' information to allow multiple interactions as part of a larger ``conversation''.

In past columns, I've talked about doing this with external files, with mangled URLs, and with hidden information in forms. While these are all good solutions, they still require each new CGI invocation to somehow ``come up to speed'' about which conversation this particular transaction is a part of. In this column, let's take a different tack.

Instead of trying to ``leap'' the information from one program to another, let's keep a process alive during an entire conversation, letting it die when the conversation is over. That way, the information can stay in the dataspace of that process, and all we have to do is keep reconnecting to that same process somehow.

This sounds perhaps complicated, but is actually very simple using the HTTP::Daemon module in the LWP library that I first used in a column last year to create a web proxy server. In this case, we'll be launching a tiny web-server from a CGI script that will hold the state of the conversation, and then tell the browser to go talk to this mini-web-server instead of the main server.

For each ongoing conversation, this will then require an additional process, so this method works nicely only when you have a small number of ``conversations'' going on at the same time. In part two (next month), I'll revisit this to show how to do it with just one additional process.

As I was looking for something interesting to demonstrate a stateful conversation, I was pleasantly greeted by a new addition to the CPAN: a Perl implementation of Eliza!

From the manpage for Chatbot::Eliza:

This module implements the classic Eliza algorithm. The original Eliza program was written by Joseph Weizenbaum and described in the Communications of the ACM in 1967. Eliza is a mock Rogerian psychotherapist. It prompts for user input, and uses a simple transformation algorithm to change user input into a follow-up question. The program is designed to give the appearance of understanding.

This program is a faithful implementation of the program described by Weizenbaum. It uses a simplified script language (devised by Charles Hayden). The content of the script is the same as Weizenbaum's.

So, I quickly grabbed Chatbot::Eliza, and used it as the basis for an interactive session with ``the doctor'', using a standard browser interface. And the result is in [listing one, below].

Lines 1 through 4 start nearly every program I write, turning on Taint checking, warnings, disabling buffered output, and enabling all compiler restrictions.

Line 5 pulls in the CGI module, importing the normal convenience subroutines, as well as parsing the CGI information into accessible data structures. (The CGI module is in the standard library beginning with Perl version 5.004. If you have an older version of Perl, you should upgrade for security reasons.)

Lines 6 and 7 pull in the HTTP::Daemon and HTTP::Status modules, found in the LWP library. LWP is found in the CPAN (http://www.perl.com/CPAN/ or http://www.perl.org/CPAN/), but you probably already have it if you've been doing any web programming with Perl.

Line 8 pulls in the Eliza module described above. You'll almost certainly need to get this one from the CPAN (unless you're like me and daily install everything you can from the CPAN just to play with it).

Lines 10 and 11 are the only configuration constants for this program. The $HOST variable defines the name of this host for HTTP::Daemon to use. This was supposed to work automatically, but didn't for me. (And thus might not for you.) So, define it here.

And $TIMEOUT is the number of seconds a particular ``doctor'' will hang out waiting for further dialog from a particular ``patient''. If no further requests come within this time, the doctor vanishes. So, this is a number that's worth tuning with some thought. You'll need a separate process for each doctor, so you don't want a lot of idle doctors filling up the office. On the other hand, the doc needs to wait around long enough for the patient to come up with a meaningful response. I found 2 minutes (120 seconds) to be about right in some quick testing with my net friends. (Obviously, that number would be much too low in a shopping cart environment.)

Every invocation of this CGI program creates a separate mini-web-server. This happens by creating a new HTTP::Daemon object (in line 13), saved in $d. A non-predictable non-privileged port number gets assigned automatically, and recycles after some 60,000 ports, so we're pretty safe for a while.

Line 14 creates a unique identifier that the CGI script and the mini-web-server share secretly. This is used in later connections to the mini-web-server to ensure that we don't have someone trying to hijack a particular doctor session. I'm not being very secure here... since the value of rand() can be predicted given the other two parts of this URL, but this should scare off the newbies from trying anything.
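To make the shape of that identifier concrete, here's a minimal sketch that builds a key the same way line 14 does and checks that a string has the expected form. The names make_key and looks_like_key are made up for this illustration and do not appear in the listing.

```perl
use strict;
use warnings;

# Build a key the same way line 14 of the listing does: epoch time,
# process ID, and a random integer from 0 to 999, joined with dots.
sub make_key {
    return join ".", time, $$, int(rand 1000);
}

# Hypothetical helper: does this string have the shape of such a key?
sub looks_like_key {
    my ($key) = @_;
    return $key =~ /\A\d+\.\d+\.\d{1,3}\z/ ? 1 : 0;
}

my $key = make_key();
print "key: $key\n";
print "well-formed\n" if looks_like_key($key);
```

Because the first two parts can be guessed and the third has only 1000 possible values, a key like this is a speed bump rather than a lock, which is exactly the caveat above.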

Line 15 constructs the URL to which we will send the browser. It'll look something like:

        http://www.stonehenge.com:61714/881337239.25550.567

Note here that the mini-web-server address is my web address, but on port 61714, followed by the unique key made up of the time, the PID, and a random 1-3 digit number. All further interaction from the browser will be to this URL, triggering the mini-web-server. Bookmarking this URL is pointless, because the server responding to it dies after two minutes of inactivity.

Lines 17 through 22 fork off the mini-web-server using fairly standard forking code. Line 19 causes the parent code (acting as the CGI handler) to redirect the browser to the mini-web-server. Line 22 closing STDOUT is essential. Without it, the real web server won't know that we're done sending information to the browser.
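If the fork idiom is unfamiliar, here's a small stand-alone sketch of it. It is not the listing's code: in the listing the parent exits right away and the child lives on, while here the parent waits for the child and reads its report through a pipe so that the split is visible. The name run_child_task is hypothetical.

```perl
use strict;
use warnings;

$| = 1;    # unbuffer STDOUT so pending output is not duplicated in the child

# Fork once; the parent sees the child's PID, the child sees 0.
# run_child_task is a made-up name used only for this sketch.
sub run_child_task {
    my ($task) = @_;
    pipe(my $reader, my $writer) or die "Cannot pipe: $!";
    defined(my $pid = fork) or die "Cannot fork: $!";
    if ($pid) {                  # parent: read what the child reports
        close $writer;
        my $report = do { local $/; <$reader> };
        close $reader;
        waitpid($pid, 0);        # reap the child so it is not a zombie
        return $report;
    }
    close $reader;               # child: do the work, report, and leave
    print {$writer} $task->();
    close $writer;
    exit 0;
}

print run_child_task(sub { "child $$ reporting\n" });
```

The PID printed is the child's, since the code reference runs after the fork, in the child process.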

The remaining lines of the program are strictly for the mini-web-server. Line 24 creates the Eliza object, which has stateful information about our current session.

Lines 26 through 58 form an infinite loop, processing one request at a time from a browser. This loop is exited via the alarm() setting on line 27. After 120 seconds of idleness waiting at line 28 for a new connection, a SIGALRM signal will come along. Since we're not doing anything to catch that signal, the program will hastily exit. When we get back up to the top of the loop after handling a single transaction, the alarm is reset once again to 120 seconds. (There's only one alarm to be activated, so any new setting erases the prior setting.)
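The deadman timer can be tried out in isolation. The following is a sketch under one deliberate change: the listing leaves SIGALRM uncaught so the whole process dies, while this version catches the signal so that the timeout can be observed. The name with_deadline is hypothetical.

```perl
use strict;
use warnings;

# Run $code, but give up after $seconds. with_deadline is a made-up
# name; the listing itself simply lets the uncaught SIGALRM kill it.
sub with_deadline {
    my ($seconds, $code) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($seconds);         # start (or restart) the single timer
        my $value = $code->();
        alarm(0);                # finished in time: cancel the timer
        $value;
    };
    alarm(0);                    # make sure no timer is left running
    return defined $result ? $result : "timed out";
}

print with_deadline(5, sub { "answered" }), "\n";

# select() with a timeout is used as the pause here, because sleep()
# and alarm() can interfere with each other on some systems.
print with_deadline(1, sub { select(undef, undef, undef, 3); "too late" }), "\n";
```

The first call should return its value at once, and the second should give up after about a second instead of waiting out the full three.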

Line 28 waits for a connection from a browser, and captures information about the connection into an HTTP::Daemon::ClientConn object (described on the manpage for HTTP::Daemon), here saved into $c.

Line 29 gets the request information into an HTTP::Request object, saved into $r. This will have the POSTed data content in it, as we'll see shortly.

Lines 30 through 34 are the security check to make sure that it isn't just some random connection, or even a bookmarked prior connection from long ago. The value of $r->url->epath is the ``path'' requested of this ``web server''. Since the URL we redirected the browser toward has our magic unique identifier out there, this better be the same.

If it's not, lines 31 and 32 trigger an HTTP error (the 403 message), including a cute text phrase. If you're experimenting and want to see this message, alter the URL slightly after you've invoked the program in the normal way.
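The decision made by that check boils down to one string comparison. Here's a tiny sketch of it, with the hypothetical helper status_for_path standing in for lines 30 through 35 and returning the standard HTTP codes 200 (OK) and 403 (Forbidden) as plain numbers.

```perl
use strict;
use warnings;

# Decide how to answer a request path, given the secret key.
# status_for_path is a made-up helper, not part of the listing.
sub status_for_path {
    my ($path, $unique) = @_;
    return $path eq "/$unique" ? 200 : 403;
}

my $unique = "881337239.25550.567";
for my $path ("/$unique", "/881337239.25550.568", "/") {
    printf "%-25s %d\n", $path, status_for_path($path, $unique);
}
```

Only the exact path gets through; a key that is off by one digit, or no key at all, is turned away.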

If we've made it to line 35, we're doing normal stuff now, so it's time to send back a normal ``here's your file'' status.

Line 36 is a bit odd. I want to use the cool HTML-writer features of the CGI module, but this code isn't being called from a real web server, and so the CGI environment variables aren't set up properly. Luckily, the CGI module allows me to fake up a transaction, passing it the input data. But, in order to use the function calls instead of the method invocations for the HTML writer features, the current CGI object has to be installed in $CGI::Q. And that's exactly what I've done. Yes, I cheated... looking at the source of CGI.pm to figure out how to do this, but it works. Here, $r->content is the POST data from the previous form. If it's empty, it was a GET request, but that doesn't matter here.
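To see what kind of data is being handed over in $r->content, here's a simplified decoder for a form-encoded POST body. This is only a stand-in to show the idea, not how CGI.pm actually does it: it ignores repeated fields and character-set issues, and parse_form is a made-up name.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded body into a hash.
# A simplified illustration only; parse_form is a hypothetical name.
sub parse_form {
    my ($content) = @_;
    my %field;
    for my $pair (split /[&;]/, $content) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = "" unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # plus means space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # %XX means one byte
        }
        $field{$name} = $value;
    }
    return \%field;
}

my $form = parse_form("message=I+feel+sad%21");
print "message: $form->{message}\n";
```

An empty body, as with the initial GET request, simply yields an empty hash, which is why the greeting on line 37 is used the first time through.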

Line 37 is the response that the doctor speaks. I'm defaulting it to an initial greeting, which will be used if we didn't get any valid prior response.

Lines 38 to 42 handle the replies from prior invocations. The param method/function fetches the message parameter from the form data. Unless it's empty or missing, we pass it into the Eliza object, whose response comes back into $eliza_says. Also, we clear out the parameter, so that it won't be the default in the form below. (Forms created with the CGI module are ``sticky'' by default.)

Lines 43 to 55 return back the content of the response to the user. Normally, this is handled by printing to STDOUT in a CGI script, but we're not quite in full CGI mode here. In fact, it's sorta ``no-G-I''. Luckily, the $c connection object can act like a filehandle, so we use it as an indirect filehandle to print the return information.
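Printing to something other than STDOUT works the same for any filehandle. In this sketch an in-memory filehandle stands in for the $c connection object, so it can be run without a browser or a socket; render_page is a hypothetical name, and the HTML is written out by hand instead of coming from the CGI module's shortcuts.

```perl
use strict;
use warnings;

# Print a fragment of the page to whatever filehandle is passed in.
# render_page is a made-up name used only for this sketch.
sub render_page {
    my ($fh, $eliza_says) = @_;
    print $fh
        "<h1>The doctor is in!</h1>\n",
        "<p>$eliza_says</p>\n";
    return;
}

my $buffer = "";
open(my $fh, ">", \$buffer) or die "Cannot open in-memory handle: $!";
render_page($fh, "How do you do?");
close $fh;
print $buffer;
```

Note that there is no comma after the filehandle, just as there is none after $c on line 43 of the listing.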

Line 44 generates the HTTP header (everything after the HTTP status line printed above). Line 45 provides the initial HTML content. Line 46 generates an H1 header, and line 47 an HR horizontal rule.

Lines 48 through 53 generate the response form. Note that we force a POST to our magic unique URL as the action for this form. By HTTP standards, this must be a POST and not a GET, because submitting this form changes the state of the server (in this case, our mini-web-server). Also, by forcing a POST, the response ends up accessible to the ->content method above. (To get the form data of a GET method, see the HTTP::Daemon manpage for an example.)

Line 49 formats the doctor's response immediately above the textfield generated in line 50. Line 51 creates a submit button (with a text message in place of SUBMIT), and line 52 tells them that the meter is running.

After the content has been sent down the $c pipe, line 56 closes the connection, letting the browser know we are done. And then we're ready for another interaction, so we jump back in line 57 to do it all over again.

And that's all there is to it. Install it into a CGI area, and tell people that the doctor is in at a URL like

        http://www.stonehenge.com/cgi/eliza

and then you'll be wishing you'd implemented that credit-card charge system so you could get paid for this. Note that only the initial fetch of the CGI script will show up in your web logs, because all subsequent HTTP transactions are being performed to the mini-web-servers. Enjoy!

Listing One

        =1=     #!/home/merlyn/bin/perl -Tw
        =2=     
        =3=     $|++;
        =4=     use strict;
        =5=     use CGI ":standard";
        =6=     use HTTP::Daemon;
        =7=     use HTTP::Status;
        =8=     use Chatbot::Eliza;
        =9=     
        =10=    my $HOST = "www.stonehenge.com"; # where are we?
        =11=    my $TIMEOUT = 120;              # number of seconds until this doc dies
        =12=    
        =13=    my $d = new HTTP::Daemon (LocalAddr => $HOST);
        =14=    my $unique = join ".", time, $$, int(rand 1000);
        =15=    my $url = $d->url.$unique;
        =16=    
        =17=    defined(my $pid = fork) or die "Cannot fork: $!";
        =18=    if ($pid) {                     # I am, apparently, the parent
        =19=      print redirect($url);
        =20=      exit 0;
        =21=    }
        =22=    close(STDOUT);                  # to let the kid live on
        =23=    
        =24=    my $eliza = new Chatbot::Eliza;
        =25=    
        =26=    {
        =27=      alarm($TIMEOUT);              # (re-)set the deadman timer
        =28=      my $c = $d->accept;           # $c is a connection
        =29=      my $r = $c->get_request;      # $r is a request
        =30=      if ($r->url->epath ne "/$unique") {
        =31=        $c->send_error(RC_FORBIDDEN, "I don't think we've made an appointment!");
        =32=        close $c;
        =33=        redo;
        =34=      }
        =35=      $c->send_basic_header;
        =36=      $CGI::Q = new CGI $r->content;
        =37=      my $eliza_says = "How do you do?  Please tell me your problem.";
        =38=      my $message = param("message") || "";
        =39=      if ($message) {
        =40=        param("message","");
        =41=        $eliza_says = $eliza->transform($message);
        =42=      }
        =43=      print $c
        =44=        header,
        =45=        start_html("The doctor is in!"),
        =46=        h1("The doctor is in!"),
        =47=        hr,
        =48=        startform("POST", $url),
        =49=        p($eliza_says),
        =50=        p, textfield(-name => "message", -size => 60),
        =51=        p, submit("What do you say, doc?"),
        =52=        p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
        =53=        endform,
        =54=        hr,
        =55=        end_html;
        =56=      close $c;
        =57=      redo;
        =58=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.