Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Web Techniques Column 23 (March 1998)
One of the interesting challenges in a connectionless protocol like the Web is maintaining ``session'' or ``state'' information to allow multiple interactions as part of a larger ``conversation''.
In past columns, I've talked about doing this with external files, with mangled URLs, and with hidden information in forms. While these are all good solutions, they still require each new CGI invocation to somehow ``come up to speed'' about which conversation this particular transaction belongs to. In this column, let's take a different tack.
Instead of trying to ``leap'' the information from one program to another, let's keep a process alive during an entire conversation, letting it die when the conversation is over. That way, the information can stay in the dataspace of that process, and all we have to do is keep reconnecting to that same process somehow.
This may sound complicated, but it is actually very simple using the HTTP::Daemon module in the LWP library, which I first used in a column last year to create a web proxy server. In this case, we'll launch a tiny web server from a CGI script. The tiny server holds the state of the conversation, and the CGI script tells the browser to go talk to this mini-web-server instead of the main server.
Each ongoing conversation then requires an additional process, so this method works nicely only when you have a small number of ``conversations'' going on at the same time. In part two (next month), I'll revisit this to show how to do it with just one additional process.
As I was looking for something interesting to demonstrate a stateful conversation, I was pleasantly greeted by a new addition to the CPAN: a Perl implementation of Eliza!
From the manpage for Chatbot::Eliza:
This module implements the classic Eliza algorithm. The original Eliza program was written by Joseph Weizenbaum and described in the Communications of the ACM in 1967. Eliza is a mock Rogerian psychotherapist. It prompts for user input, and uses a simple transformation algorithm to change user input into a follow-up question. The program is designed to give the appearance of understanding.
This program is a faithful implementation of the program described by Weizenbaum. It uses a simplified script language (devised by Charles Hayden). The content of the script is the same as Weizenbaum's.
So, I quickly grabbed Chatbot::Eliza, and used it as the basis for an interactive session with ``the doctor'', using a standard browser interface. The result is in Listing One, below.
Lines 1 through 4 start nearly every program I write, turning on taint checking and warnings, disabling buffered output, and enabling all compiler restrictions.
Line 5 pulls in the CGI module, importing the normal convenience subroutines, as well as parsing the CGI information into accessible data structures. (The CGI module is in the standard library beginning with Perl version 5.004. If you have an older version of Perl, you should upgrade for security reasons.)
Lines 6 and 7 pull in the HTTP::Daemon and HTTP::Status modules, found in the LWP library. LWP is found in the CPAN (http://www.perl.com/CPAN/ or http://www.perl.org/CPAN/), but you probably already have it if you've been doing any web programming with Perl.
Line 8 pulls in the Eliza module described above. You'll almost certainly need to get this one from the CPAN (unless you're like me and daily install everything you can from the CPAN just to play with it).
Lines 10 and 11 are the only configuration constants for this program. The $HOST variable defines the name of this host for HTTP::Daemon to use. This was supposed to work automatically, but didn't for me. (And thus might not for you.) So, define it here.
And $TIMEOUT is the number of seconds a particular ``doctor'' will hang out waiting for further dialog from a particular ``patient''. If no further requests come within this time, the doctor vanishes. So, this is a number that's worth tuning with some thought. You'll need a separate process for each doctor, so you don't want a lot of idle doctors filling up the office. On the other hand, the doc needs to wait around long enough for the patient to come up with a meaningful response. I found 2 minutes (120 seconds) to be about right in some quick testing with my net friends. (Obviously, that number would be much too low in a shopping cart environment.)
Every invocation of this CGI program creates a separate mini-web-server. This happens by creating a new HTTP::Daemon object (in line 13), saved in $d. A non-predictable non-privileged port number gets assigned automatically, and recycles after some 60,000 ports, so we're pretty safe for a while.
Line 14 creates a unique identifier that the CGI script and the mini-web-server share secretly. This is used in later connections to the mini-web-server to ensure that we don't have someone trying to hijack a particular doctor session. I'm not being very secure here... since the value of rand() can be predicted given the other two parts of this URL, but this should scare off the newbies from trying anything.
Line 15 constructs the URL to which we will send the browser. It'll look something like:
http://www.stonehenge.com:61714/881337239.25550.567
Note here the mini-web-server address at my web address, but on port 61714, followed by the unique key made up of the time, the PID, and a random number of one to three digits. All further interaction from the browser will be to this URL, triggering the mini-web-server. Bookmarking this URL is pointless, because the server responding to it dies after two minutes of inactivity.
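As a rough sketch of how such a URL gets put together, here is the key construction from line 14 wrapped in a small subroutine. The name make_unique_key and the www.example.com base URL are assumptions made up for illustration. In the listing the base comes from $d->url, which supplies the real host and port.

```perl
use strict;
use warnings;

# Build the shared key the same way line 14 of the listing does:
# epoch seconds, process ID, and a random integer from 0 to 999.
# (make_unique_key is a hypothetical name, not part of the listing.)
sub make_unique_key {
    return join ".", time, $$, int(rand 1000);
}

# Hypothetical stand-in for $d->url, which supplies the real host and port.
my $base_url = "http://www.example.com:61714/";

my $unique = make_unique_key();
my $url    = $base_url . $unique;
print "$url\n";
```

Only the last of the three parts is random, which is why the key is a deterrent rather than real security.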
Lines 17 through 22 fork off the mini-web-server using fairly standard forking code. Line 19 causes the parent code (acting as the CGI handler) to redirect the browser to the mini-web-server. Closing STDOUT in line 22 is essential. Without it, the real web server won't know that we're done sending information to the browser.
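The fork-and-detach pattern can be tried outside a web server. The following is a minimal sketch, assuming a Unix-like system. The subroutine name spawn_after_closing_stdout and the pipe used to report back are inventions for this demonstration only. It shows that the child keeps running after it has closed STDOUT, while the parent stays free to finish its own output.

```perl
use strict;
use warnings;

# Fork a child that closes STDOUT and then runs $work.
# Returns the child's PID in the parent; never returns in the child.
# (Hypothetical helper for this sketch, not part of the listing.)
sub spawn_after_closing_stdout {
    my ($work) = @_;
    defined(my $pid = fork) or die "Cannot fork: $!";
    return $pid if $pid;      # parent: free to answer the browser
    close(STDOUT);            # child: let go of the web server's pipe
    $work->();
    exit 0;
}

# A pipe lets the child prove it is still alive without using STDOUT.
pipe(my $reader, my $writer) or die "Cannot create pipe: $!";
my $pid = spawn_after_closing_stdout(sub {
    close $reader;
    print {$writer} "still alive without STDOUT\n";
    close $writer;
});
close $writer;
my $report = <$reader>;
waitpid($pid, 0);
print $report;
```

In the listing the parent exits right after printing the redirect instead of waiting, since the web server only needs the parent to finish.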
The remaining lines of the program are strictly for the mini-web-server. Line 24 creates the Eliza object, which has stateful information about our current session.
Lines 26 through 58 form an infinite loop, processing one request at a time from a browser. This loop is exited via the alarm() setting on line 27. After 120 seconds of idleness waiting at line 28 for a new connection, a SIGALRM signal will come along. Since we're not doing anything to catch that signal, the program will hastily exit. When we get back up to the top of the loop after handling a single transaction, the alarm is reset once again to 120 seconds. (There's only one alarm to be activated, so any new setting erases the prior setting.)
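To see the deadman timer at work without waiting two minutes, here is a small sketch, assuming a Unix-like system. The one-second timeout, the subroutine name, and the four-argument select used as a stand-in for blocking in accept are all choices made for this demonstration. The parent forks a child that arms the alarm and then blocks, and afterwards checks which signal ended it.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Returns the number of the signal that terminated a child which armed
# a deadman timer and then sat idle.
# (Hypothetical helper for this sketch, not part of the listing.)
sub signal_that_killed_idle_child {
    my ($timeout) = @_;
    defined(my $pid = fork) or die "Cannot fork: $!";
    if ($pid == 0) {
        alarm($timeout);                  # arm the timer; no handler installed
        select(undef, undef, undef, 60);  # stand-in for blocking in accept
        exit 0;                           # not reached with a short timeout
    }
    waitpid($pid, 0);
    return $? & 127;                      # low seven bits: terminating signal
}

my $sig = signal_that_killed_idle_child(1);
print "idle child was ended by signal $sig\n";
```

Because no SIGALRM handler is installed, the default action applies and the child simply dies, which is the same way an idle doctor leaves the office.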
Line 28 waits for a connection from a browser, and captures information about the connection into an HTTP::Daemon::ClientConn object (described on the manpage for HTTP::Daemon), here saved into $c.
Line 29 gets the request information into an HTTP::Request object, saved into $r. This will have the POSTed data content in it, as we'll see shortly.
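The accept-then-read cycle is ordinary socket work underneath. Here is a minimal sketch using the core IO::Socket::INET module in place of HTTP::Daemon, assuming a Unix-like system with a working loopback interface. The loopback address, the forked child that plays the browser, and the /some.key path are all made up for the demonstration. Asking for port 0 lets the system assign a free port, much as the daemon in the listing gets its port automatically.

```perl
use strict;
use warnings;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalAddr => "127.0.0.1",   # loopback only; an assumption for this sketch
    LocalPort => 0,             # let the system choose a free port
    Listen    => 5,
    Proto     => "tcp",
) or die "Cannot listen: $!";
my $port = $listener->sockport;

defined(my $pid = fork) or die "Cannot fork: $!";
if ($pid == 0) {
    # Child plays the browser: connect, send one request, hang up.
    close $listener;
    my $browser = IO::Socket::INET->new(
        PeerAddr => "127.0.0.1",
        PeerPort => $port,
        Proto    => "tcp",
    ) or exit 1;
    print {$browser} "GET /some.key HTTP/1.0\r\n\r\n";
    close $browser;
    exit 0;
}

# Parent plays the mini-web-server: block until a connection arrives,
# then read the first line of the request from it.
my $c = $listener->accept or die "Cannot accept: $!";
my $request_line = <$c>;
close $c;
waitpid($pid, 0);

$request_line =~ s/\r?\n\z//;
print "got: $request_line\n";
```

HTTP::Daemon saves us from parsing the request by hand: get_request returns the method, path, headers, and content already unpacked.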
Lines 30 through 34 are the security check to make sure that it isn't just some random connection, or even a bookmarked prior connection from long ago. The value of $r->url->epath is the ``path'' requested of this ``web server''. Since the URL we redirected the browser toward has our magic unique identifier out there, this better be the same.
If it's not, lines 31 and 32 trigger an HTTP error (the 403 message), including a cute text phrase. If you're experimenting and want to see this message, alter the URL slightly after you've invoked the program in the normal way.
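The check itself is just a string comparison. A minimal sketch, assuming we only have the raw request line to work with: the subroutine name request_is_for_key and the hand parsing are inventions for this example, since the listing gets the path from the HTTP::Request object instead.

```perl
use strict;
use warnings;

# Return 1 only when the request line asks for exactly "/$unique".
# (Hypothetical helper; the listing compares $r->url->epath instead.)
sub request_is_for_key {
    my ($request_line, $unique) = @_;
    my ($method, $path) = split " ", $request_line;
    return 0 unless defined $path;
    $path =~ s/\?.*//s;               # ignore any query string
    return $path eq "/$unique" ? 1 : 0;
}

my $unique = "881337239.25550.567";   # the sample key from the URL shown earlier
for my $line ("POST /$unique HTTP/1.0", "GET /881337239.25550.568 HTTP/1.0") {
    my $verdict = request_is_for_key($line, $unique) ? "200 OK" : "403 Forbidden";
    print "$line -> $verdict\n";
}
```

Changing even one digit of the key, as in the second request, is enough to be shown the door.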
If we've made it to line 35, we're doing normal stuff now, so it's time to send back a normal ``here's your file'' status.
Line 36 is a bit odd. I want to use the cool HTML-writer features of the CGI module, but this code isn't being called from a real web server, and so the CGI environment variables aren't set up properly. Luckily, the CGI module allows me to fake up a transaction, passing it the input data. But, in order to use the function calls instead of the method invocations for the HTML writer features, the current CGI object has to be installed in $CGI::Q. And that's exactly what I've done. Yes, I cheated... looking at the source of CGI.pm to figure out how to do this, but it works. Here, $r->content is the POST data from the previous form. If it's empty, it was a GET request, but that doesn't matter here.
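For the curious, the POST data is a URL-encoded string of name=value pairs. Here is a hand-rolled sketch of the decoding that the CGI object takes care of for us. This is not CGI.pm's actual code, the name parse_form_body is made up, and it keeps only the last value when a name repeats.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded body into a hash ref.
# (Hypothetical helper; the listing lets the CGI module do this.)
sub parse_form_body {
    my ($body) = @_;
    my %param;
    for my $pair (split /[&;]/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = "" unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # plus signs are spaces
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # %XX escapes are bytes
        }
        $param{$name} = $value;
    }
    return \%param;
}

my $form = parse_form_body("message=I+feel+sad%21&other=1");
print "patient said: $form->{message}\n";
```

An empty body, as with a GET request, simply yields an empty hash, which matches the remark that an empty $r->content does no harm.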
Line 37 is the response that the doctor speaks. I'm defaulting it to an initial greeting, which will be used if we didn't get any valid prior response.
Lines 38 to 42 handle the replies from prior invocations. The param method/function fetches the message parameter from the form data. Unless it's empty or missing, we pass it into the Eliza object, whose response comes back into $eliza_says. Also, we clear out the parameter, so that it won't be the default in the form below. (Forms created with the CGI module are ``sticky'' by default.)
Lines 43 to 55 send the content of the response back to the user. Normally, this is handled by printing to STDOUT in a CGI script, but we're not quite in full CGI mode here. In fact, it's sorta ``no-G-I''. Luckily, the $c connection object can act like a filehandle, so we use it as an indirect filehandle to print the return information.
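The indirect-filehandle form of print is easy to try by itself. In this minimal sketch an in-memory filehandle stands in for the connection object, which is an assumption made so the example runs anywhere. The header and HTML strings are simplified placeholders, not the exact output of the CGI module.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for the connection object $c here.
my $sent = "";
open(my $c, ">", \$sent) or die "Cannot open in-memory handle: $!";

# "print $c LIST" sends LIST to the handle held in $c instead of STDOUT.
# Note that there is no comma after $c.
print $c
    "Content-Type: text/html\r\n\r\n",
    "<h1>The doctor is in!</h1>\n";
close $c;

print $sent;
```

Everything in the list after $c goes down the connection in order, which is why the listing can hand print one long comma-separated list of HTML-generating calls.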
Line 44 generates the HTTP header (everything after the HTTP status line printed above). Line 45 provides the initial HTML content. Line 46 generates an H1 header, and line 47 an HR horizontal rule.
Lines 48 through 53 generate the response form. Note that we force a POST to our magic unique URL as the action for this form. By HTTP standards, this must be a POST and not a GET, because submitting this form changes the state of the server (in this case, our mini-web-server). Also, by forcing a POST, the response ends up accessible to the ->content method above. (To get the form data of a GET method, see the HTTP::Daemon manpage for an example.)
Line 49 formats the doctor's response immediately above the textfield generated in line 50. Line 51 creates a submit button (with a text message in place of SUBMIT), and line 52 tells them that the meter is running.
After the content has been sent down the $c pipe, line 56 closes the connection, letting the browser know we are done. And then we're ready for another interaction, so we jump back in line 57 to do it all over again.
And that's all there is to it. Install it into a CGI area, and tell people that the doctor is in at a URL like
http://www.stonehenge.com/cgi/eliza
and then you'll be wishing you'd implemented that credit-card charge system so you could get paid for this. Note that only the initial fetch of the CGI script will show up in your web logs, because all subsequent HTTP transactions are being performed to the mini-web-servers. Enjoy!
Listing One
=1=     #!/home/merlyn/bin/perl -Tw
=2=     
=3=     $|++;
=4=     use strict;
=5=     use CGI ":standard";
=6=     use HTTP::Daemon;
=7=     use HTTP::Status;
=8=     use Chatbot::Eliza;
=9=     
=10=    my $HOST = "www.stonehenge.com"; # where are we?
=11=    my $TIMEOUT = 120; # number of seconds until this doc dies
=12=    
=13=    my $d = new HTTP::Daemon (LocalAddr => $HOST);
=14=    my $unique = join ".", time, $$, int(rand 1000);
=15=    my $url = $d->url.$unique;
=16=    
=17=    defined(my $pid = fork) or die "Cannot fork: $!";
=18=    if ($pid) { # I am, apparently, the parent
=19=      print redirect($url);
=20=      exit 0;
=21=    }
=22=    close(STDOUT); # to let the kid live on
=23=    
=24=    my $eliza = new Chatbot::Eliza;
=25=    
=26=    {
=27=      alarm($TIMEOUT); # (re-)set the deadman timer
=28=      my $c = $d->accept; # $c is a connection
=29=      my $r = $c->get_request; # $r is a request
=30=      if ($r->url->epath ne "/$unique") {
=31=        $c->send_error(RC_FORBIDDEN, "I don't think we've made an appointment!");
=32=        close $c;
=33=        redo;
=34=      }
=35=      $c->send_basic_header;
=36=      $CGI::Q = new CGI $r->content;
=37=      my $eliza_says = "How do you do? Please tell me your problem.";
=38=      my $message = param("message") || "";
=39=      if ($message) {
=40=        param("message","");
=41=        $eliza_says = $eliza->transform($message);
=42=      }
=43=      print $c
=44=        header,
=45=        start_html("The doctor is in!"),
=46=        h1("The doctor is in!"),
=47=        hr,
=48=        startform("POST", $url),
=49=        p($eliza_says),
=50=        p, textfield(-name => "message", -size => 60),
=51=        p, submit("What do you say, doc?"),
=52=        p("Note: the doctor is patient, but waits only $TIMEOUT seconds, so hurry!"),
=53=        endform,
=54=        hr,
=55=        end_html;
=56=      close $c;
=57=      redo;
=58=    }