Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Web Techniques Column 56 (Dec 2000)

[suggested title: Poor man's webchat]

Last month, I introduced the idea of a ``poor man's load balancer'', by producing a short program that would handle the basics, without necessarily getting into all the frills and whistles and gadgets of the full-blown solutions. This month, I decided to take on the ``web chat'' arena in the same way.

This came about in part because of a conversation we columnists had by email with the magazine staff, regarding this month's ``theme'' of ``collaboration''. Usually, I ignore the theme and write about whatever I want, but one of the suggestions was to write a ``web chat''.

By email, I mentioned that I could take on the chatter, and one of the other columnists said he had already written a 200-line chat server and 100-line chat client in Perl for another publication, and was happy not to redo that for this magazine. Another columnist mentioned that he had also done a webchat in some unspecified number of lines (probably far exceeding 300 total) in Java. Well, that gave me the challenge I needed: to write a webchat in under 100 lines. I succeeded.

Not only did I succeed, it turns out nice enough that a few of the people I tested it with were clamoring for the source code, which I present in [listing one below]. And as for features, well, there's no JavaScript of any flavor, and it works with any browser that supports frames and client-side pull (meta refresh). I even tested this with the w3-mode in GNU Emacs, and it works nearly fine, except that I need to manually refresh the viewing window. Cool.

Not only that, but anything that looks like a URL in the chat text is automatically linked to the real URL, including most of the forms understood by those browsers-masquerading-as-newsreaders, thanks to the URI::Find module from the CPAN.

The chatserver is actually a mini-webserver. The first CGI hit after installation forks off a separate Perl process listening on its own port for HTTP requests. Browsers involved with chatting are actually getting data directly from this mini-webserver. Because a single process is involved, there's no problem ``sharing'' the data between the various chatting users. When the chat webserver becomes idle, it disappears, thus consuming no resources when not needed.

So, without further delay, let's look at this marvel. Line 1 turns on taint checking and warnings. Line 2 disables the buffering on STDOUT. Line 3 turns on the usual compiler restrictions for programs over 10 lines long (no softrefs, no barewords, and most importantly, no undeclared variables).

Lines 4 through 7 bring in the modules. CGI is standard with any modern Perl distribution. HTTP::Daemon and HTTP::Status are both part of the LWP distribution in the CPAN: a must-have if you're doing any sort of web stuff. And URI::Find is also found (separately) in the CPAN.

Lines 9 through 16 define the configuration items for this program. While my programs are not generally meant as ``ready to run'', this program does indeed work out of the box, and the things I most wanted to tweak have been placed here in the front.

A port number must be designated in line 10. This port should be dedicated on the webserver to this chat program, so pick something high (between 40000 and 60000) and out of the way. Since only one program can be attached to a port at a time, this ensures easily that only one copy of the chatserver (a mini-web-server) will be running.

Line 11 defines the timeout (in seconds) of how long before the chatserver shuts down. In normal chatting, each browser will be issuing a client pull at least every 60 seconds from the server, so it'll stay alive. When everyone goes away, so does the server.

Line 12 defines the historical time for individual chat messages. This is useful so that a screen isn't cluttered with very old messages, so a quick glance at a chat window can see if there's any recent activity. It can exceed the $TIMEOUT time because any active browser keeps the server alive, even if messages aren't being sent.

Lline 13 defines the maximum number of lines in the chat viewing window. Too small, and an adequate conversation is impossible. Too large, and the viewing window starts scrolling. I tried putting the newest messages at the top so that even a scrolled window was OK, and it was too unnatural to read the messages from bottom to top to catch up. So, just keep the count low, and you won't scroll.

Line 14 defines the maximum length for an identifying name, while line 15 similarly defines the length of a message line. Again, there's a tradeoff. If you permit long lines, things will scroll more often, but people can get their message said in fewer lines more often.

Line 18 fetches our web host from a CGI environment variable provided by the web server.

Lines 20 through 23 attempt to set up a miniwebserver at the designated host and port. If it fails, we'll get a warning (hence the warnings turned off), and we know that an existing webserver is there ready to serve. If it succeeds, we must fork and become the web server, noted later. Line 24 computes the URL for this webserver, in either case.

Lines 26 through 33 generate the response to visiting the CGI script. Line 26 in particular dumps an HTTP header, defining the rest of the content as HTML. Lines 28 to 33 dump the required frameset, which unfortunately cannot be generated using HTML shortcuts. In particular, the start_html shortcut generates BODY tags, which won't work with a frameset. So, manually, I type the first less-thans and greater-thans in my CGI script in a very long time. It feels so retro. The frameset is a 75/25 split, with the reading window on top, and the writing window on the bottom. I don't actually refer to the frame names, but put them in there for future reference.

The URL for the reading frame and writing frame are both pointing at the mini-webserver. The read10 URL creates a default refresh rate of once each 10 seconds, which can be changed later.

In line 35, we're done if we didn't manage to create the daemon socket, which means an existing server is out there ready to fulfill the frame stuff. However, if not, we must continue and become the server.

Line 37 forks us, which means we can now go in two different directions at once. We have to do this because the webserver that invoked us needs to see the original process exit before it releases the browser. But we want to continue past that. Hence a fork. The parent process exits in line 38, and the child must close STDOUT in line 39, and that satisfies the webserver.

So, from here on, we've got us a mini-webserver. One process for all the browsers to talk with, listening at port $PORT. Line 41 provides the @CHAT data structure. New messages are put on at the end, while old messages are shifted off the front.

Lines 42 to 86 provide the main execution loop. Line 43 is the deadman timer. If we don't make it up here at least every $TIMEOUT seconds, we're dead. Lines 44 to 46 get a valid web connection and request from a browser into $r. The ``not sure why I need this'' is that I noticed in heavy use that I'd sometimes get undef in either $c or $r, and rather than diagnose it, I simply retry. Solved.

Line 48 extracts the path being requested from this webserver. That's where we find out if it was a write or a read request. Line 49 sends a basic HTTP status response header. Line 50 extracts the POST content (even if it was a GET, but we don't care much, to keep it short).

Line 52 prints the HTTP header, including the text/html content type. Lines 53 through 65 handle the ``read'' window, while lines 66 to 80 handle the ``write'' window.

Line 54 displays the all-important meta-refresh request, asking for the browser to refresh this frame (and this frame only) every $secs seconds. Initially, that'll be 10, but it can be changed by hitting one of the links generated in lines 56 to 58. The links also succeed in forcing a manual refresh. Nice.

Lines 60 and 61 take care of tossing the old messages, whether there are too many, or they're too old.

Lines 62 through 64 do a massive transformation, from the N-by-3 matrix of old messages into a nice N-by-2 table for display. I'm not heavy into fancy HTML design, but the basic no-border look was nice for this project, as was the designated cellspacing and cellpadding.

The map in the middle makes rows of each two-cell row. The first cell is derived from the timestamp and the selected username. The username is HTML-escaped to prevent scary people from including less-thans and greater-thans and entities from hosing up the display. The timestring is also substringed so that we get just the 24-hour-time including the seconds. And for the fun part, the message is passed through the fixer with a flag that says that URIs should be made linkable. More on that later.

If it's a write we've got, we may have a message to add to the queue. Lines 67 and 68 grab the CGI parameters from the form we're about to display, and if they're there, we rip out all the low control characters, and truncate them to a maximum acceptable length in lines 69 through 71. If there's anything left to them in line 72, they end up on the end of the message list, to be displayed into one or more browsers on the next read pull. Lines 75 to 79 dump out a simple form to grab the username and message. The username field is sticky, while the message field starts out empty each time, which seems to make a lot of sense here. Note that on some browsers, hitting return in either field is enough to send the form, while on others, we need to press the submit button.

Lines 82 and 84 end the page display (either for the read window or the write window), and close the connection, so the browser knows we're done. Line 85 starts the loop over again.

Lines 88 to 95 fix up a string for either HTML entity escaping, and possibly URL linking. Line 89 performs a simple HTML-escaping and returns if that's all that's desired. Lines 91 and 92 call the find_uris subroutine (from URI::Find), marking each URI with control-A (\001) marks around the parts that must be protected from HTML entity fixups. Line 93 carefully HTML entitizes the rest of the string, skipping over any of the marked parts. It took me a while to get this right. Note that the links are also marked to come up into a separate browser window, because we don't want them in place of the read window: we'll never get back correctly. Line 94 returns the patched-up string.

And there you have it. Just 5 lines short of a 100, thus winning the satisfaction that I could do what I set out to do. In those 5 more lines, I might have added email address finding (using Email::Find from the CPAN) or perhaps controlled the number of scrollback lines displayed (by using a URL like /read10.15 to encode both numbers). Or perhaps put some slight amount of security to have the usernames be registered with an email registration system, or even at least display the source internet addresses. All good suggestions that came from my beta testing period.

But, we'll leave that for another day. If you write the code, let me know. After all, it's all about collaboration. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -Tw
        =2=     $|++;
        =3=     use strict;
        =4=     use CGI qw(:standard escapeHTML);
        =5=     use HTTP::Daemon;
        =6=     use HTTP::Status;
        =7=     use URI::Find;
        =8=     
        =9=     ## config
        =10=    my $PORT = 42001;               # at what port
        =11=    my $TIMEOUT = 90;               # number of quiet seconds before abort
        =12=    my $CHAT_TIME_MAX = 300;        # how long to keep old scrollback
        =13=    my $CHAT_COUNT_MAX = 12;        # how many messages max
        =14=    my $NAME_MAX = 30;              # how long can a name be
        =15=    my $MESS_MAX = 120;             # how long can a message be
        =16=    ## end config
        =17=    
        =18=    my ($HOST) = $ENV{SERVER_NAME} =~ /(.*)/s; # untaint
        =19=    
        =20=    my $d = do {
        =21=      local($^W) = 0;
        =22=      new HTTP::Daemon (LocalAddr => $HOST, LocalPort => $PORT, Reuse => 1)
        =23=    };
        =24=    my $url = "http://$HOST:$PORT";
        =25=    
        =26=    print header;
        =27=    # durn - no shortcuts for this!  what was lincoln thinkin'? :)
        =28=    print <<END;
        =29=    <html><head><title>Chat with us!</title></head>
        =30=    <frameset rows="75%,25%">
        =31=    <frame src="$url/read10" name=read><frame src="$url/write" name=write>
        =32=    </frameset></html>
        =33=    END
        =34=      
        =35=    exit 0 unless defined $d;       # do we need to become the server?
        =36=    
        =37=    defined(my $pid = fork) or die "Cannot fork: $!";
        =38=    exit 0 if $pid;                 # I am the parent
        =39=    close(STDOUT);
        =40=    
        =41=    my @CHAT;
        =42=    {
        =43=      alarm($TIMEOUT);              # (re-)set the deadman timer
        =44=      my $c = $d->accept or redo;   # $c is a connection
        =45=      my $r = $c->get_request;      # $r is a request
        =46=      close $c, redo unless $r;     # not sure why I need this
        =47=    
        =48=      (my $code = $r->url->epath) =~ s{^/}{};
        =49=      $c->send_basic_header;
        =50=      $CGI::Q = new CGI $r->content;
        =51=    
        =52=      print $c header;              # start_html is inside switch
        =53=      if (my ($secs) = $code =~ /read(\d+)/) {
        =54=        print $c start_html(-head => ["<meta http-equiv=refresh content=$secs>"]);
        =55=        
        =56=        print $c h1("Chat responses"), "Change update to";
        =57=        print $c " ",a({-href => "$url/read$_"}, $_) for qw(1 2 5 10 15 30 60);
        =58=        print $c " seconds", br;
        =59=    
        =60=        shift @CHAT while @CHAT > $CHAT_COUNT_MAX or
        =61=          @CHAT and $CHAT[0][0] < time - $CHAT_TIME_MAX;
        =62=        print $c table( {-border => 0, -cellspacing => 0, -cellpadding => 2 },
        =63=                        map { Tr(td([substr(localtime($_->[0]),11,8).' from '.
        =64=                                     fix($_->[1]).':', fix($_->[2],1) ]))} @CHAT);
        =65=    
        =66=      } elsif ($code =~ /write/) {
        =67=        if (defined(my $name = param('name'))
        =68=            and defined(my $message = param('message'))) { # we have input!
        =69=          tr/\x00-\x1f//d for $name, $message; # remove nasties
        =70=          $name = substr($name, 0, $NAME_MAX) if length $name > $NAME_MAX;
        =71=          $message = substr($message, 0, $MESS_MAX) if length $message > $MESS_MAX;
        =72=          push @CHAT, [time, $name, $message] if length $name and length $message;
        =73=        }
        =74=    
        =75=        print $c start_html, h1("Chat write");
        =76=        print $c start_form(-action => "$url/write");
        =77=        print $c textfield("name","[I must change my name]", $NAME_MAX),
        =78=          submit("says:"), textfield("message", "", $MESS_MAX, $MESS_MAX, 1);
        =79=        print $c end_form;
        =80=      }
        =81=    
        =82=      print $c end_html;
        =83=    
        =84=      close $c;
        =85=      redo;
        =86=    }
        =87=    
        =88=    sub fix {                       # HTML escape, plus find URIs if $_[1]
        =89=      local $_ = shift; return escapeHTML($_) unless shift;
        =90=      # use \001 as "shift out", "shift in", presume data doesn't have \001
        =91=      find_uris($_, sub {my ($uri, $text) = @_;
        =92=                         qq{\1<a href="\1$uri\1" target=_blank>\1$text\1</a>\1} });
        =93=      s/\G(.*?)(?:\001(.*?)\001)?/escapeHTML($1).(defined $2 ? $2 : "")/eig;
        =94=      $_;
        =95=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.