Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Download this listing! | ||
Web Techniques Column 56 (Dec 2000)
[suggested title: Poor man's webchat]
Last month, I introduced the idea of a ``poor man's load balancer'', by producing a short program that would handle the basics, without necessarily getting into all the frills and whistles and gadgets of the full-blown solutions. This month, I decided to take on the ``web chat'' arena in the same way.
This came about in part because of a conversation we columnists had by email with the magazine staff, regarding this month's ``theme'' of ``collaboration''. Usually, I ignore the theme and write about whatever I want, but one of the suggestions was to write a ``web chat''.
By email, I mentioned that I could take on the chatter, and one of the other columnists said he had already written a 200-line chat server and 100-line chat client in Perl for another publication, and was happy not to redo that for this magazine. Another columnist mentioned that he had also done a webchat in some unspecified number of lines (probably far exceeding 300 total) in Java. Well, that gave me the challenge I needed: to write a webchat in under 100 lines. I succeeded.
Not only did I succeed, it turns out nice enough that a few of the
people I tested it with were clamoring for the source code, which I
present in [listing one below]. And as for features, well, there's no
JavaScript of any flavor, and it works with any browser that supports
frames and client-side pull (meta refresh). I even tested this with
the w3-mode
in GNU Emacs, and it works nearly fine, except that I
need to manually refresh the viewing window. Cool.
Not only that, but anything that looks like a URL in the chat text is
automatically linked to the real URL, including most of the forms
understood by those browsers-masquerading-as-newsreaders, thanks to
the URI::Find
module from the CPAN.
The chatserver is actually a mini-webserver. The first CGI hit after installation forks off a separate Perl process listening on its own port for HTTP requests. Browsers involved with chatting are actually getting data directly from this mini-webserver. Because a single process is involved, there's no problem ``sharing'' the data between the various chatting users. When the chat webserver becomes idle, it disappears, thus consuming no resources when not needed.
So, without further delay, let's look at this marvel. Line 1 turns on
taint checking and warnings. Line 2 disables the buffering on
STDOUT
. Line 3 turns on the usual compiler restrictions for
programs over 10 lines long (no softrefs, no barewords, and most
importantly, no undeclared variables).
Lines 4 through 7 bring in the modules. CGI
is standard with any
modern Perl distribution. HTTP::Daemon
and HTTP::Status
are
both part of the LWP distribution in the CPAN: a must-have if you're
doing any sort of web stuff. And URI::Find
is also found
(separately) in the CPAN.
Lines 9 through 16 define the configuration items for this program. While my programs are not generally meant as ``ready to run'', this program does indeed work out of the box, and the things I most wanted to tweak have been placed here in the front.
A port number must be designated in line 10. This port should be dedicated on the webserver to this chat program, so pick something high (between 40000 and 60000) and out of the way. Since only one program can be attached to a port at a time, this ensures easily that only one copy of the chatserver (a mini-web-server) will be running.
Line 11 defines the timeout (in seconds) of how long before the chatserver shuts down. In normal chatting, each browser will be issuing a client pull at least every 60 seconds from the server, so it'll stay alive. When everyone goes away, so does the server.
Line 12 defines the historical time for individual chat messages.
This is useful so that a screen isn't cluttered with very old
messages, so a quick glance at a chat window can see if there's any
recent activity. It can exceed the $TIMEOUT
time because any
active browser keeps the server alive, even if messages aren't being
sent.
Lline 13 defines the maximum number of lines in the chat viewing window. Too small, and an adequate conversation is impossible. Too large, and the viewing window starts scrolling. I tried putting the newest messages at the top so that even a scrolled window was OK, and it was too unnatural to read the messages from bottom to top to catch up. So, just keep the count low, and you won't scroll.
Line 14 defines the maximum length for an identifying name, while line 15 similarly defines the length of a message line. Again, there's a tradeoff. If you permit long lines, things will scroll more often, but people can get their message said in fewer lines more often.
Line 18 fetches our web host from a CGI environment variable provided by the web server.
Lines 20 through 23 attempt to set up a miniwebserver at the designated host and port. If it fails, we'll get a warning (hence the warnings turned off), and we know that an existing webserver is there ready to serve. If it succeeds, we must fork and become the web server, noted later. Line 24 computes the URL for this webserver, in either case.
Lines 26 through 33 generate the response to visiting the CGI script.
Line 26 in particular dumps an HTTP header, defining the rest of the
content as HTML. Lines 28 to 33 dump the required frameset, which
unfortunately cannot be generated using HTML shortcuts. In
particular, the start_html
shortcut generates BODY
tags, which
won't work with a frameset. So, manually, I type the first less-thans
and greater-thans in my CGI script in a very long time. It feels so
retro. The frameset is a 75/25 split, with the reading window on top,
and the writing window on the bottom. I don't actually refer to the
frame names, but put them in there for future reference.
The URL for the reading frame and writing frame are both pointing at
the mini-webserver. The read10
URL creates a default refresh rate
of once each 10 seconds, which can be changed later.
In line 35, we're done if we didn't manage to create the daemon socket, which means an existing server is out there ready to fulfill the frame stuff. However, if not, we must continue and become the server.
Line 37 forks us, which means we can now go in two different
directions at once. We have to do this because the webserver that
invoked us needs to see the original process exit before it releases
the browser. But we want to continue past that. Hence a fork. The
parent process exits in line 38, and the child must close STDOUT
in
line 39, and that satisfies the webserver.
So, from here on, we've got us a mini-webserver. One process for all
the browsers to talk with, listening at port $PORT
. Line 41
provides the @CHAT
data structure. New messages are put on at the
end, while old messages are shifted off the front.
Lines 42 to 86 provide the main execution loop. Line 43 is the
deadman timer. If we don't make it up here at least every $TIMEOUT
seconds, we're dead. Lines 44 to 46 get a valid web connection and
request from a browser into $r
. The ``not sure why I need this'' is
that I noticed in heavy use that I'd sometimes get undef
in either
$c
or $r
, and rather than diagnose it, I simply retry. Solved.
Line 48 extracts the path being requested from this webserver. That's
where we find out if it was a write or a read request. Line 49 sends
a basic HTTP status response header. Line 50 extracts the POST
content (even if it was a GET
, but we don't care much, to keep it
short).
Line 52 prints the HTTP header, including the text/html
content
type. Lines 53 through 65 handle the ``read'' window, while lines 66 to
80 handle the ``write'' window.
Line 54 displays the all-important meta-refresh request, asking for
the browser to refresh this frame (and this frame only) every $secs
seconds. Initially, that'll be 10, but it can be changed by hitting
one of the links generated in lines 56 to 58. The links also succeed
in forcing a manual refresh. Nice.
Lines 60 and 61 take care of tossing the old messages, whether there are too many, or they're too old.
Lines 62 through 64 do a massive transformation, from the N-by-3 matrix of old messages into a nice N-by-2 table for display. I'm not heavy into fancy HTML design, but the basic no-border look was nice for this project, as was the designated cellspacing and cellpadding.
The map
in the middle makes rows of each two-cell row. The first
cell is derived from the timestamp and the selected username. The
username is HTML-escaped to prevent scary people from including
less-thans and greater-thans and entities from hosing up the display.
The timestring is also substringed so that we get just the
24-hour-time including the seconds. And for the fun part, the message
is passed through the fixer with a flag that says that URIs should be
made linkable. More on that later.
If it's a write we've got, we may have a message to add to the queue. Lines 67 and 68 grab the CGI parameters from the form we're about to display, and if they're there, we rip out all the low control characters, and truncate them to a maximum acceptable length in lines 69 through 71. If there's anything left to them in line 72, they end up on the end of the message list, to be displayed into one or more browsers on the next read pull. Lines 75 to 79 dump out a simple form to grab the username and message. The username field is sticky, while the message field starts out empty each time, which seems to make a lot of sense here. Note that on some browsers, hitting return in either field is enough to send the form, while on others, we need to press the submit button.
Lines 82 and 84 end the page display (either for the read window or the write window), and close the connection, so the browser knows we're done. Line 85 starts the loop over again.
Lines 88 to 95 fix up a string for either HTML entity escaping, and
possibly URL linking. Line 89 performs a simple HTML-escaping and
returns if that's all that's desired. Lines 91 and 92 call the
find_uris
subroutine (from URI::Find
), marking each URI with
control-A (\001
) marks around the parts that must be protected from
HTML entity fixups. Line 93 carefully HTML entitizes the rest of the
string, skipping over any of the marked parts. It took me a while to
get this right. Note that the links are also marked to come up into a
separate browser window, because we don't want them in place of the
read window: we'll never get back correctly. Line 94 returns the
patched-up string.
And there you have it. Just 5 lines short of a 100, thus winning the
satisfaction that I could do what I set out to do. In those 5 more
lines, I might have added email address finding (using Email::Find
from the CPAN) or perhaps controlled the number of scrollback lines
displayed (by using a URL like /read10.15
to encode both numbers).
Or perhaps put some slight amount of security to have the usernames be
registered with an email registration system, or even at least display
the source internet addresses. All good suggestions that came from my
beta testing period.
But, we'll leave that for another day. If you write the code, let me know. After all, it's all about collaboration. Until next time, enjoy!
Listings
=1= #!/usr/bin/perl -Tw =2= $|++; =3= use strict; =4= use CGI qw(:standard escapeHTML); =5= use HTTP::Daemon; =6= use HTTP::Status; =7= use URI::Find; =8= =9= ## config =10= my $PORT = 42001; # at what port =11= my $TIMEOUT = 90; # number of quiet seconds before abort =12= my $CHAT_TIME_MAX = 300; # how long to keep old scrollback =13= my $CHAT_COUNT_MAX = 12; # how many messages max =14= my $NAME_MAX = 30; # how long can a name be =15= my $MESS_MAX = 120; # how long can a message be =16= ## end config =17= =18= my ($HOST) = $ENV{SERVER_NAME} =~ /(.*)/s; # untaint =19= =20= my $d = do { =21= local($^W) = 0; =22= new HTTP::Daemon (LocalAddr => $HOST, LocalPort => $PORT, Reuse => 1) =23= }; =24= my $url = "http://$HOST:$PORT"; =25= =26= print header; =27= # durn - no shortcuts for this! what was lincoln thinkin'? :) =28= print <<END; =29= <html><head><title>Chat with us!</title></head> =30= <frameset rows="75%,25%"> =31= <frame src="$url/read10" name=read><frame src="$url/write" name=write> =32= </frameset></html> =33= END =34= =35= exit 0 unless defined $d; # do we need to become the server? =36= =37= defined(my $pid = fork) or die "Cannot fork: $!"; =38= exit 0 if $pid; # I am the parent =39= close(STDOUT); =40= =41= my @CHAT; =42= { =43= alarm($TIMEOUT); # (re-)set the deadman timer =44= my $c = $d->accept or redo; # $c is a connection =45= my $r = $c->get_request; # $r is a request =46= close $c, redo unless $r; # not sure why I need this =47= =48= (my $code = $r->url->epath) =~ s{^/}{}; =49= $c->send_basic_header; =50= $CGI::Q = new CGI $r->content; =51= =52= print $c header; # start_html is inside switch =53= if (my ($secs) = $code =~ /read(\d+)/) { =54= print $c start_html(-head => ["<meta http-equiv=refresh content=$secs>"]); =55= =56= print $c h1("Chat responses"), "Change update to"; =57= print $c " ",a({-href => "$url/read$_"}, $_) for qw(1 2 5 10 15 30 60); =58= print $c " seconds", br; =59= =60= shift @CHAT while @CHAT > $CHAT_COUNT_MAX or =61= @CHAT and $CHAT[0][0] < time - $CHAT_TIME_MAX; =62= print $c table( {-border => 0, -cellspacing => 0, -cellpadding => 2 }, =63= map { Tr(td([substr(localtime($_->[0]),11,8).' from '. =64= fix($_->[1]).':', fix($_->[2],1) ]))} @CHAT); =65= =66= } elsif ($code =~ /write/) { =67= if (defined(my $name = param('name')) =68= and defined(my $message = param('message'))) { # we have input! =69= tr/\x00-\x1f//d for $name, $message; # remove nasties =70= $name = substr($name, 0, $NAME_MAX) if length $name > $NAME_MAX; =71= $message = substr($message, 0, $MESS_MAX) if length $message > $MESS_MAX; =72= push @CHAT, [time, $name, $message] if length $name and length $message; =73= } =74= =75= print $c start_html, h1("Chat write"); =76= print $c start_form(-action => "$url/write"); =77= print $c textfield("name","[I must change my name]", $NAME_MAX), =78= submit("says:"), textfield("message", "", $MESS_MAX, $MESS_MAX, 1); =79= print $c end_form; =80= } =81= =82= print $c end_html; =83= =84= close $c; =85= redo; =86= } =87= =88= sub fix { # HTML escape, plus find URIs if $_[1] =89= local $_ = shift; return escapeHTML($_) unless shift; =90= # use \001 as "shift out", "shift in", presume data doesn't have \001 =91= find_uris($_, sub {my ($uri, $text) = @_; =92= qq{\1<a href="\1$uri\1" target=_blank>\1$text\1</a>\1} }); =93= s/\G(.*?)(?:\001(.*?)\001)?/escapeHTML($1).(defined $2 ? $2 : "")/eig; =94= $_; =95= }