Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Web Techniques Column 16 (August 1997)
I've been wanting to write about the save method of the all-singing, all-dancing, gotta-have-it CGI.pm module for quite some time, as a way of saving structured data into a flat textfile to be processed later. Well, I finally stumbled on to a nice little idea that works pretty well, and also provides yet another example of flock()-ing a datafile and generating HTML on the fly.
The idea is not a new one... it's a ``web chat'' script. This is the kind of thing where you and others go to a particular URL at the same time, and you start typing your messages into a form field, press ``submit'', and then you get to see what the others just said at the same time as you. Kind of like the too-huge-for-its-own-good ``Internet Relay Chat'', but with a lot fewer bells and whistles. Or a really fast-moving guestbook that keeps only the 32 most recent entries.
So I decided to hack out a little under-100-line web chat script. No bells, no whistles, no frills. Stick it somewhere, and you can talk with a friend, or make friends.
Of course, writing a column like this particular one makes me a ``newbie magnet'', as in ``someone who is likely to get a lot of uninteresting questions from people that won't do research for themselves''. I can imagine the number of email requests I'll now be getting from people who are not actually programmers but think they would be a R3AL K00L D00D to have a chat area on their website. So, they copy scripts like mine into some likely (or unlikely!) webserver area, without even bothering to configure anything, and then write me when it breaks. (I get lots of email with a first line of ``why doesn't [this program] work?'' and then a spew of 50 to 500 lines of code... joy.)
So let me state this up front, as a paragraph that I can point them to later: this script is not meant to be used as-is. In fact, it's not meant to be used at all. It's merely an illustration of some technology around the CGI.pm module and saving and restoring queries, and yet another demonstration of flock()-ing. The fact that the application is a simple web-chat script that actually works (for a very small narrow definition of ``works'') is irrelevant. OK, end of disclaimer.
But, in any event, I hereby present my little toy web chat script in [Listing one, below].
Lines 1 and 2 begin nearly every lengthy program I write, enabling taint-checks, warnings, and appropriate compiler and run-time restrictions.
Line 4 pulls in the CGI.pm module, and defines the standard useful set of form-access methods and HTML-generation methods.
Lines 7 and 8 define constants, using the new constant module. This module is part of the 5.004 (and later, I presume) Perl distributions, and was created by my associate Perl trainer, Tom Phoenix (rootbeer@teleport.com). However, if you don't have constant (or cannot get it for some ludicrous pointy-haired-manager reason), you can replace those lines with something like:
    sub CHATFILE { "/home/merlyn/Web/chatfile" }
    sub MAXENTRIES { 32 }
and it'll work approximately the same. These two constants define the location of the chat information, and the number of prior messages to retain.
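To make the ``approximately the same'' claim concrete, here is a minimal sketch that puts the two forms side by side. The _SUB suffix is a hypothetical naming trick of mine so that both versions can live in one file; in a real script you would pick one form and use the plain names.

```perl
use strict;
use warnings;

# Form used in the listing: the constant module.
use constant CHATFILE   => "/home/merlyn/Web/chatfile";
use constant MAXENTRIES => 32;

# Fallback form from the text: ordinary subroutines returning a value.
# (Suffixed with _SUB only so both forms can coexist in this sketch.)
sub CHATFILE_SUB   { "/home/merlyn/Web/chatfile" }
sub MAXENTRIES_SUB { 32 }

# Neither form interpolates inside double quotes, so concatenate instead.
print "constant form:   " . CHATFILE . " keeps " . MAXENTRIES . " entries\n";
print "subroutine form: " . CHATFILE_SUB() . " keeps " . MAXENTRIES_SUB() . " entries\n";
```

As far as I know, the main thing constant adds is an empty prototype on the generated subroutine, which lets Perl inline the value at compile time.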
Lines 10 through 14 define a subroutine to encode the required HTML entities into their HTML-safe counterparts. Quotes, less-than, greater-than, and ampersands are all handled nicely.
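Here is a standalone sketch of that kind of entity encoder. It is a slight variation on the listing rather than a copy: it captures the matched character in $1 instead of relying on $&, and works on a named copy of the argument instead of a localized $_.

```perl
use strict;
use warnings;

# Replace each of " < & > with its numeric character reference.
sub ent {
    my ($text) = @_;
    $text =~ s/(["<&>])/"&#" . ord($1) . ";"/ge;
    return $text;
}

print ent(q{<b>"Tom & Jerry"</b>}), "\n";
# prints: &#60;b&#62;&#34;Tom &#38; Jerry&#34;&#60;/b&#62;
```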
Line 16 prints an HTTP header, the beginning of the HTML page, and an H1 header, using routines from the CGI module. Line 17 executes the main routine (defined later) in an eval block, protecting it from any dangerous die operations.
Should a die occur, the $@ variable is set to the death message; otherwise, the $@ is blank. Lines 18 through 21 detect the error, sending out the error message (properly escaped using ent defined above).
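The eval-then-check-$@ pattern can be tried on its own, without any CGI machinery. In this sketch, risky() and its message are hypothetical stand-ins for the main routine and whatever might go wrong inside it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for main(): always dies.
sub risky {
    die "Cannot open chatfile: permission denied\n";
}

my $report = "";
eval { risky() };          # the die is trapped here instead of killing the script
if ($@) {                  # $@ holds the death message; empty if nothing died
    $report = "ERROR: $@";
}
print $report;
```

In the real script the message would also go through ent before being printed, since a death message can contain characters that mean something to HTML.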
Lines 23 through 38 define the main routine. I did it this way so that the eval block above is very small and easy to see. Of course, I could have just put the entire definition for main into the eval block.
Line 24 fetches the prior chat entries from the datafile, including updating the file with the submitted form if necessary. More on that later when I discuss the get_old_entries subroutine.
Lines 25 through 30 print the input form to be submitted. Line 25 takes care of the horizontal line (via hr) and start-of-form information. The form will be made a POST form that is self-referential, by default, meaning that a submit button in this form will cause this same script to be reinvoked.
Lines 26 through 28 create the three form fields: ``name'', ``email'' and ``message''. Note here that ``message'' has a default value of the empty string, but also has an override parameter (the final 1) set to true. This means that any prior value of ``message'' will be ignored, and the requested default (empty string) will have precedence. Because the other two fields do not have override set to true, any prior value for those fields will carry forward from one invocation to the next as a default.
Line 29 puts a submit button at the end of the form, along with a note about submitting an empty message to listen. Line 30 closes off the form.
Lines 31 through 36 display the prior messages, kept in the @entries variable. The syntax here (with for my $var ...) is new to Perl version 5.004, so again if you don't have the latest Perl, you'll have to make some slight adjustments. Each element of @entries goes into the lexical local variable $entry, which is then examined in the body of the loop.
Line 32 fetches the ``name'' field from a particular entry, and prints it. Similarly, line 33 handles the ``email'' field. Line 34 is a little strange, because as you'll see later, we're saving the current time of day as a Unix-timestamp value into the entries. Luckily, in one swift move, we can convert this to a human-readable string (using scalar localtime). Finally, line 35 takes care of the ``message'' parameter (what they actually ended up saying).
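The scalar in front of localtime matters because print supplies list context. A small sketch of the difference, using the current time in place of a stored ``time'' field:

```perl
use strict;
use warnings;

my $stamp = time;                       # seconds since the epoch, as saved in each entry

# Scalar context: one human-readable string, along the lines of
# "Fri Aug  1 12:34:56 1997".
my $human = scalar localtime $stamp;

# List context: the broken-out fields (sec, min, hour, mday, mon, year, ...).
my @parts = localtime $stamp;

print "posted at $human\n";
print "list context gave ", scalar(@parts), " separate values\n";
```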
Line 37 closes out the output of the HTML page, and is the last output normally done.
Lines 40 through 74 define the subroutine that handles the interaction with the chat-file. This subroutine was called from above in line 24, and is expected to return a list of the current chat entries. Line 43 creates an empty array that we'll use as the return value.
Lines 44 and 45 set up a temporary filehandle using the IO::File class. (Again, if your Perl version is not at least 5.004, you might need to upgrade to use this particular part as-is.) The filehandle is opened read/write (indicated by the ``+<'' opening mode). This filehandle allows the program to access the history of messages posted to this chat.
Line 47 ensures that only one invocation of this program at any particular time is reading, modifying, or writing the chat history file. The flock() operator will block the program until we can get an exclusive lock.
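For experimenting with the open-lock-rewrite sequence outside a web server, here is a minimal sketch. It is not the listing's code: it assumes a throwaway temp file in place of the chat file, uses the LOCK_EX constant from Fcntl (which is what the listing's literal 2 means on the systems I know of), and checks the return value of every call.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);
use File::Temp qw(tempfile);
use IO::File;

# Hypothetical stand-in for the chat file.
my ($tmpfh, $path) = tempfile(UNLINK => 1);
print {$tmpfh} "old contents\n";
close($tmpfh) or die "Cannot close $path: $!";

my $fh = IO::File->new($path, "+<") or die "Cannot open $path: $!";
## begin critical region (keep short)
flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";      # blocks until we own the file
seek($fh, 0, SEEK_SET)    or die "Cannot rewind $path: $!";
my @old = <$fh>;                                               # read the whole history
seek($fh, 0, SEEK_SET)    or die "Cannot rewind $path: $!";
truncate($fh, 0)          or die "Cannot truncate $path: $!";
print {$fh} "new contents\n", @old;                            # newest first
close($fh)                or die "Cannot close $path: $!";     # closing releases the lock
## end critical region
```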
Now, from here on down to the point where we release the lock (line 71), we are the only script operating, so it's important to keep this amount of time short, especially on a busy system. I usually flag these moments with comments such as the ones on lines 46 and 72, which tell me rather visually how many steps are being hacked during this time.
Line 48 rewinds the file, not completely necessary here, but mostly a safety precaution, because the next operation really wants to process the entire file. (I generally seek right after obtaining a flock, because the file size might have changed from the last time I looked.)
Lines 49 through 51 pull in all the historical chat messages. Each time through the loop (as long as we haven't hit end-of-file, detected with eof()), the CGI module's new routine is called, passing it the filehandle. This triggers the routine to read a standardized save-and-restore form data format from the file, creating an independent query record. The push() takes this and shoves it onto the ever-increasing @entries array. When we're done @entries is a list of CGI ``objects'', each one containing a separate submitted chat message, along with all of its identifying information.
Lines 52 and 53 check if this particular script invocation came from a form submission containing a valid message to post, or just a message consisting of whitespace (something to be ignored) or even absent (such as the first time this script's URL gets called up). Note the explicit check for defined(), and then a further check for that defined element containing any non-whitespace character with /\S/.
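The three cases (absent, whitespace-only, real message) are easy to check in isolation. The function name worth_posting is a hypothetical wrapper of mine around the same two-part test:

```perl
use strict;
use warnings;

# True only for a defined value containing at least one non-whitespace character.
sub worth_posting {
    my ($message) = @_;
    return (defined $message and $message =~ /\S/) ? 1 : 0;
}

for my $candidate (undef, "", "  \n\t", "hello there") {
    my $shown = defined $candidate ? "'$candidate'" : "undef";
    $shown =~ s/\s/ /g;                      # keep the report on one line
    print "$shown => ", worth_posting($candidate), "\n";
}
```

Testing defined first keeps -w quiet: matching an undefined value against /\S/ would trigger an ``uninitialized value'' warning.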
Lines 55 through 62 transfer the user's ``query'' as one of the posted messages. However, we must be careful what gets transferred across, to prevent resource-hogging from a malicious-and-slightly-clever user. So, I have to ensure that only the selected fields get added to the history file, and that those fields are limited in size.
Here, I've chosen to accept three user-returned fields (the same as in my generated form above) and limit those to 1024 characters each. By doing so, the worst that a mad user can do is fill up each slot with about 10K (1K times 3 items times 3 bytes per hex-escaped character plus a little overhead). Because we limit the posts to 32 slots, we're always gonna be under 320K for the filesize then -- not a big deal. Yes, there are other resource starvation issues, but at least filling up the disk is not going to be one of them.
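The worst-case arithmetic can be spelled out as a few lines of Perl. The variable names are mine; the numbers are the ones from the paragraph above, and the result leaves out the small per-entry overhead (field names, separators, and the timestamp).

```perl
use strict;
use warnings;

my $max_chars      = 1024;   # per-field limit
my $fields         = 3;      # name, email, message
my $bytes_per_char = 3;      # worst case: every character hex-escaped as %XX
my $slots          = 32;     # entries kept in the file

my $per_entry = $max_chars * $fields * $bytes_per_char;
my $total     = $per_entry * $slots;

printf "%d bytes per entry, %d bytes (%dK) for the whole file\n",
    $per_entry, $total, $total / 1024;
# prints: 9216 bytes per entry, 294912 bytes (288K) for the whole file
```

So the ``about 10K'' per slot is 9K before overhead, and the 320K ceiling has roughly 1K of slack per entry.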
To transfer just a limited amount of information into the history file, I create a brand-new CGI object in line 55, empty except for a timestamp (using the Unix internal time value). Lines 56 to 61 then add the other three parameters from the user's input query, being careful to truncate the data to 1024 characters without prejudice.
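The truncation step on its own looks like this. The clip() wrapper and its $max argument are hypothetical; the body is the same default-then-chop logic, using substr as an lvalue to delete everything past the limit.

```perl
use strict;
use warnings;

# Return $val cut down to at most $max characters; undef becomes "".
sub clip {
    my ($val, $max) = @_;
    $val = "" unless defined $val;
    substr($val, $max) = "" if length $val > $max;   # chop off the tail in place
    return $val;
}

print length(clip("x" x 2000, 1024)), "\n";   # prints: 1024
print length(clip("short", 1024)), "\n";      # prints: 5
```

The length test is not just an optimization: assigning to substr with a start position beyond the end of the string is an error, so short values have to be left alone.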
Lines 62 through 64 put the user's query in front of the data (so new messages are automatically visible at the top) and then ensure that only the 32 most recent messages are saved into the file.
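The unshift-then-splice idiom for a newest-first, fixed-size list can be tried with plain strings standing in for the CGI objects (the ``message N'' strings are made up for the sketch):

```perl
use strict;
use warnings;

use constant MAXENTRIES => 32;

# Pretend history: already full, newest first.
my @entries = map { "message $_" } 1 .. 32;

unshift @entries, "newest";          # new post goes on the front
splice @entries, MAXENTRIES          # drop everything from index 32 onward
    if @entries > MAXENTRIES;

print scalar(@entries), " entries, from '$entries[0]' to '$entries[-1]'\n";
# prints: 32 entries, from 'newest' to 'message 31'
```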
Lines 65 through 69 rewrite the output data, using the save method. This method causes the data to be scribbled out into the history file in such a way that they can be loaded up by the code in line 50 on the next script invocation. So we've essentially got a flat text file acting as a structured data repository, thanks to the save/restore code built in to CGI.pm. Cool.
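To watch the save/restore round trip without a web server, something like the following sketch should do. It assumes CGI.pm is installed (newer Perls no longer bundle it), uses a temp file in place of the chat file, and makes up two sample records; the lock is left out to keep the focus on save and new.

```perl
use strict;
use warnings;
use CGI;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the chat file.
my ($out, $path) = tempfile(UNLINK => 1);

# Save two made-up records, one after the other, into the same file.
for my $n (1, 2) {
    my $record = CGI->new({ name => "user$n", message => "hello <$n>" });
    $record->save($out);
}
close($out) or die "Cannot close $path: $!";

# Restore: each CGI->new($fh) consumes one saved record.
open(my $in, "<", $path) or die "Cannot open $path: $!";
my @restored;
while (not eof $in) {
    push @restored, CGI->new($in);
}
close $in;

for my $record (@restored) {
    my $name    = $record->param("name");
    my $message = $record->param("message");
    print "$name said: $message\n";
}
```

If you look at the temp file between the two halves, you should see one name=value line per field, with awkward characters hex-escaped, and a line containing a lone ``='' closing each record.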
And there's not much left but to close the filehandle (line 71) and return the entries (line 73). Actually, the filehandle would have been automatically closed when the subroutine exited, because the IO::File reference is a lexically local variable. Sometimes, I even therefore leave the close() out.
So, to use this script, I'd plop it into a CGI directory somewhere, and create the file designated by CHATFILE, and make it writable by the user-id of the CGI process. How you do that is pretty much site-dependent, so ask your webmaster. (If you are your webmaster and don't know, that's gonna be a tough one.) See you next time!
Listing One
    =1=     #!/home/merlyn/bin/perl -Tw
    =2=     use strict;
    =3=     
    =4=     use CGI ":standard";
    =5=     
    =6=     ## following must be writable by CGI user:
    =7=     use constant CHATFILE => "/home/merlyn/Web/chatfile";
    =8=     use constant MAXENTRIES => 32;
    =9=     
    =10=    sub ent {                       # translate to entity
    =11=      local $_ = shift;
    =12=      s/["<&>"]/"&#".ord($&).";"/ge;  # entity escape
    =13=      $_;
    =14=    }
    =15=    
    =16=    print header, start_html("Chat!"), h1("Chat!");
    =17=    eval { &main };
    =18=    if ($@) {
    =19=      print hr, "ERROR: ", ent($@), hr;
    =20=      exit 0;
    =21=    }
    =22=    
    =23=    sub main {
    =24=      my @entries = get_old_entries();
    =25=      print hr, start_form;
    =26=      print p, "name: ", textfield("name","", 40);
    =27=      print " email: ", textfield("email", "", 30), br;
    =28=      print "message: ", textarea("message", "", 4, 40, 1);
    =29=      print br, p, "(Submit an empty message to listen)", submit;
    =30=      print end_form, hr;
    =31=      for my $entry (@entries) {
    =32=        print p(), ent($entry->param("name"));
    =33=        print " (", ent($entry->param("email")), ") at ";
    =34=        print ent(scalar localtime $entry->param("time")), " said: ";
    =35=        print p(), ent($entry->param("message"));
    =36=      }
    =37=      print end_html;
    =38=    }
    =39=    
    =40=    sub get_old_entries {
    =41=      use IO::File;
    =42=    
    =43=      my @entries = ();
    =44=      my $chatfh = new IO::File "+<".CHATFILE
    =45=        or die "Cannot open ".CHATFILE.": $!";
    =46=      ## begin critical region (keep short)
    =47=      flock $chatfh, 2;
    =48=      seek $chatfh, 0, 0;
    =49=      while (not eof $chatfh) {
    =50=        push @entries, new CGI $chatfh;
    =51=      }
    =52=      my $message = param("message");
    =53=      if (defined $message and $message =~ /\S/) {
    =54=        ## must transfer limited query to file
    =55=        my $saver = new CGI {"time" => time};
    =56=        for (qw(name email message)) {
    =57=          my $val = param($_);
    =58=          $val = "" unless defined $val;
    =59=          substr($val, 1024) = "" if length $val > 1024;
    =60=          $saver->param($_, $val);
    =61=        }
    =62=        unshift @entries, $saver;
    =63=        splice @entries, MAXENTRIES
    =64=          if @entries > MAXENTRIES;
    =65=        seek $chatfh, 0, 0;
    =66=        truncate $chatfh, 0;
    =67=        for my $entry (@entries) {
    =68=          $entry->save($chatfh);
    =69=        }
    =70=      }
    =71=      close $chatfh;
    =72=      ## end critical region
    =73=      @entries;
    =74=    }