Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 19 (November 1997)

Most of the web is flashy, sizzly, a zillion bits of information coming at you in microseconds. Sometimes, you just want to let it drift by lazily. For example, I'm on the road a lot giving my Perl trainings at Fortune 100 companies all across the United States. Now, sometimes I get a little homesick, but luckily there are a few live web-cams set up that allow me to get a quick ``as it is right now'' snapshot of my hometown of Portland, Oregon.

One of these websites with live cameras is the local television station, KGW. They provide 8 webcams from around Oregon and Portland (including one very close to my house), and they're updated every two or three minutes. Often, I would keep a browser open on one of these views on the instructor workstation (visible only to me), reaching over to hit ``Reload'' as often as I wanted a new view.

Then I thought, ``hey, why should I have to hit Reload? I've got Perl!''

So I crafted a CGI script that goes out to KGW's website every 30 seconds and, if the picture has been updated, sends the new picture to my browser. This script is interesting because it combines three things I haven't used together in one program before: (1) multipart ``server push'' output, (2) the CGI and LWP modules in the same program, and (3) the concept of an NPH script.

Now, I've looked at #1 and #2 individually in previous columns, but let's look at #3. An NPH (``non-parsed header'') script gets to talk directly to the browser: the web server doesn't parse or add to its headers, and passes its output straight through. As the output is generated, the browser sees the stop-and-start data stream. This is really handy when you have part of the output now and part of it later, as in a long database search. In this case, by combining it with multipart output, we can effectively ``animate'' the single image, replacing it as necessary with the updated webcam image.
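To make the NPH idea concrete, here's a minimal sketch (not from the listing) of the stop-and-start output such a script produces. It writes its own status line and headers, which the server would otherwise supply, and then dribbles out the body a piece at a time. The `nph_header` helper is hypothetical; the real program lets CGI.pm's `header(-nph => 1, ...)` do this work.

```perl
use strict;
use warnings;

# Hypothetical helper: the raw header block an NPH script has to send itself,
# since the server no longer adds a status line or parses its headers.
sub nph_header {
    my ($content_type) = @_;
    return "HTTP/1.0 200 OK\r\n"              # status line, normally the server's job
         . "Content-Type: $content_type\r\n"
         . "\r\n";                            # blank line ends the headers
}

$| = 1;                                       # unbuffered: each print leaves at once
print nph_header("text/plain");
for my $n (1 .. 3) {
    print "part $n of the answer\n";          # the browser can show this right away
    select(undef, undef, undef, 0.25);        # pause a quarter second (4-arg select)
}
```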

And the program that does all this is in Listing One, below.

Lines 1 through 3 should look familiar if you've been following this column for a while. They turn on taint checking (-T), warnings (-w), and all the compiler restrictions that make sense for any longer-than-ten-line program (use strict). Line 3 ($|++) unbuffers STDOUT, which is very important for this program, because we'll be generating output in chunks, directly to the user's browser.
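If the `$|++` idiom looks cryptic, here are a few spellings of the same thing (a small illustration, not part of the listing). IO::Handle's `autoflush` method is the most readable of them.

```perl
use strict;
use warnings;
use IO::Handle;    # adds the autoflush method to filehandles

# $|++ affects the currently selected output handle (normally STDOUT):
# the flag becomes 1, and every print is flushed immediately.
$|++;
print "\$| is now $|\n";

# Equivalent, but naming the handle explicitly:
STDOUT->autoflush(1);

# For some other handle, select it temporarily (the classic idiom):
my $old = select(STDERR); $| = 1; select($old);
```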

Lines 5 through 7 pull in the three modules this program needs: the LWP::UserAgent interface to allow us to fetch remote web pages, the CGI module to interpret the incoming CGI stuff and generate the resulting HTML, and finally, the HTTP::Date module (also found in the LWP distribution) to generate the date headers.

Lines 9 through 25 define some configuration constants. If you take this script, you'll probably want to change some of this stuff. But as always, these scripts are meant to be inspiration, not just ready-to-run items. (Besides, looking at Portland images probably won't do as much for you as it does for me.)

Line 11 defines the boundary string that separates the parts of the multipart output. Normally, you won't need to change this, but you'll certainly want it to be distinct from any sequence of characters that could appear in the JPEG files being transferred.
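If you'd rather not rely on luck, one option (a sketch, not what the listing does) is a long random boundary, chosen once before the multipart header goes out, since it can't change mid-stream. Both sub names below are made up for illustration; `collides` just reports whether a payload contains the delimiter line and would therefore be cut short.

```perl
use strict;
use warnings;

# Hypothetical helper: a 24-character random alphanumeric boundary.
sub random_boundary {
    my @chars = ('A' .. 'Z', 'a' .. 'z', '0' .. '9');
    return join "", map { $chars[rand @chars] } 1 .. 24;
}

# Hypothetical helper: true if the payload contains "--" followed by the
# boundary, which the browser would mistake for the end of the part.
sub collides {
    my ($payload, $boundary) = @_;
    return index($payload, "--$boundary") >= 0;
}

my $boundary = random_boundary();
print "boundary: $boundary\n";
print collides("some JPEG bytes", $boundary) ? "collision\n" : "ok\n";
```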

Line 12 defines how often (in seconds) this CGI script will go back to the original server to see if the picture has changed. You'll want this high enough so as not to unduly burden the source server, but not so high that you won't notice a picture update for a long time.

Line 13 is the common URL prefix for all the JPEG images for this particular script. I had to figure out this directory by looking at KGW's website -- in particular, where the inline and clickable images were coming from.

Lines 14 through 23 define a reference to an anonymous hash mapping the JPEG file names to their text descriptions. This hash is used to create the pop-up menu that lets the user select which image to watch.
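For the curious, the pop-up menu built from this hash comes out as roughly the following HTML. This is a hand-rolled sketch (no CGI.pm) using a two-entry excerpt of the map; CGI.pm's exact markup may differ, and it also HTML-escapes the labels, which this sketch skips. The `select_html` name is hypothetical.

```perl
use strict;
use warnings;

my $CAM_MAP = {                      # two-entry excerpt of the listing's map
    'skycam-1-75.jpg' => 'Vancouver, USA [medium]',
    'skycam-2-75.jpg' => 'Portland OMSI Tower [medium]',
};

# The keys become the submitted values; the descriptions are only labels.
sub select_html {
    my ($name, $map) = @_;
    my $html = qq{<select name="$name">\n};
    for my $file (sort keys %$map) {
        $html .= qq{<option value="$file">$map->{$file}</option>\n};
    }
    return $html . "</select>\n";
}

print select_html("image", $CAM_MAP);
```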

Line 27 creates a file-lexical variable $nph which will be true if the program name contains nph-. If you install this script properly, that will of course be true, but I was also testing an earlier version in non-NPH mode to track down some weirdnesses, and having both scripts linked together made the testing easier.
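Here's line 27's test tried against a couple of install names (the paths and the `is_nph` wrapper are hypothetical). Note that the pattern matches anywhere in `$0`, so a directory named `nph-something` would also switch NPH mode on.

```perl
use strict;
use warnings;

# Same pattern as line 27, wrapped in a hypothetical helper.
sub is_nph {
    my ($program) = @_;
    return $program =~ /nph-/ ? 1 : 0;
}

for my $path ("/cgi-bin/nph-watcher", "/cgi-bin/watcher") {
    printf "%-22s NPH mode: %s\n", $path, is_nph($path) ? "yes" : "no";
}
```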

Line 29 grabs the CGI parameter image from the CGI interface into the $image variable. The || operator causes an undefined image parameter to become the empty string, so we won't get bad uses of undef later.
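A quick way to see exactly what `||` does here, compared with the defined-or operator `//` (which arrived in Perl 5.10, long after this column was written). Neither is wrong for this script: the only extra value `||` throws away is the string "0", and there's no camera by that name.

```perl
use strict;
use warnings;

# `||` replaces any false value: undef, "", and also the string "0".
# `//` (Perl 5.10 and later) replaces only undef.
for my $raw (undef, "", "0", "skycam-1-75.jpg") {
    my $with_or         = $raw || "";
    my $with_defined_or = $raw // "";
    printf "%-19s ||: '%s'  //: '%s'\n",
        defined $raw ? "'$raw'" : "undef", $with_or, $with_defined_or;
}
```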

Line 31 begins the section of code used when no valid form parameters have been submitted, such as when the script is invoked for the first time. The check here ensures that the only image parameter we'll accept is a nice filename that doesn't start with a dot. If that's not the case, we go ahead and print the original form using the following lines.
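To see what line 31's pattern lets through, here it is run against a few made-up candidate values. One subtlety: `$` also matches just before a trailing newline, so a value ending in `\n` slips past. Anchoring with `\z` instead closes that gap; that stricter variant is an optional tightening, not what the listing does.

```perl
use strict;
use warnings;

# The listing's check, and a stricter variant: `$` also matches just before a
# trailing newline, while `\z` matches only at the very end of the string.
my $loose  = qr/^(?!\.)[-\w.]+$/;
my $strict = qr/^(?!\.)[-\w.]+\z/;

for my $try ("skycam-1-75.jpg", ".htaccess", "../etc/passwd", "", "skycam-1-75.jpg\n") {
    (my $shown = $try) =~ s/\n/\\n/;    # make the newline visible when printed
    printf "%-20s loose: %-3s strict: %s\n", "'$shown'",
        ($try =~ $loose ? "ok" : "no"), ($try =~ $strict ? "ok" : "no");
}
```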

Lines 32 through 38 print the beginning HTTP header, including the HTTP status line if we're in NPH mode. Note that in four different ways, I'm telling the browser not to cache this output: by setting the date and last-modified values to right now, turning on no-cache (using a Pragma directive understood by both HTTP 1.0 and 1.1), and saying that it comes pre-expired (using an expiration time of right now as well). For this program, forcing the form output to be non-cached was really unnecessary, but it was good practice for the multipart stream output, before I had gotten that part working.
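For reference, here's roughly what those anti-caching headers look like on the wire. HTTP::Date's `time2str` produces dates in the RFC 1123 form shown; the `http_date` sub below is a hypothetical hand-rolled stand-in (it builds the day and month names itself so the locale can't change them), used only so the sketch needs nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTTP::Date::time2str: an RFC 1123 date in GMT.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub http_date {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($time);
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# The anti-caching headers from lines 32 through 38, written out by hand.
my $now = http_date(time);
print "Date: $now\n",
      "Last-Modified: $now\n",
      "Pragma: no-cache\n",
      "Expires: $now\n";
```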

Lines 39 through 43 print some HTML stuff ahead of the form: a head and title, the start of the body, and an H1. A link to KGW's top-level page is also included. Obviously, if you steal this script for your own purposes and change the URLs, you'll want to put a similarly appropriate message here.

Lines 44 through 50 print the form in which the user can select a particular image. The pop-up menu generates a series of values that are labeled with the human-readable text, but cause the keys of the %$CAM_MAP hash to be sent instead. Note that line 44 prints a form that uses the GET method instead of the POST method (the default). I found this necessary in testing, because otherwise my Netscape browser would occasionally pull up an inappropriately cached image from one of the other cameras. Ugh.

Line 51 bails out of the program if we've generated the input form, as the rest of this program is concerned with responding to a validly filled-out form.

And here's where it gets interesting. Lines 54 through 56 establish a ``user agent'' which is really just a virtual browser in a sense. Line 54 creates the object, line 55 sets its agent type (which will show up in the logs of the web server) and line 56 enables any proxying established by environment variables. (My ISP doesn't use an outbound proxy, but I always put this into the programs I write to permit maximum reuse of these scripts. You're welcome.)

Line 58 establishes a ``request'' object, containing all the useful information about a particular request. In this case, it'll be a GET against a URL built from the prefix as well as the $image parameter that had been passed in.

Line 60 defines a time value against which incremental fetches will be made. By starting this at zero, any image is newer than that, so we'll fetch the image and show it. As each image is fetched, the $basetime is updated with its Last-Modified value, so that we don't keep refetching the same picture repeatedly. More on this later.

Lines 62 through 69 print an HTTP header, similar to the one printed by lines 32 through 38. However, in this case, we need to declare the output not as text/html, but as a multipart type. This allows the script to send a series of images in one response, rather than just one image. If the script is invoked in NPH mode (which it will need to be if we're doing this neat trick), then the right HTTP status line will also be sent.

Line 71 starts the actual HTTP body, which in this case must be a multipart/x-mixed-replace type. In particular, we need to delimit the parts with a boundary string, which line 71 prints.

Lines 72 through 93 form a (hopefully) infinite loop, repeatedly fetching the image and displaying it when it changes.
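If the bare-block-plus-`redo` idiom is unfamiliar: a bare block is a loop that runs exactly once, and `redo` jumps back to its top without re-testing anything, so the listing's block behaves like `while (1)`. Here's a tiny demonstration, with a counter standing in for the fetch-and-sleep work (the counter and the stop at three passes are made up):

```perl
use strict;
use warnings;

my $passes = 0;
{
    $passes++;                     # stands in for one fetch-and-sleep cycle
    print "pass $passes\n";
    redo if $passes < 3;           # back to the top of the block
}
print "left the block after $passes passes\n";
```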

Line 73 forces an ``if-modified-since'' header to be added to our request, based on the time and date of the value of $basetime. Initially, this will be zero, asking for anything modified since the epoch (January 1, 1970). As all files fit this category, the first transfer will effectively ignore this parameter. (A more sophisticated script might take the if-modified-since parameter from the user's browser and insert it here for the initial run. That'd be slick.)
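Here's one way that ``slick'' variant might start, sketched in core Perl. A CGI-compliant server hands the script the browser's request headers as `HTTP_*` environment variables, so the If-Modified-Since value arrives in `HTTP_IF_MODIFIED_SINCE`. The hypothetical `parse_http_date` below understands only the RFC 1123 form and falls back to 0 (``fetch anything''); HTTP::Date's `str2time` accepts many more formats and would be the better choice in the real script.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: "Sun, 06 Nov 1994 08:49:37 GMT" -> epoch seconds.
# Returns 0 when the header is missing or not in RFC 1123 form.
sub parse_http_date {
    my ($str) = @_;
    return 0 unless defined $str;
    my ($mday, $mon, $year, $hour, $min, $sec) =
        $str =~ /^\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT$/
        or return 0;
    return 0 unless exists $MON{$mon};
    return timegm($sec, $min, $hour, $mday, $MON{$mon}, $year);
}

# Seed the polling loop from the browser's header, if it sent one.
my $basetime = parse_http_date($ENV{HTTP_IF_MODIFIED_SINCE});
print "starting basetime: $basetime\n";
```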

Line 74 actually goes out to the server (here, KGW's web server) to look for the updated JPEG. If the file has been updated since the if-modified-since time, then we get a success, and the response object contains the picture. If the file doesn't exist or hasn't been modified since the requested time, we instead get a very short response carrying a non-success status code, and we skip down to the next section.

Lines 76 to 82 deal with a successful new image. The modification time of this new image is saved into $basetime, and then the headers from the response become the headers for our passed-through output part, followed by a blank line and the image data itself. Note that the use of headers_as_string makes this rather easy. (Thank you, Gisle and Lincoln!) Line 80 prints the multipart boundary so that the browser knows we're done with a particular chunk. Line 81 puts us to sleep for a while, and line 82 sends the program back up to the top of the infinite loop.
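Stripped of LWP, each part written by lines 77 through 80 is just headers, a blank line, the body, and the next delimiter. The sketch below writes one part into an in-memory filehandle so you can see the framing. The `emit_part` name and the sample headers and body are invented for illustration; the real script gets its headers from `headers_as_string`.

```perl
use strict;
use warnings;

# Hypothetical helper: write one part of a multipart/x-mixed-replace stream.
sub emit_part {
    my ($fh, $boundary, $headers, $body) = @_;
    for my $name (sort keys %$headers) {
        print $fh "$name: $headers->{$name}\n";
    }
    print $fh "\n";                   # blank line ends this part's headers
    print $fh $body;
    print $fh "\n--$boundary\n";      # delimiter: "this part is complete"
}

open my $fh, '>', \my $buffer or die "can't open in-memory handle: $!";
print $fh "--ThisRandomString\n";     # opening delimiter, as on line 71
emit_part($fh, "ThisRandomString",
          { 'Content-Type' => 'image/jpeg', 'Content-Length' => 4 }, "JPEG");
close $fh;
print $buffer;
```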

Lines 84 and 85 detect the two acceptable reasons for this script to continue even though we don't have a new image from the server. The most common cause is that the image has not changed since the last time we looked; for that, we get a 304 (Not Modified) status. However, due to net congestion, we occasionally get a ``cannot connect'' error (reported as a 500) on one of the tries, so this script ignores that as well. I'm presuming here that it's an intermittent condition; a more sophisticated script might count the consecutive failures and bail out after a threshold is reached.
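The threshold idea might look something like this: the same three-way decision as lines 75 through 92, plus a count of consecutive 500s that any good response resets. It's a sketch with a canned list of status codes in place of real LWP responses, and `$MAX_FAILURES` is a made-up knob; there's no sleeping, so it runs instantly.

```perl
use strict;
use warnings;

my $MAX_FAILURES = 3;    # hypothetical: give up after this many 500s in a row

# Decide what the watcher would do with each status code, in order.
sub plan {
    my @codes = @_;
    my ($failures, @actions) = (0);
    for my $code (@codes) {
        if ($code == 200) {                  # new picture: send it
            $failures = 0;
            push @actions, "show";
        } elsif ($code == 304) {             # unchanged: just wait
            $failures = 0;
            push @actions, "wait";
        } elsif ($code == 500 and ++$failures < $MAX_FAILURES) {
            push @actions, "retry";          # probably transient
        } else {
            push @actions, "give up";        # anything else, or too many 500s
            last;
        }
    }
    return @actions;
}

print join(", ", plan(200, 304, 500, 304, 500, 500, 500, 200)), "\n";
```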

In either of these cases, lines 87 and 88 cause this script to park itself for $SLEEPTIME seconds and then start over again.

If the reason for failure was neither of the expected cases, lines 90 and 91 cause the script to exit with an error message on STDERR, in the right format for the Apache error log file. Note that it includes the script name ($0) and the response code and message. This will almost certainly cause the browser to bomb as well, probably indicated by the image updates stopping or a ``Document done'' condition.
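One gotcha worth knowing here: `die`, unlike `printf`, never formats its arguments; it just joins them. So a `%d` in a `die` message comes out literally, which is why a message like the one on lines 90 and 91 needs `sprintf` to do the formatting. This quick check uses made-up response values.

```perl
use strict;
use warnings;

my ($code, $message) = (404, "Not Found");    # made-up response values

eval { die "failure: %d %s\n", $code, $message };
my $plain = $@;        # "%d %s" survive; the values are glued on after

eval { die sprintf "failure: %d %s\n", $code, $message };
my $formatted = $@;

print "die alone:    $plain";
print "with sprintf: $formatted";
```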

And there you have it. Install this as something like /cgi-bin/nph-watcher, and then invoke it, select one of the images, and then just sit back and watch the Portland scenery go by every few minutes. That's one way for me to keep from getting homesick on the road. Enjoy!

Listing One

        =1=     #!/home/merlyn/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use LWP::UserAgent;
        =6=     use CGI qw(:standard);
        =7=     use HTTP::Date qw(time2str);
        =8=     
        =9=     ### constants
        =10=    
        =11=    my $BOUNDARY = "ThisRandomString";
        =12=    my $SLEEPTIME = 30;
        =13=    my $URL_PREFIX = "http://www.kgw.com/images/skycam";
        =14=    my $CAM_MAP = {
        =15=                   'skycam-1-75.jpg' => 'Vancouver, USA [medium]',
        =16=                   'skycam-2-75.jpg' => 'Portland OMSI Tower [medium]',
        =17=                   'skycam-3-75.jpg' => 'Timberline/Mt. Hood [medium]',
        =18=                   'skycam-4-75.jpg' => 'Cedar Hills [medium]',
        =19=                   'skycam-5-75.jpg' => 'Portland International Airport [medium]',
        =20=                   'skycam-6-75.jpg' => 'Oregon Coast [medium]',
        =21=                   'skycam-8-75.jpg' => 'Salem State Capitol [medium]',
        =22=                   'skycam-9-75.jpg' => 'Newport, Oregon [medium]',
        =23=                  };
        =24=    
        =25=    ### end constants
        =26=    
        =27=    my $nph = $0 =~ /nph-/;
        =28=    
        =29=    my $image = param("image") || "";
        =30=    
        =31=    unless ($image =~ /^(?!\.)[-\w.]+$/) {
        =32=      print header(
        =33=                   -nph => $nph,
        =34=                   -date => time2str,
        =35=                   "Last-modified" => time2str,
        =36=                   -pragma => "no-cache",
        =37=                   -expires => '+0d',
        =38=                  );
        =39=      print start_html("Skycams!"), h1("Skycams!"), "\n";
        =40=      print p("Images are courtesy of",
        =41=              a({-HREF => "http://www.kgw.com/"},
        =42=                "KGW Television"), " -- check there for more info.",
        =43=             );
        =44=      print hr, start_form("GET");
        =45=      print p(submit("get this image:"),
        =46=              popup_menu(-name => "image",
        =47=                         -values => [sort keys %$CAM_MAP],
        =48=                         -labels => $CAM_MAP,
        =49=                        ));
        =50=      print end_form, hr, end_html;
        =51=      exit;
        =52=    }
        =53=    
        =54=    my $AGENT = LWP::UserAgent->new;
        =55=    $AGENT->agent("watcher/0.5");
        =56=    $AGENT->env_proxy;
        =57=    
        =58=    my $REQUEST = HTTP::Request->new('GET', "$URL_PREFIX/$image");
        =59=    
        =60=    my $basetime = 0;
        =61=    
        =62=    print header(
        =63=                 -nph => $nph,
        =64=                 -date => time2str,
        =65=                 "Last-modified" => time2str,
        =66=                 -pragma => "no-cache",
        =67=                 -expires => '+0d',
        =68=                 -type => "multipart/x-mixed-replace;boundary=$BOUNDARY",
        =69=                );
        =70=    
        =71=    print "--$BOUNDARY\n";
        =72=    {
        =73=      $REQUEST->if_modified_since($basetime);
        =74=      my $response = $AGENT->request($REQUEST);
        =75=      if ($response->is_success) {
        =76=        $basetime = $response->last_modified;
        =77=        print $response->headers_as_string;
        =78=        print "\n";
        =79=        print $response->content;
        =80=        print "\n--$BOUNDARY\n";
        =81=        sleep $SLEEPTIME;
        =82=        redo;
        =83=      } elsif (
        =84=               $response->code == 304 # not changed yet
        =85=               or $response->code == 500 # cannot connect for some reason
        =86=              ) {
        =87=        sleep $SLEEPTIME;
        =88=        redo;
        =89=      } else {
        =90=        die sprintf "[%s] %s failure: %d %s\n",
        =91=          scalar localtime, $0, $response->code, $response->message;
        =92=      }
        =93=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.