Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 66 (Oct 2001)

[suggested title: Rendering a calendar to HTML]

I have this nice little text-file link from my home page, which shows the places my crazy conference and training schedule takes me. Each entry is given on a single line showing a date range in day-month-year format, omitting the common parts of the range, followed by a short phrase, and possibly a partial URL for further information. So a typical few entries might look like:

  4 to 8 sep 01: San Francisco, CA for Web 2001 International
  11 to 19 jan 02: in the Southern Caribbean, teaching Perl on Perl Whirl 2 (www.geekcruises.com)

It's actually a link to my .plan file, which I've maintained faithfully so that the finger command can give back some information about my whereabouts, except that I haven't been on a system where finger has worked for about three years. Good thing I've put this as a link from my web page.

I've been meaning for a long time to get these public schedule items out of a flat file and into a real database, so that I could prop up a calendar link on my page, and have them shown as a nice HTML table calendar. But the convenience of just editing the flat file and the lack of round tuits prevented me from getting further along.

However, it occurred to me that I didn't need to work at it the hard way. I could keep my flat-file data source, and merely interpret the data as it already was. I whipped out my friendly Date::Manip (from the CPAN) documentation, and figured out what format the dates would have to be in to compute date ranges, then mucked around a bit with a regular expression or two to pick up the date range pieces and feed them to the Date::Manip routines for canonicalization and computation. Within a short time, I had started spitting out the text of each activity, preceded by every date involved with that item. Cool.

But next I had to render it in a nice HTML table. Bleh. I hated the thought of even more date calculations and HTML wrangling, although Date::Manip really can do just about everything. Luckily, somewhere back in the recesses of my mind, I had recalled stuffing away a note to look at HTML::CalendarMonthSimple, and sure enough, this was exactly the ticket.

But after hooking together the date-extraction logic with the HTML rendering logic, I was horrified to find that my poor little ISP's shared web server was getting nailed by each hit as I tweaked the HTML color settings and hit reload. I quickly instrumented the potential CPU suckers by inserting a poor-man's profiler:

        my @times = (time, times);
        ... do calculation here
        @times = map { $_ - shift @times } time, times;
        print "times: @times\n";

This quickly shows the wall-clock seconds, user and system CPU seconds, and user and system child CPU seconds (usually none). As I suspected, the Date::Manip parsing was taking nearly 10 CPU (and wall-clock) seconds, but the HTML rendering was taking only 1/100th of that.
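
To make the profiler easy to drop around any suspect chunk of code, it can be wrapped in a subroutine. This is only a sketch: the name profile and the busy-loop standing in for the slow calculation are my own inventions for illustration.

```perl
use strict;
use warnings;

# Run a code reference and return the elapsed wall-clock, user,
# system, child-user, and child-system seconds, in that order.
sub profile {
  my ($code) = @_;
  my @times = (time, times);                       # snapshot before
  $code->();
  @times = map { $_ - shift @times } time, times;  # after minus before
  return @times;
}

my @elapsed = profile(sub {
  my $sum = 0;
  $sum += $_ for 1 .. 200_000;     # stand-in for the slow calculation
});
print "times: @elapsed\n";
```

Because time has one-second resolution, the first number is only useful for things that are painfully slow, which was exactly my situation.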

I was disheartened. But after a brief period of reflection, I observed that the analysis really only needed to be done once every time I edited my dot-plan file, which was only once every few weeks. I merely had to cache the results, and make the cache valid as long as it was newer than the modification time of the same file.

In the past, I've used File::Cache to perform this caching, but the Cache::Cache family of modules (by the same author) has now matured to the point of being useful, so I chose the newer interface.

Once I had completed the caching code, everything worked great! I spent a brief time customizing the colors and look, and then added ``previous month'' and ``next month'' links at the title.

I also remembered those little URLs in some of the messages, and fetched some code to recognize and turn those URLs into actual links to the sites. And the result is in [listing one, below].

Lines 1 through 3 start nearly every CGI program I write, turning on taint checking, warnings, all the compiler restrictions, and disabling the buffering on STDOUT. Turning on warnings turned out to be harder than I thought, as I'll discuss when I get to the end of the program.

Lines 5 through 8 bring in the expected modules. Of these, only CGI comes with Perl. The remainder are found in the CPAN: check perldoc perlmodinstall for details on how to add these to your installation or local directory.

Line 10 is the only configuration constant: the location of the text file containing my calendar. Lines in this file that match patterns of interest will end up on the calendar, and everything else will be ignored.

Lines 12 to 21 figure out which month we're displaying. Line 14 grabs the current local time information. Lines 15 to 17 get the month from the month parameter, presuming it's in range and provided. If not, we quietly fall back to the current month. Similarly, lines 18 to 20 grab the appropriate year value.
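
The same validate-or-fall-back logic can be pulled out into a helper. This is a sketch under my own assumptions: the name in_range_or is hypothetical, and it is slightly stricter than the listing, because it insists on at least one digit rather than merely rejecting non-digits.

```perl
use strict;
use warnings;

# Return $value if it is all digits and within $min..$max,
# otherwise return $default.
sub in_range_or {
  my ($value, $min, $max, $default) = @_;
  return $default
    unless defined $value and $value =~ /\A\d+\z/
      and $value >= $min and $value <= $max;
  return $value;
}

my @now   = localtime;
my $month = in_range_or("13",   1,    12,   $now[4] + 1);     # out of range
my $year  = in_range_or("2002", 2001, 2005, $now[5] + 1900);  # accepted
print "showing $year/$month\n";
```

Anything suspicious from the outside world quietly becomes the current month or year, so a mangled URL never produces an error page.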

Lines 23 and 24 set up the cache connection. The namespace and the username ensure that I get consistent cache access whether I'm running this as myself or as the web user.

Line 26 holds the hash of events. Actually, it's a hash of years, with subhashes of months, with arrays of arrays representing tuples of day-eventstring pairs, so the first two days of the second event above would be represented as:

  $events{"2002"}{"1"} = [
    [11, "in the Southern ..."],
    [12, "in the Southern ..."],
  ];

I found this format to be the most natural in determining all of the events for a given month.
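
As a small illustration of why this shape is convenient, here is a sketch of filling and querying such a structure. The helper name events_for is my own invention and is not part of the listing.

```perl
use strict;
use warnings;

my %events;
push @{ $events{2002}{1} },
  [11, "in the Southern ..."],
  [12, "in the Southern ..."];

# Return the [day, text] pairs for one month, or an empty list.
# Adding 0 turns a parameter like "01" into the key 1.
sub events_for {
  my ($events, $year, $month) = @_;
  my $months = $events->{0 + $year} or return;
  return @{ $months->{0 + $month} || [] };
}

for my $pair (events_for(\%events, 2002, "01")) {
  my ($day, $text) = @$pair;
  print "$day: $text\n";
}
```

Looking up the year first, rather than writing the whole chain in one expression, keeps a query for an empty year from autovivifying a hash for it.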

Line 28 creates a fingerprint for the current event file. We'll note the file's device number, inode number, and last modified timestamp. If a new file is renamed into this position, or if the file is edited in any way, it'll have a different fingerprint, and we'll reprocess it.

Lines 30 to 36 fetch any existing cache value by calling the cache's get method. The first item of the cached value is a hashref of the previous value for %events. The remaining three items are the identity values that were used to generate that cache, which we compare in line 32 to our new fingerprint. If they're the same, we've got a valid cache, and we can use what we've seen.
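
The fingerprint-then-compare idea doesn't depend on any particular cache module. Here's a sketch that swaps in the core Storable module in place of Cache::FileCache, so it runs on a bare Perl installation. The names fingerprint and cached_compute, and the throwaway files, are assumptions made purely for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Device, inode, and mtime, joined into one comparable string.
sub fingerprint {
  my ($file) = @_;
  return join " ", (stat($file))[0, 1, 9];
}

# Return the cached data for $source if its fingerprint is unchanged;
# otherwise call $compute->($source) and save the fresh result.
sub cached_compute {
  my ($source, $cachefile, $compute) = @_;
  my $now    = fingerprint($source);
  my $cached = -e $cachefile ? eval { retrieve($cachefile) } : undef;
  return $cached->{data}
    if $cached and $cached->{fingerprint} eq $now;
  my $data = $compute->($source);
  store({ fingerprint => $now, data => $data }, $cachefile);
  return $data;
}

# Demonstration with throwaway files.
my $dir = tempdir(CLEANUP => 1);
my ($plan, $cachefile) = ("$dir/plan", "$dir/cache");
open my $fh, '>', $plan or die "cannot write $plan: $!";
print $fh "1 to 11 nov 01: fred\n";
close $fh;

my $calls = 0;
my $slow  = sub { $calls++; return { parsed => 1 } };
cached_compute($plan, $cachefile, $slow);   # computes
cached_compute($plan, $cachefile, $slow);   # served from the cache
utime 1, 1, $plan;                          # fake an edit by changing the mtime
cached_compute($plan, $cachefile, $slow);   # computes again
print "slow parse ran $calls times\n";
```

Storing the fingerprint alongside the data means there's no separate timestamp to keep in sync: a stale cache simply fails the string comparison.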

But if there are no events, we presume we didn't have a good cache, and move on to parse the current file, starting in line 38. Line 40 brings in the slow-but-powerful Date::Manip package (found in the CPAN). The author admits that this module is slow, including even loading the module. Thus, I'm not even loading it into the program unless I need it. Between the require and the call to import immediately following, I have the equivalent of use Date::Manip, but performed at runtime on demand instead of compile time. The call to zero-out the path is required because I'm running tainted and for some reason the module wants to call a child process occasionally.
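
The require-plus-import idiom works with any module. In this sketch, List::Util (which ships with Perl) stands in for Date::Manip, and the subroutine name total is made up for the example.

```perl
use strict;
use warnings;

sub total {
  my @numbers = @_;
  # Loaded the first time total() runs; require does nothing
  # on later calls because the module is already in %INC.
  require List::Util;
  List::Util->import('sum');
  # The parentheses matter: sum was not known at compile time,
  # so Perl needs them to see this as a function call.
  return sum(@numbers);
}

print total(1, 2, 3), "\n";
```

A program that never calls total never pays for loading the module, which is the whole point when the module in question takes noticeable time just to compile.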

Lines 43 and 44 start the processing of the input file, using the @ARGV array and diamond-filehandle processing.

Lines 45 through 49 extract the date ranges and canonicalize the resulting start and endpoints. Because I use abbreviated ranges in my file, I needed to parse many variations, such as:

  1 to 11 nov 01: fred
  27 oct to 3 nov 01: barney
  28 dec 01 to 02 jan 02: dino

Here, fred is contained within one month, but barney spans a month boundary, and dino even spans a year boundary. Through careful consistency in placing parentheses, the identity of the memory variables remains the same, however, so we end up with values like 02 jan 02 in both $start and $end in line 49 by replicating the parts that are missing.
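
To watch the three patterns at work without Date::Manip in the way, here is a sketch that wraps them in a subroutine and runs the three sample lines through it. The name parse_range is hypothetical; the patterns themselves are the ones from the listing.

```perl
use strict;
use warnings;

# Return (start, end, description), or an empty list if the
# line does not look like an event.
sub parse_range {
  my ($line) = @_;
  return unless
    $line =~ /^(\d+)\s+to\s+(\d+)(\s+\S+\s+\d+):\s+(.*)/ or
    $line =~ /^(\d+\s+\S+)\s+to\s+(\d+\s+\S+)(\s+\d+):\s+(.*)/ or
    $line =~ /^(\d+\s+\S+\s+\d+)\s+to\s+(\d+\s+\S+\s+\d+)():\s+(.*)/;
  # $3 is whatever both ends share: month and year, year only, or nothing.
  return ("$1$3", "$2$3", $4);
}

for my $line ("1 to 11 nov 01: fred",
              "27 oct to 3 nov 01: barney",
              "28 dec 01 to 02 jan 02: dino") {
  my ($start, $end, $where) = parse_range($line);
  print "$where: $start .. $end\n";
}
```

For barney, the second pattern captures 27 oct and 3 nov as the two ends and the shared 01 as the third group, so both ends come out as complete dates.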

Lines 50 through 54 compute every day that belongs to this range, first by adding one day to the end of the range, then by generating a recurring value that is true for every day starting at the start date and ending before the incremented end date. For each of these items, we add to the event list under the right month subhash.
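
If all you need is the list of days, the expansion can also be done with core modules. This sketch swaps in Time::Local and gmtime for Date::Manip's DateCalc and ParseRecur, and assumes the dates have already been turned into numeric year, month, and day values. The name days_in_range is my own.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Return [year, month, day] for every day from the start date
# through the end date, inclusive.
sub days_in_range {
  my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
  my $t   = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
  my $end = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
  my @days;
  # Stepping in UTC avoids daylight-saving surprises:
  # every UTC day is exactly 86400 seconds long.
  for (; $t <= $end; $t += 24 * 60 * 60) {
    my ($d, $m, $y) = (gmtime $t)[3, 4, 5];
    push @days, [$y + 1900, $m + 1, $d];
  }
  return @days;
}

my %events;
for my $day (days_in_range(2001, 12, 28, 2002, 1, 2)) {
  my ($y, $m, $d) = @$day;
  push @{ $events{$y}{$m} }, [$d, "dino"];
}
print scalar(@{ $events{2001}{12} }), " days in December, ",
      scalar(@{ $events{2002}{1} }), " in January\n";
```

This handles the arithmetic but none of the parsing, so Date::Manip still earns its keep turning strings like 28 dec 01 into real dates.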

Line 56 shoves the newly computed event items out to the cache, along with the signature of the data that generated this event list.

So now we have a nice event list, possibly from the cache, or possibly the cache has been updated. Time to render it, starting in line 59.

Lines 59 to 66 create the basic calendar structure, including setting up some of the appearance items. I'm not a graphics designer, so I just threw in what I thought was minimally needed to have it render consistently on different default settings.

Lines 68 to 81 handle the forward/backward links, which I've placed into the title near the month name. Line 69 grabs the URL which will reinvoke this program. Lines 70 and 71 compute the prior month as a link, reinvoking this script with appropriate year/month parameters. Similarly, lines 72 and 73 compute the next month link.

Lines 74 to 81 adjust the calendar's header so that it's a three-cell table. The center cell is the month name, and the left and right cells are the previous and next month links.
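
The wraparound at January and December can also be handled with a bit of arithmetic instead of the two conditional expressions in the listing. This is a different formulation, offered as a sketch; the name adjacent_month_query is hypothetical.

```perl
use strict;
use warnings;

# Return the query string for the month $step months away
# from ($year, $month); $step is normally -1 or +1.
sub adjacent_month_query {
  my ($year, $month, $step) = @_;
  my $index = $year * 12 + ($month - 1) + $step;   # months counted from year 0
  return sprintf "year=%d&month=%d", int($index / 12), $index % 12 + 1;
}

print adjacent_month_query(2001, 1,  -1), "\n";   # back across a year boundary
print adjacent_month_query(2001, 12, +1), "\n";   # forward across one
```

Counting months from a fixed origin means the year rolls over on its own, and the same subroutine would work for a jump of several months should you want one.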

Line 84 begins the output of the program, including titling the page with the computed month/year. Rendering begins in line 86 where we pull out the array of arrayrefs for just those items for the current month. Line 87 extracts the specific day and text string for that event.

Lines 88 to 92 alter the text by looking for all URL-like strings, and replacing them with links to the actual pages. I stole this code from the work I did for my ``poor man's webchat'' [column 56], which I then also re-used in my web-to-news gateway [col 62]. Hmm, maybe I should submit this code to be included with the next release of URI::Find. Finally, the possibly-modified text is added to the calendar cell in line 93.
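
The trick in lines 89 to 91 is to fence off the trusted markup between control-A characters, then HTML-escape everything outside the fences. The sketch below shows that trick in isolation. To keep it free of CPAN dependencies, it swaps in a crude pattern that only spots bare www. hostnames in place of URI::Find, and a hand-rolled escape_html in place of CGI's escapeHTML; both are stand-ins of my own, not what the listing uses.

```perl
use strict;
use warnings;

sub escape_html {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/"/&quot;/g;
  return $s;
}

sub linkify {
  my ($text) = @_;
  # Pass 1: wrap each piece of trusted markup in \001 ... \001,
  # leaving the hostname itself outside the fences.
  $text =~ s{\b(www\.[\w.-]+\w)}
            {\001<a href="http://\001$1\001">\001$1\001</a>\001}g;
  # Pass 2: escape everything outside the fences, and copy the
  # fenced markup through untouched (minus the fences).
  $text =~ s/\G(.*?)(?:\001(.*?)\001)?/escape_html($1) . (defined $2 ? $2 : "")/eg;
  return $text;
}

print linkify("Perl Whirl 2 (www.geekcruises.com) <fun>"), "\n";
```

Because the hostname sits outside the fences, it gets escaped like any other text, both in the href and in the link body, so odd characters in a URL can't break out of the tag.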

Line 96 extracts the HTML for this calendar. Unfortunately, it seems to have lots of ``undef used as a string'' warnings, and rather than track them all down, I just turned warnings off during this step. Line 98 finishes up the HTML page, and we're done.
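
Here is the localized-warnings trick on its own, as a sketch. The subroutine noisy is a made-up stand-in for a module that is careless with undef, and the warning handler is there only so the effect can be counted. Note the deliberate absence of use warnings: code compiled under lexical warnings ignores $^W.

```perl
use strict;   # no "use warnings" here, so $^W is in charge

my @caught;
$SIG{__WARN__} = sub { push @caught, @_ };   # collect instead of printing

sub noisy {                      # stand-in for sloppy third-party code
  my $undef;
  return "value: " . $undef;     # uses undef as a string
}

$^W = 1;                         # what the -w switch turns on
noisy();                         # warns
{ local $^W = 0; noisy(); }      # silent for the duration of the block
noisy();                         # warns again
print scalar(@caught), " warnings caught\n";
```

Because local restores the old value when the block exits, the rest of the program keeps its warnings, and only the call I can't fix is muzzled.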

So, now I have a fancy GUI interface to my calendar, including the added bonus of looking up the URLs I've included as annotations. One possible addition is to color-code the range of events, perhaps by geographical location or simply to distinguish events. But that's for another day. Until then, enjoy!

Listings

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use CGI qw(:all);
        =6=     use HTML::CalendarMonthSimple;
        =7=     use Cache::FileCache;
        =8=     use URI::Find;
        =9=     
        =10=    my $PLANFILE = "/home/merlyn/.plan";
        =11=    
        =12=    my ($formonth, $foryear);
        =13=    {
        =14=      my @NOW = localtime;
        =15=      $formonth = param('month');
        =16=      $formonth = $NOW[4]+1  unless defined $formonth and $formonth !~ /\D/ and
        =17=        $formonth >= 1 and $formonth <= 12;
        =18=      $foryear = param('year');
        =19=      $foryear = $NOW[5]+1900 unless defined $foryear and $foryear !~ /\D/ and
        =20=        $foryear >= 2001 and $foryear <= 2005;
        =21=    }
        =22=    
        =23=    my $cache = Cache::FileCache->new({namespace => 'whereami',
        =24=                                       username => 'nobody'});
        =25=    
        =26=    my %events;
        =27=    
        =28=    my @nowidentity = (stat($PLANFILE))[0,1,9];
        =29=    
        =30=    if (my $cached = $cache->get('data')) {
        =31=      my ($events, @identity) = @$cached;
        =32=      if ("@nowidentity" eq "@identity") {
        =33=        ## we have a valid cache
        =34=        %events = %$events;
        =35=      }
        =36=    }
        =37=    
        =38=    unless (%events) {
        =39=      ## no cache, so compute from scratch
        =40=      require Date::Manip; local $ENV{PATH} = "";
        =41=      Date::Manip->import;
        =42=    
        =43=      @ARGV = $PLANFILE;
        =44=      while (<>) {
        =45=        next unless
        =46=          /^(\d+)\s+to\s+(\d+)(\s+\S+\s+\d+):\s+(.*)/ or
        =47=            /^(\d+\s+\S+)\s+to\s+(\d+\s+\S+)(\s+\d+):\s+(.*)/ or
        =48=              /^(\d+\s+\S+\s+\d+)\s+to\s+(\d+\s+\S+\s+\d+)():\s+(.*)/;
        =49=        my ($start, $end, $where) = ("$1$3","$2$3", $4);
        =50=        $end = DateCalc($end, "+ 1 day");
        =51=        for (ParseRecur("every day", undef, $start, $end)) {
        =52=          my ($y, $m, $d) = UnixDate($_, "%Y", "%m", "%d");
        =53=          push @{$events{0+$y}{0+$m}}, [$d, $where];
        =54=        }
        =55=      }
        =56=      $cache->set('data', [\%events, @nowidentity]);
        =57=    }
        =58=    
        =59=    my $cal = HTML::CalendarMonthSimple->new(year => $foryear, month => $formonth);
        =60=    $cal->width('100%');
        =61=    $cal->bgcolor('white');
        =62=    $cal->todaycolor('grey');
        =63=    $cal->bordercolor('black');
        =64=    $cal->contentcolor('black');
        =65=    $cal->todaycontentcolor('black');
        =66=    $cal->headercolor('#ccffcc');
        =67=    
        =68=    {
        =69=      my $myself = url(-relative => 1);
        =70=      my $previous = sprintf "%s?year=%d&month=%d", $myself,
        =71=        $formonth == 1 ? ($foryear - 1, 12) : ($foryear, $formonth - 1);
        =72=      my $next = sprintf "%s?year=%d&month=%d", $myself,
        =73=        $formonth == 12 ? ($foryear + 1, 1) : ($foryear, $formonth + 1);
        =74=      $cal->header(table({width => '100%', border => 0,
        =75=                          cellspacing => 0, cellpadding => 2},
        =76=                         Tr(td({align => 'left', width => '1*'},
        =77=                               a({href => $previous}, "previous")),
        =78=                            td({align => 'center', width => '1*'},
        =79=                               b($cal->monthname, $cal->year)),
        =80=                            td({align => 'right', width => '1*'},
        =81=                               a({href => $next}, "next")))));
        =82=    }
        =83=    
        =84=    print header, start_html("My Calendar for ".$cal->monthname." ".$cal->year);
        =85=    
        =86=    for (@{$events{0+$foryear}{0+$formonth}}) {
        =87=      my ($d, $where) = @$_;
        =88=      for ($where) {
        =89=        find_uris($_, sub {my ($uri, $text) = @_;
        =90=                           qq{\1<a href="\1$uri\1" target=_blank>\1$text\1</a>\1} });
        =91=        s/\G(.*?)(?:\001(.*?)\001)?/escapeHTML($1).(defined $2 ? $2 : "")/eig;
        =92=      }
        =93=      $cal->addcontent(0+$d, $where);
        =94=    }
        =95=    
        =96=    { local $^W = 0; print $cal->as_HTML; }
        =97=    
        =98=    print end_html;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.