Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 33 (Jan 1999)

One of the side effects of buying my first still camera in nearly two decades is that I started taking pictures of everything. One of the nice things about buying a digital camera is that I have the results immediately. The downside of this is that I take a lot of pictures and want to show them off, because I don't have to worry about the cost or time of having some silver-halide-coated emulsion turned into light and dark spots, and then having friends ask for a copy.

In my September 1998 column, I described a little tool to create thumbnails from a directory full of JPEGs. In the past few weeks, I've used this program to create thumbnails of the pictures I'm taking with my new camera, but now I have hundreds of pictures in a given directory. People started complaining about how long it took to download even the thumbnails, so I knew I needed another solution.

Luckily, I remembered that the number one server, Apache (which happens to be what my ISP runs), can use not only an HTML file as the index for a directory, but also a CGI script to do the indexing. This is triggered by a line like the following in either an .htaccess file or the server configuration files:

    DirectoryIndex /cgi/pictures

So, I put that line into the .htaccess for my Pictures directory, and now that directory and all its subdirectories invoke the named CGI script whenever the directory itself is asked for. Note that this does not invoke the CGI script for the files within the directory.

Next, I needed a way to pass parameters into this script to give the directories meaningful titles. I decided to place that information in a file called .title in each directory (yes, beginning with a period) that has one line of text like this:

    Pictures around the house

And then I needed a place to put the descriptions. I decided to use a similar file named .info in each directory. The format is the filename followed by some whitespace and then some HTML-encoded text. Long lines are wrapped onto successive lines by indenting the following lines. Lines that begin with # are ignored. So that looks like this:

    . These are pictures around the house.  I started taking them
      when I first got my camera, and I've been adding to them ever since.
      Please check back frequently for more pictures.
    FrontDoor.jpg This is the front door. The front door is <b>locked!</b>
    BackDoor.jpg This is the backdoor.

Note that the long text description for the current directory is given as a special entry named "." (a single dot).

Finally, I wrote the program for /cgi/pictures, as given in [listing one, below].

Lines 1 through 5 start nearly every CGI program I write. These lines enable taint checking and warnings, enforce additional compile-time and run-time restrictions, redirect any die messages to the browser, and pull in the standard shortcuts and processing for the CGI information.

Line 7 defines the path to the top of my web server's document tree. In other words, this is the directory that corresponds to a / in my URLs. There's no way to get this information directly from a CGI script, so we're stuck with having to hardcode it.

Line 8 is a configuration constant. We'll show up to 10 pictures per page before making the visitor follow a link to see more.

Lines 9 and 10 define the web colors for alternating lines of the tables being generated later. I picked a white and an offwhite color.

Lines 12 through 17 attempt to change the current directory to the directory being indexed. The name comes from appending the value of the SCRIPT_URL environment variable to the top of the document tree. No, it doesn't make any sense to me either that the pathname of the directory to be indexed is in a variable with URL in the name, but that's the way it is. The Apache documentation is wonderfully vague at this point, so I had to figure this out by trial and error.
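As a minimal sketch of the mapping just described (not the listing's exact code), the following turns a URL path into a directory below the document root. The sub name dir_for_url and the sample paths are hypothetical, and the check that refuses any ".." path component is my own assumption, not something the listing does.

```perl
use strict;
use warnings;

# Hypothetical helper: map a URL path such as "/Pictures/House/" to a
# filesystem directory below $htdoc.  Returns nothing for odd paths.
sub dir_for_url {
  my ($htdoc, $url_path) = @_;
  return unless defined $url_path;
  # Assumption (not in the listing): reject any ".." path component.
  return if $url_path =~ m{(?:^|/)\.\.(?:/|$)};
  # Accept only paths that begin with a slash; the capture untaints.
  return unless $url_path =~ m{^(/.*)$}s;
  return "$htdoc$1";
}

my $good = dir_for_url("/home/merlyn/Html", "/Pictures/House/");
my $bad  = dir_for_url("/home/merlyn/Html", "/Pictures/../../etc/");
print "good: $good\n";
print "bad:  ", (defined($bad) ? $bad : "(rejected)"), "\n";
```

Whether a stricter check like this is worth having depends on how much you trust the values the server hands to the script.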

Lines 19 through 25 print the top of the resulting page. We fetch the title of the page from the .title file via the &get_title subroutine, then generate a page title and a first-level heading based on that information. There's also a correct DTD in the header so that I can validate this page using validator.w3.org's service. The call to get the info for the opening paragraph triggers a parsing of the .info file as well.

Lines 27 through 31 fetch all the files in the current directory, using a temporary directory handle and a readdir call. These files are unsorted, and include all the files that start with period, which we'll filter out later.

Lines 33 and 34 establish two arrays: one for all the entries that are pictures with good thumbnails (@pics), and one with everything else in the directory (@other). Each element of these arrays is a listref, pointing to either a two- or three-element anonymous list (created later).

Lines 36 through 47 walk through the directory picking out the interesting parts, sorting them into the appropriate bin.

The list of elements to walk through is selected in line 36. We're tossing all files that begin with dot or end with tilde (editor backup files), and adding back to the list the parent directory (which got tossed because it begins with a dot). That way we've automatically got an ``up'' link to get us out of here.
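To see what that selection does, here is a small sketch that runs the same expression over a made-up readdir result. The filenames are hypothetical.

```perl
use strict;
use warnings;

# Sample readdir output (hypothetical names), unsorted and unfiltered.
my @files = (".", "..", ".title", ".info", "House", "FrontDoor.jpg",
             "FrontDoor.jpg.thumb.jpg", "notes.txt~", "BackDoor.jpg");

# Drop names that begin with a dot or end with a tilde, then add ".."
# back so that the parent directory still gets a link.
my @wanted = sort "..", grep !/^\.|~$/, @files;

print "$_\n" for @wanted;
```

The ".." entry sorts ahead of the ordinary names here, so the ``up'' link lands at the top of the list.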

Lines 37 through 40 handle each directory of the list. For each directory, we add an entry to the end of the @other array pointing to a two-element list. The first element is the link URL (ending in slash so we can cleanly distinguish it as such) and the second element is the description. The description comes from the directory's .title file as before. That way we'll have a nice ``where to from here'' listing.

Lines 41 through 44 handle images for which we have a good thumbnail. If we've got one of those, we save a reference to a new three-element list onto @pics. The first element is the filename, the second gives the path to the thumbnail, and the third is the description (from the .info file).

Line 45 rejects all the thumbnail pictures so that we don't process them any further.

Line 46 handles any unknown files in the current directory. If they're not another directory, a picture with a thumbnail, or a thumbnail itself, then we'll just make a link to it with "unexpected file" for the description. (I suppose I could have made it try the .info file, but these links are supposed to be the exceptions, not the rule.)
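The sorting-into-bins logic can be tried out in a scratch directory. This is only a sketch: the filenames are hypothetical, the descriptions are placeholders rather than values from .title or .info, and it uses File::Temp and lexical handles where the listing uses globs.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $orig = getcwd();
my $tmp  = tempdir(CLEANUP => 1);
chdir $tmp or die "cannot chdir $tmp: $!";

# Build a tiny sample directory: one subdirectory and four files.
mkdir "House" or die "cannot mkdir House: $!";
for my $name ("FrontDoor.jpg", "FrontDoor.jpg.thumb.jpg",
              "BackDoor.jpg", "README") {
  open my $fh, ">", $name or die "cannot create $name: $!";
  close $fh;
}

opendir my $dh, "." or die "cannot opendir '.': $!";
my @visible = grep { !/^\./ } readdir $dh;
closedir $dh;

my (@other, @pics);
for (sort @visible) {
  if (-d) {                                     # a subdirectory
    push @other, ["$_/", "a directory"];
    next;
  }
  if (/\.(gif|jpg)$/ and -r "$_.thumb.jpg") {   # picture with thumbnail
    push @pics, [$_, "$_.thumb.jpg"];
    next;
  }
  next if /\.thumb\.jpg$/;                      # the thumbnail itself
  push @other, [$_, "unexpected file"];
}
chdir $orig or die "cannot chdir $orig: $!";

print "pics:  ", join(", ", map { $_->[0] } @pics), "\n";
print "other: ", join(", ", map { $_->[0] . " (" . $_->[1] . ")" } @other), "\n";
```

Note that BackDoor.jpg has no thumbnail in this sample, so it falls through to the "unexpected file" bin rather than the picture table.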

Lines 49 through 60 print the portion of the output dedicated to links. This includes all the subdirectories as well as the parent directory. This section of code is inside a ``naked block'' so that we can get local variables with temporary duration and scope.

Line 50 creates the second-level heading for this section.

Line 52 sets up a toggling variable that lets us make the table have alternating darker rows. $flip will alternate between zero and one for each row, and is initially zero.

Lines 53 through 59 print a single table, using CGI.pm's table shortcut method. The table-wide parameters of CELLSPACING and CELLPADDING are defined using a hashref with the appropriate keys and values as the first parameter to the table constructor.

Lines 54 through 59 provide the remaining elements of this table constructor as the result of the map operation. For each element of the @other array, the block of code from lines 55 to 58 will be executed with $_ set to that value. Since $_ is in turn an array ref, we can get to the elements and name them with @$_, in line 55. These will be the path to the file and the text description.

The alternating colors of the table rows are established in line 56 with an anonymous hash as the first parameter to Tr, setting the BGCOLOR parameter to either $ODD or $EVEN depending on the flipped value of $flip.

Next, line 57 constructs a table data (TD) cell consisting of an anchor referencing the link, with the linkname itself as the text. Line 58 follows that with another cell consisting of the description.
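The row-color toggle described above is easy to try on its own. This sketch uses the same expression as the listing, with four pretend rows.

```perl
use strict;
use warnings;

my ($ODD, $EVEN) = ("#dddddd", "#ffffff");

# $flip becomes 1, 0, 1, 0, ... so the first row gets the $ODD color.
my $flip = 0;
my @colors = map { ($flip = 1 - $flip) ? $ODD : $EVEN } 1 .. 4;

print "@colors\n";
```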

Lines 62 through 94 create the link table for all of the pictures, including the thumbnails. We only do this if there are pictures; hence the check for the number of elements of @pics to be non-zero.

Lines 63 to 69 determine appropriate values for $start, $low, and $high, based on the input parameter start (if present) and $PICSPERPAGE and the number of pictures to show. I stole some of this code almost directly from the text for the May 1996 edition of this column, so I won't describe this in detail. But basically, we'll end up showing the pictures from $low to $high.
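For those who don't have the May 1996 column handy, here is a sketch of the same clamping arithmetic as a standalone subroutine. The name page_window is hypothetical, and it assumes there is at least one picture (the listing guarantees that with its check on @pics).

```perl
use strict;
use warnings;

# Hypothetical helper: given a requested start index, the number of
# pictures, and the page size, return the zero-based ($low, $high).
sub page_window {
  my ($start, $count, $per_page) = @_;
  my $last = $count - 1;
  $start = 0 if $start < 0;
  my $low = $start > $last ? $last : $start;
  my $high = $low + $per_page - 1;
  $high = $last if $high > $last;
  return ($low, $high);
}

for my $start (0, 10, 20, 99, -5) {
  my ($low, $high) = page_window($start, 23, 10);
  printf "start=%d shows pictures %d through %d\n",
    $start, $low + 1, $high + 1;
}
```

With 23 pictures and 10 per page, an out-of-range start such as 99 collapses to the last picture alone, and a negative start is treated as zero.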

And line 71 lets the websurfer know the range. Since all the numbers are zero-based from Perl's view, we have to add 1 to them to tell the human what the real numbers are.

Line 73 creates the $flip variable, just like for the table above.

Lines 74 through 93 dump out a table, using the table shortcut. The cell spacing and padding are set as before, in line 74 via an anonymous hash as a first parameter to table.

Lines 75 through 79 create a link to view earlier pictures. This link needs to be there only when we have earlier pictures to show (not initially, for example). The conditional ?: operator is used here to select either a table row (lines 76 to 78) or an empty list. The empty list will disappear out of the final list being given to table, so there's no harm (and no output).
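The disappearing empty list is worth seeing in isolation. In this sketch the picture names and the sub name rows_for are hypothetical, and plain strings stand in for the table rows.

```perl
use strict;
use warnings;

my @pics = ("a.jpg", "b.jpg", "c.jpg", "d.jpg");  # hypothetical names

# Build the rows for one page.  Each conditional contributes either
# one element or the empty list, which simply vanishes.
sub rows_for {
  my ($low, $high) = @_;
  return (
    ($low > 0 ? "[earlier]" : ()),
    @pics[$low .. $high],
    ($high < $#pics ? "[later]" : ()),
  );
}

print join(" | ", rows_for(0, 1)), "\n";
print join(" | ", rows_for(1, 2)), "\n";
print join(" | ", rows_for(2, 3)), "\n";
```

The first page has no "[earlier]" element, the last has no "[later]" element, and the middle page gets both.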

If we need to include the link to earlier pictures, a single table data cell spanning three columns gets created. The text for the link is always "preview earlier pictures". The link address is the URL of the directory being indexed, suffixed with a GET parameter of start=nn, where nn will be the new starting number. Again, it doesn't make sense to me that the directory being indexed has its full URL in something called SCRIPT_URI, but that's what I found in experimenting with this method.

Lines 80 through 87 create the table rows for the pictures themselves. Each element of the map processing is a listref, expanded into named variables in line 81. The row consists of three cells: the picture as an image and a link in the first cell (lines 83 to 85), the size in K-bytes (line 86), and the text description (line 87).
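The size calculation in line 86 rounds up to the next whole kilobyte by adding 1023 before dividing. A short sketch, with a hypothetical sub name and made-up byte counts:

```perl
use strict;
use warnings;

# Round a byte count up to whole kilobytes, as line 86 does.
sub size_in_k {
  my $bytes = shift;
  return int((1023 + $bytes) / 1024) . "K";
}

printf "%5d bytes => %s\n", $_, size_in_k($_) for 0, 1, 1024, 1025, 20000;
```

So a file of 1025 bytes is reported as 2K, and only an empty file shows up as 0K.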

Lines 88 to 92 handle the link to the later pictures in a manner similar to the link for the earlier pictures.

Finally, line 96 closes the HTML output. This is the last executable line of code; everything else is just utility subroutines.

Lines 98 through 102 define a subroutine to perform a URI escape for unsafe filenames, translating all characters that don't match the character class in line 100 to their percent-prefixed-hex equivalent. You could also use the URI::Escape module in the LWP library in a similar manner.
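Here is the same substitution run against a sample path, so you can see which characters survive. The path itself is made up.

```perl
use strict;
use warnings;

# Same substitution as lines 98 through 102 of the listing: anything
# outside letters, digits, underscore, dot, dash, and slash becomes
# a percent sign followed by two hex digits.
sub my_uri_escape {
  my $text = shift;
  $text =~ s/([^A-Za-z0-9_.\-\/])/sprintf "%%%02X", ord $1/ge;
  return $text;
}

print my_uri_escape("My Pictures/Front Door #1.jpg"), "\n";
```

The slash is deliberately left alone so that paths stay paths. As far as I know, uri_escape from URI::Escape escapes the slash as well when used with its default character set, so check that before swapping it in.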

Lines 104 through 111 look for a .title file for a given directory. If the file exists, the first line of its contents is returned. Otherwise, we do a little heuristic for some common missing cases, and just punt to whatever got handed in otherwise.

Lines 113 to 128 define the &get_info subroutine, along with a local static variable %info. This routine parses the .info file in the current directory, and maintains a cache so that repeated calls will not have to reparse the file. This is wrapped in a BEGIN block so that the variable %info gets properly scoped and closure-ized with the &get_info routine, and we won't have to worry about the block accidentally being executed again.
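If the BEGIN-block trick is unfamiliar, this stripped-down sketch shows the same shape with a trivial cache. The names slow_square and computations are hypothetical and have nothing to do with the picture indexer.

```perl
use strict;
use warnings;

BEGIN {
  my %cache;          # visible only to the subs defined in this block
  my $computed = 0;   # how many times we really did the work

  sub slow_square {
    my $n = shift;
    unless (exists $cache{$n}) {
      $computed++;
      $cache{$n} = $n * $n;
    }
    return $cache{$n};
  }

  sub computations { return $computed }
}

print slow_square(12), "\n";
print slow_square(12), "\n";
print "computed ", computations(), " time(s)\n";
```

The second call is answered from %cache, and nothing outside the block can get at %cache or $computed directly.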

Line 119 opens the file and reads it into $_. If that works, the comments are tossed in line 120, lines that begin with whitespace are folded up to the previous line in line 121, and then we create the %info hash from the key-value pairs extracted in line 122.

Line 124 defaults the information for the current directory to a single space. This ensures that there's always at least one entry in the hash as well, so that on later calls the check at line 117 finds a non-empty hash and skips the parsing.

Line 126 returns the information for the passed-in parameter, defaulting to the somewhat uninformative non-message message if not found.
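To watch the parsing steps work without creating a .info file, this sketch applies the same three operations to a string. The sub name parse_info is hypothetical and the sample text is a shortened version of the example given earlier.

```perl
use strict;
use warnings;

my $text = <<'END';
# comment lines are ignored
. These are pictures around the house.  I started taking them
  when I first got my camera.
FrontDoor.jpg This is the front door. The front door is <b>locked!</b>
BackDoor.jpg This is the backdoor.
END

# Hypothetical helper: the same three steps as lines 120 to 122.
sub parse_info {
  local $_ = shift;
  s/^\s*\#.*\n//mg;       # toss comments
  s/[ \t]*\n[ \t]+/ /g;   # fold continuation lines
  my %info = /^(\S+)\s+(.*)/mg;
  return %info;
}

my %info = parse_info($text);
print "$_ => $info{$_}\n" for sort keys %info;
```

The indented continuation line ends up joined to the "." entry, and the comment line leaves no trace in the hash.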

And there you have it. If you want to see this online, take a look at my ever-increasing picture gallery at http://www.stonehenge.com/merlyn/Pictures/. Enjoy.

Listing

        =1=     #!/home/merlyn/bin/perl -Tw
        =2=     use strict;
        =3=     use CGI::Carp "fatalsToBrowser";
        =4=     
        =5=     use CGI ":standard";
        =6=     
        =7=     my $HTDOC = "/home/merlyn/Html";
        =8=     my $PICSPERPAGE = 10;
        =9=     my $ODD = "#dddddd";            # bgcolor for odd rows
        =10=    my $EVEN = "#ffffff";           # bgcolor for even rows
        =11=    
        =12=    {
        =13=      my $dir;
        =14=      ($ENV{"SCRIPT_URL"} || "") =~ /(.*)/s and
        =15=        $dir = "$HTDOC$1" and
        =16=          chdir $dir or die "cannot chdir $dir: $!";
        =17=    }
        =18=    
        =19=    my $title = "Picture index for ".get_title(".");
        =20=    print
        =21=      header,
        =22=      start_html("-title" => $title,
        =23=                 -dtd => "-//W3C//DTD HTML 4.0 Transitional//EN"),
        =24=      h1($title),
        =25=      p(get_info("."));
        =26=    
        =27=    my @files = do {
        =28=      local *DIR;
        =29=      opendir DIR, "." or die "cannot opendir '.': $!";
        =30=      readdir DIR;
        =31=    };
        =32=    
        =33=    my @other;
        =34=    my @pics;
        =35=    
        =36=    for (sort "..", grep !/^\.|~$/, @files) {
        =37=      if (-d) {
        =38=        push @other, ["$_/", get_title($_)];
        =39=        next;
        =40=      }
        =41=      if (/\.(gif|jpg)$/ and -r "$_.thumb.jpg") {
        =42=        push @pics, [$_, "$_.thumb.jpg", get_info($_)];
        =43=        next;
        =44=      }
        =45=      next if /\.thumb\.jpg$/;
        =46=      push @other, [$_, "unexpected file"];
        =47=    }
        =48=    
        =49=    {
        =50=      print h2("Links");
        =51=    
        =52=      my $flip = 0;
        =53=      print table({cellspacing => 0, cellpadding => 10},
        =54=                  (map {
        =55=                    my ($path,$desc) = @$_;
        =56=                    Tr({bgcolor => (($flip = 1 - $flip) ? $ODD : $EVEN)},
        =57=                       td(a({href => my_uri_escape($path)}, $path)),
        =58=                       td($desc)) }
        =59=                   @other));
        =60=    }
        =61=    
        =62=    if (@pics) {
        =63=      my $start = 0;
        =64=      $start = $1 if (param("start") || "") =~ /(-?\d+)/;
        =65=      $start = 0 if $start < 0;
        =66=      my $low = $start;
        =67=      $low = $#pics if $low > $#pics;
        =68=      my $high = $low + $PICSPERPAGE - 1;
        =69=      $high = $#pics if $high > $#pics;
        =70=    
        =71=      print h2("Pictures",$low+1,"through",$high+1,"of",$#pics+1, "total");
        =72=    
        =73=      my $flip = 0;
        =74=      print table({cellspacing => 0, cellpadding => 10},
        =75=                  ($low > 0 ?
        =76=                   Tr(td({colspan => 3},
        =77=                         a({href => "$ENV{SCRIPT_URI}?start=".($low - $PICSPERPAGE)},
        =78=                            "preview earlier pictures"))) :
        =79=                   ()),
        =80=                  (map {
        =81=                    my ($path,$paththumb,$desc) = @$_;
        =82=                    Tr({bgcolor => (($flip = 1 - $flip) ? $ODD : $EVEN)},
        =83=                       td(a({href => my_uri_escape($path)},
        =84=                            img({src => my_uri_escape($paththumb),
        =85=                                 alt => "[thumbnail for $path]"}))),
        =86=                       td(int((1023 + -s $path)/1024)."K"),
        =87=                       td($desc)) } @pics[$low..$high]),
        =88=                  ($high < $#pics ?
        =89=                   Tr(td({colspan => 3},
        =90=                         a({href => "$ENV{SCRIPT_URI}?start=".($high + 1)},
        =91=                           "preview later pictures"))) :
        =92=                   ()),
        =93=                 );
        =94=    }
        =95=    
        =96=    print end_html;
        =97=    
        =98=    sub my_uri_escape {
        =99=      my $text = shift;
        =100=     $text =~ s/([^A-Za-z0-9_.\-\/])/sprintf "%%%02X", ord $1/ge;
        =101=     $text;
        =102=   }
        =103=   
        =104=   sub get_title {
        =105=     my $dir = shift;
        =106=     local *F;
        =107=     open F, "$dir/.title" and <F> =~ /(.+)/ and return $1;
        =108=     $dir eq ".." and return "Go up";
        =109=     $dir eq "." and return "The $ENV{SCRIPT_URL} directory";
        =110=     "The $dir directory";
        =111=   }
        =112=   
        =113=   BEGIN {
        =114=     my %info = ();
        =115=     sub get_info {
        =116=       my $path = shift;
        =117=       unless (%info) {
        =118=         local (*F, $/, $_);
        =119=         if (open F, ".info" and defined($_ = <F>)) {
        =120=           s/^\s*\#.*\n//mg;       # toss comments
        =121=           s/[ \t]*\n[ \t]+/ /g;   # fold continuation lines
        =122=           %info = /^(\S+)\s+(.*)/mg;
        =123=         }
        =124=         $info{"."} ||= " ";
        =125=       }
        =126=       $info{$path} || "Description not provided for $path";
        =127=     }
        =128=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.