I have mixed feelings about writing this column. I've been a strong
advocate of browser-independent, standard HTML as the universal medium
of the Web. However, as a friend of mine (Devin Ben-Hur, a web
designer) points out, not all browsers are currently up to even the
current standards, and this can be stifling from a communications
point-of-view, or worse yet, confusing from a reader's view.
Devin suggested that I write a column about how to handle HTML that
adapts itself to the browser dynamically. I said that it sounded like
a neat idea, so here it is. So, thanks to Devin for this month's idea.
I decided to tackle the table problem. Most modern browsers
handle tables just fine, but two frequently-used browsers (Lynx, and
the w3-mode of GNU Emacs) do not. So, if data is to be presented in a
universal fashion, it would have to be HTML-table-encoded for nearly
all of the world, but generated as a pseudo-table (using <pre>) for
browsers that handle only text.
The resulting code is contained within Listing 1 [below].
Lines 1 and 2 begin nearly every program I write for CGI scripts (and
anything else that's longer than 10 lines), enabling "taint checks",
warnings, and compile-time restrictions.
Lines 7 and 8 disable output buffering and set a particular path,
respectively. You'll probably want to adapt line 8 to whatever your
particular system requires.
Line 11 tells the server-side include processing that we're gonna be
generating real HTML, although this is mostly ignored, because what
matters to the browser is what the original including document defines
as a MIME type.
Lines 14 through 76 define an eval block, used for trapping errors.
If we die for any reason within this block, something reasonable is
generated, instead of just throwing the die message into the
server-log and then returning an error 500. More on that later.
Lines 17, 18, and 19 grab the URI, extra stuff, and user agent fields,
respectively, from the process environment variables. We need the URI
to find the correlated data table. We need the extra stuff to decide
the filename of the data. And finally, the user agent will determine
if we output an HTML table or a flat text pseudo-table.
Line 22 aborts if we aren't called as an SSI, because DOCUMENT_URI
(and hence $uri) will be empty.
Line 25 gets the directory and filename out of the URI (the document
that included us). This is needed because the data table is required
to be in the same directory as the original document.
Lines 28 through 32 translate this directory part into a UNIX path.
Now, this is necessarily system dependent, so I'm illustrating the
code for my ISP, Teleport. You'll most certainly have to figure this
out for yourself. At Teleport, the URL /~merlyn/fred.html is located
at UNIX path /home/merlyn/public_html/fred.html, so that's what we've
got.
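Here's a minimal sketch of that translation pulled out into a function,
so you can try it on sample URIs without a server. The name
uri_dir_to_path is made up for this illustration, and the
/home/USER/public_html layout is only the Teleport arrangement described
above; your site's mapping will almost certainly differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): split a /~user/ URI into
# a filesystem directory and a filename, using the Teleport-style layout.
sub uri_dir_to_path {
    my ($uri) = @_;
    my ($dir, $file) = $uri =~ m,(.*/)(.*),s
        or die "cannot split $uri\n";
    $dir =~ m,^/~(\w+)/(.*)$,s
        or die "cannot translate dir $dir\n";
    return ("/home/$1/public_html/$2", $file);   # site-specific mapping
}

my ($dir, $file) = uri_dir_to_path("/~merlyn/fred.html");
print "$dir\n";    # /home/merlyn/public_html/
print "$file\n";   # fred.html
```

Anything that doesn't start with /~user/ dies, which is the same
refuse-by-default behavior as lines 28 through 32.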
Line 35 attempts to go to this computed directory, aborting the entire
process if the chdir fails.
Lines 38 and 39 compute a filename within the directory that contains
the data to be formatted into either a table or a flat text layout.
The regular expression grabs the first word. This word has ".table"
attached to it, for security reasons. (Without it, it might be
possible to grab arbitrary files through guest books or other things,
and that seems a little too powerful. At least this way, only things
that end in ".table" are vulnerable.)
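To see what that regular expression lets through, here's a small sketch
that wraps the same two steps in a function. The name table_filename is
my own invention for this example; the pattern is the one from line 38.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): reduce PATH_INFO to one
# safe word and force the ".table" suffix.
sub table_filename {
    my ($path) = @_;
    return unless defined $path;
    my ($word) = $path =~ m!^/?([-\w.]+)!
        or return;                 # nothing usable after the CGI name
    return "$word.table";
}

print table_filename("/fred"), "\n";                    # fred.table
print table_filename("/fred/../../etc/passwd"), "\n";   # fred.table
```

Because the character class excludes the slash, only the first path
component survives, so a request can't climb out of the directory that
line 35 moved us into.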
Lines 40 and 41 verify this computed filename, and open it if it exists.
Line 42 defines a temporary array called @max_in_col, which will be
used to hold the maximum column width seen so far. We'll need this
value if we are constructing a pseudo-table, but not if we're letting
the browser do all the layout stuff.
Line 43 defines @table, which holds the table data itself. This will
be a list of references to lists.
Line 44 decides whether we are going to output just flat text (a
pseudo-table) or a full HTML table, by looking at the user agent (in
$agent). Now, in my limited testing, I discovered that
neither Emacs-W3 nor Lynx handled tables (yet). So, if the user agent
string matches either of these, I'm gonna use a flat text operation.
If you discover other browsers that are table-challenged, it's only
necessary to extend this regular expression. The variable $text is
checked in a few places later in the program.
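If you want to experiment with the pattern before touching the CGI
script, a sketch like the following will do. The function name
wants_text and the sample user agent strings are assumptions made up
for this example, not values from any log.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): true for user agents we
# believe can't render tables. Extend the alternation as needed.
sub wants_text {
    my ($agent) = @_;
    return 0 unless defined $agent;    # no header sent: assume tables
    return $agent =~ /Emacs-W3|Lynx/i ? 1 : 0;
}

# Illustrative user agent strings only.
for my $agent ("Lynx/2.4.2 libwww/2.14", "Mozilla/2.0 (X11; I; SunOS 5.4)") {
    printf "%-35s => %s\n", $agent, wants_text($agent) ? "text" : "table";
}
```

Treating a missing user agent as table-capable is a design choice of
this sketch; you might prefer the opposite default.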
Lines 45 through 59 acquire the table data from the file. Each line
is read into the $_ variable, and then split by tabs into @row in line
47.
If the output format is flat text, then lines 49 through 57 examine
each column in the row to see how wide it is. We'll need that info
to determine the width of the maximum element in each column.
Lines 51 through 56 are executed once for each column. The data is
stored into $tmp in line 52, so that we can strip the HTML markup in
line 53. This way, the width of the string is gonna be just the text
without the HTML. This won't be completely correct, but it's a better
first cut than counting all the HTML tags.
Line 54 converts the string in $tmp into its length, and line 55 saves
that length into the @max_in_col array if the new length is wider than
what's been seen so far for that particular column.
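The width bookkeeping can be sketched on its own like this. The
function name column_widths and the sample rows are hypothetical; the
tag-stripping substitution is the same crude one as in line 53.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): widest visible text in
# each column, ignoring anything that looks like an HTML tag.
sub column_widths {
    my @max;
    for my $row (@_) {
        for my $i (0 .. $#$row) {
            (my $text = $row->[$i]) =~ s/<.*?>//g;   # don't count markup
            my $len = length $text;
            $max[$i] = $len
                if !defined($max[$i]) || $max[$i] < $len;
        }
    }
    return @max;
}

my @widths = column_widths(
    [ "<b>Name</b>", "Qty"   ],
    [ "Widget",      "12345" ],
);
print "@widths\n";   # 6 5
```

The defined() test is an addition of this sketch: it avoids comparing
against an undefined slot the first time a column is seen, which would
otherwise draw a warning under -w.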
Whether or not I'm building a text-table, the data itself gets shoved
in as an anonymous list into the @table array (line 58). Note that if
@row were declared outside the loop, it would be wrong to use:
push @table, \@row;
here, because that would put the same listref in every slot of the
table. Instead, I'm creating a brand new anonymous list by copying
the data once, which is safe wherever @row is declared.
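The aliasing problem is easiest to see when the row array is declared
once, outside the loop. This sketch makes that assumption explicitly
(it is not how Listing 1 declares @row), and the two sample lines are
made up.

```perl
use strict;
use warnings;

my @row;                     # assumption: one array shared by all passes
my (@aliased, @copied);
for my $line ("a\tb", "c\td") {
    @row = split /\t/, $line;
    push @aliased, \@row;    # every slot refers to the same array
    push @copied,  [@row];   # a fresh anonymous copy for each row
}

print "aliased: ", join(" / ", map("@$_", @aliased)), "\n";   # c d / c d
print "copied:  ", join(" / ", map("@$_", @copied)),  "\n";   # a b / c d
```

The copy costs a little time and memory per row, but it keeps each row
independent no matter how the surrounding code is later rearranged.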
Lines 60 through 75 dump the collected data as either an HTML table
or a text pseudo-table, selected by the value of $text.
Lines 61 through 66 handle the text pseudo-table case. Line 61
creates a printf format string from all of the column widths. This
is pretty complex, so let me break it down from right to left.
First, there's a map operator, which takes each element in @max_in_col
and turns it into "%-Ns", where "N" is the width. Then, those
elements are list-concatenated with an empty element on the left, and
a newline on the right. Then, the resulting list is glued together in
a single string by putting " | " between the elements (but not
before or after).
Wow. The result will look something like:
" | %-5s | %-10s | %-3s | \n"
if @max_in_col was something like (5,10,3). As you can see, we
therefore create the correct format string to hand printf to put the
columns into the right shapes. Neat, huh?
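You can watch the format string being built with a few lines at the
command line. This sketch uses the sample widths from the text and a
made-up row of data; it spells the map with sprintf rather than string
concatenation, which produces the same "%-Ns" pieces.

```perl
use strict;
use warnings;

my @max_in_col = (5, 10, 3);    # sample widths from the text
my $format = join " | ",
    "",                                             # empty leading element
    (map { sprintf '%%-%ds', $_ } @max_in_col),     # "%-5s", "%-10s", "%-3s"
    "\n";                                           # trailing newline element

print "format is: $format";
printf $format, "abc", "defghij", "k";    # made-up row, padded to width
```

The empty first element and the newline last element are what give the
row its leading and trailing " | " without any special-case code.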
Lines 62 and 66 put the right <pre> and </pre> enclosure around the
pseudo-table. Lines 63 through 65 output each line from the @table
array. Notice that the value in $_ is a listref representing an
original row. Line 64 dereferences this listref to get the original
data. The format string generated above automatically puts the data
into the right shape.
Lines 68 to 74 generate the HTML table structure from the same
original data. Lines 68 and 74 output the outer HTML codes.
Lines 69 through 73 generate each row, similar to the text-only
version above. Lines 70 and 72 bound the beginning and ending of the
table row, and line 71 encloses each table element in table data tags.
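For comparison with the text case, here's the HTML branch as a sketch
that returns a string instead of printing, which makes it easy to test.
The function name html_table and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): render a list of array
# references as an HTML table, one <tr> per row and one <td> per cell.
sub html_table {
    my @rows = @_;
    my $html = "<table border=1>\n";
    for my $row (@rows) {
        $html .= "<tr>" . join("", map("<td>$_</td>", @$row)) . "</tr>\n";
    }
    return $html . "</table>\n";
}

print html_table([ "<b>Name</b>", "Qty" ], [ "Widget", "12345" ]);
```

Note that the cells go out as-is, markup and all, just as in the
listing. That's what lets the data file carry things like bold text.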
That's pretty much all there is in the main program. The final dozen
lines take care of unexpected errors or other Perl fatalities
resulting from the inside of the eval block. Line 79 detects an
error result, which is then stripped of any final newline by line 80.
Lines 81 through 85 massage the error message into something
meaningful and HTML-safe. Line 86 generates the message to the
output.
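The trap-and-report pattern is worth having in your toolbox, so here's
a sketch of it in isolation. The function name html_error and the
sample die message are made up; the substitutions assume the goal is
the usual HTML entities for ampersand, less-than, and greater-than.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Listing 1): make a die message safe
# to drop into an HTML page.
sub html_error {
    my ($msg) = @_;
    chomp $msg;
    $msg = "[error: $msg]";
    $msg =~ s/&/&amp;/g;     # ampersands first, before we add new ones
    $msg =~ s/</&lt;/g;
    $msg =~ s/>/&gt;/g;
    $msg =~ s/\n/<br>/g;     # any interior newlines become line breaks
    return $msg;
}

eval { die "cannot open <fred.table>: No such file\n" };
print html_error($@), "\n" if $@;
# prints: [error: cannot open &lt;fred.table&gt;: No such file]
```

The order matters: escaping the ampersand last would mangle the
entities just created for the angle brackets.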
That's it. Now, to use this puppy, stick it into a CGI directory.
Let's say it's "/cgi/table". Then, plop your data down into a file
named "something.table" in the same directory as your HTML file in
which you want to use the data. Let's say it's "fred.table". Every
line of the file will be a row in the resulting table. Every column
in the table should be tab-separated from its neighbors.
Then, it's just a matter of shoving something like:
<!--#include virtual="/cgi/table/fred" -->
into your SSI-parsed file. Note that I'm using the Apache server
here, so your SSI invocation sequence may vary. Also, you'll have to
do the right thing to make sure the file itself is SSI parsed. That
might be by adding something to the .htaccess file, or renaming it so
that it ends in ".shtml" or turning on the executable bit or
something.
In summary, while I wouldn't recommend making every output decision
based on the browser type, from time to time knowing and using such
information can come in handy. See ya next time.
Listing 1
=1=  #!/usr/bin/perl -Tw
=2=  use strict;
=3=
=4=  ## table: write a table or a <pre> based on browser type
=5=
=6=  ## system stuff
=7=  $|++;
=8=  $ENV{PATH} = "/usr/ucb:/bin:/usr/bin";
=9=
=10= ## HTML stuff
=11= print "Content-type: text/html\n\n";
=12=
=13= ## the main program (in eval so we can trap problems)
=14= eval {
=15=
=16=   ## get the CGI data
=17=   my $uri = $ENV{DOCUMENT_URI}; # valid only in SSI
=18=   my $path = $ENV{PATH_INFO}; # stuff after cgi name
=19=   my $agent = $ENV{HTTP_USER_AGENT}; # user agent
=20=
=21=   ## validate this as an SSI
=22=   die "missing document_uri" unless $uri;
=23=
=24=   ## split the URI up, so we know where the file was
=25=   my ($dir,$file) = $uri =~ m,(.*/)(.*),s;
=26=
=27=   ## massage the directory to get the containing dir
=28=   if ($dir =~ m,^/~(\w+)/(.*)$,s) {
=29=     $dir = "/home/$1/public_html/$2"; # teleport specific
=30=   } else {
=31=     die "cannot translate dir";
=32=   }
=33=
=34=   ## go there
=35=   chdir $dir or die "cannot cd to $dir: $!\n";
=36=
=37=   ## open sourcefile
=38=   my ($filename) = $path =~ m!^/?([-\w.]+)!;
=39=   $filename .= ".table"; # ensure only "whatever.table"
=40=   die "missing filename in $path" unless $filename and -f $filename;
=41=   open F, $filename or die "cannot open $filename: $!";
=42=   my @max_in_col;
=43=   my @table;
=44=   my $text = $agent =~ /Emacs-W3|Lynx/i; # you may wish to extend this list
=45=   while (<F>) {
=46=     chomp;
=47=     my @row = split /\t/;
=48=
=49=     if ($text) {
=50=       ## save maximums
=51=       for (0..$#row) {
=52=         my $tmp = $row[$_];
=53=         $tmp =~ s/<.*?>//g; # don't count HTML markup
=54=         $tmp = length $tmp;
=55=         $max_in_col[$_] = $tmp if $max_in_col[$_] < $tmp;
=56=       }
=57=     }
=58=     push(@table, [@row]);
=59=   }
=60=   if ($text) {
=61=     my $format = join " | ", "", (map "%".(0-$_)."s", @max_in_col), "\n";
=62=     print "<pre>\n";
=63=     for (@table) {
=64=       printf $format, @$_;
=65=     }
=66=     print "</pre>\n";
=67=   } else {
=68=     print "<table border=1>\n";
=69=     for (@table) {
=70=       print "<tr>";
=71=       print map "<td>$_</td>", @$_;
=72=       print "</tr>\n";
=73=     }
=74=     print "</table>\n";
=75=   }
=76= };
=77=
=78= ## if an error, say so:
=79= if ($@) {
=80=   chomp $@;
=81=   $_ = "[error: $@]";
=82=   s/&/&amp;/g;
=83=   s/</&lt;/g;
=84=   s/>/&gt;/g;
=85=   s/\n/<br>/g;
=86=   print;
=87= }