Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Linux Magazine Column 38 (Jul 2002)

[suggested title: Template-driven file management]

I've decided recently to put the stonehenge.com website under CVS management. With the CVS tools, I can ``check out'' a current version of the website sources, play with it a bit, test it on a development server, and then ``check in'' the changes for deployment on my live server, the same way the big boys do it. Also, I can let the other Stonehenge druids edit portions of the site as well, a task that had formerly been only my job (along with the dozens of other self-appointed roles I fill at Stonehenge).

Some of the Apache configuration files contain hard-coded pathnames. I pondered a lot of solutions, starting by spending a good day or two on rewriting all the config files so that they used only names relative to the config directory. I got stuck on one directive (for mod_proxy's cache configuration) that does not permit a relative name.

After punting on that idea, it occurred to me to run some or all of the source files through a substitution process that could plug in the pathnames and perhaps a few changeable configuration values. Of course, I could be like the 45 other CPAN authors who wrote their own templating system, since that seems to be a rite of passage for a budding Perl hacker. However, I knew that much of my site's new design would also be processed dynamically using Andy Wardley's most excellent Template Toolkit. So, I decided on using Template at ``build time'' as well.

The Template distribution includes a ttree utility that at first glance seemed to do what I want: take a tree of files and process them into a target tree, updating only the files which had changed. But I needed similar structures to process files that weren't templated, and also files that were derived from many source files, and that was outside ttree's design. So, I stole the important pieces of source code of ttree to make my template processing engine, shown in [listing one, below].

I decided to drive my templating engine from a control file, typically read from STDIN, consisting of an output and input filename per line, separated by whitespace. (Yeah, the first time I have a filename that has embedded whitespace, I'll be in trouble, and I'll have to rewrite this bit.) To create the control file, I use a GNU-style Makefile, presented in [listing two, below]. But first, let's focus on the templating engine.

Lines 1 through 3 start nearly every Perl program I write, enabling compile and runtime warnings, restricting the use of barewords, soft references, and undeclared variables, and ensuring that STDOUT is unbuffered.

Lines 5 and 6 pull in the Template and Getopt::Long modules, found in the CPAN.

Lines 8 through 11 process the two command-line options: an option that names a pre-processing hook file for Template, and a flag to force processing regardless of timestamps. Because my makefile will want to have authority about processing a particular file, I'll use the force option from my makefile. As yet, I haven't used the preprocessing option.
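
To make the two option styles concrete, here is a minimal sketch using Getopt::Long's GetOptionsFromArray on a made-up argument list. It assumes the preprocess option is declared as preprocess=s so that it carries a filename; the names header.tt and control.in are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical command line, for illustration only.
my @args = ('--preprocess', 'header.tt', '--noforce', 'control.in');

GetOptionsFromArray(
    \@args,
    'preprocess=s' => \(my $preprocess),   # "=s": the option takes a string value
    'force!'       => \(my $force = 0),    # "!": accepts --force and --noforce
) or die "bad options\n";

# Anything that is not an option is left behind in @args.
print "preprocess: $preprocess\n";
print "force: $force\n";
print "remaining: @args\n";
```

The negatable form is what lets a caller spell out --noforce explicitly, and whatever remains in the argument list can then be read as the control file.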

Lines 13 through 19 set up the Template object, including the particular configuration options needed for my operation. Relative pathnames are needed to permit the Makefile to specify filenames below the current directory (relative to the include path). The preprocess template is given by the value associated with the option, or undef, meaning no preprocess template. And finally, I decided to use the star tag style in the ``build phase'' to distinguish it from the normal Template style to be executed at page delivery time. This permits template instructions like:

    [* IF env.ENABLE_JOKES -*]
      [% PROCESS stonehenge/sidebar/jokes %]
    [*- END *]

If the environment variable ENABLE_JOKES is set (while we're building the site), then the directive is included to process the sidebar at page delivery time. (The env hash is set as a Template variable: we'll see this in a moment.)
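
A small sketch of that two-phase idea, assuming the Template Toolkit is installed from CPAN. The env hash here is a stand-in built by hand rather than the real %ENV, and the build helper is a hypothetical name used only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Template;

# Build-time engine: only [* ... *] tags are interpreted.
my $tt = Template->new({ TAG_STYLE => 'star' })
    or die Template->error(), "\n";

my $source = <<'END_TEMPLATE';
[* IF env.ENABLE_JOKES -*]
[% PROCESS stonehenge/sidebar/jokes %]
[*- END *]
END_TEMPLATE

# Render the same source with a pretend environment (hypothetical helper).
sub build {
    my (%env) = @_;
    my $output = '';
    $tt->process(\$source, { env => \%env }, \$output)
        or die $tt->error(), "\n";
    return $output;
}

my $with_jokes    = build(ENABLE_JOKES => 1);
my $without_jokes = build();

print "with:    $with_jokes\n";
print "without: $without_jokes\n";
```

The percent-style directive passes through the build phase as plain text, so it is still there for the delivery-time engine to act on.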

Lines 22 to 43 form the main processing loop. To prevent duplicate consideration of a particular templated file, line 21 defines a %seen hash, containing the lines we've processed so far as keys. Sometimes during my testing, I'd update a templated file, but the template processing would fail. The next make run would again add the template to the list of things out of date, and this template engine would end up seeing the item twice.
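
The duplicate check is the usual %seen idiom. A minimal sketch, with made-up control lines (the index.html pair is hypothetical) standing in for what a failed run followed by a second make pass might leave behind:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical control lines; the first pair appears twice.
my @control_lines = (
    "/web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl\n",
    "/web/stonehenge/htdocs/index.html htdocs/index.html.tmpl\n",
    "/web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl\n",
);

my %seen;
my @to_process;
for my $line (@control_lines) {
    next if $seen{$line}++;    # post-increment: false only on first sighting
    push @to_process, $line;
}

print "processing: $_" for @to_process;
```

Because the whole line is the hash key, two requests count as duplicates only when both the output and the input names match.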

Line 25 extracts the output filename and the input filename. Lines 26 and 27 ensure that the input exists, and grab the stat information to use later (for the modification time and permissions and ownership).
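
As a rough illustration of that step, the sketch below splits one control line and pulls out the stat fields that matter later. The temporary file is only a stand-in for a real template, and the output path is made up and never written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in source file so that stat has something real to look at.
my ($fh, $source) = tempfile(UNLINK => 1);
print {$fh} "ServerName placeholder\n";
close $fh or die "close failed: $!\n";

# One control line: output name, whitespace, input name.
my $line = "/web/NEW/etc/httpd.conf\t$source\n";

# Splitting on ' ' ignores leading whitespace and eats the trailing newline.
my ($outname, $inname) = split ' ', $line;

my @instat = stat($inname)
    or die "can't stat $inname: $!\n";

# Elements 2, 4, 5 and 9 are mode, uid, gid and modification time.
my ($mode, $uid, $gid, $mtime) = @instat[2, 4, 5, 9];
printf "%s: perms %04o, owner %d:%d, modified %s\n",
    $inname, $mode & 07777, $uid, $gid, scalar localtime $mtime;
```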

Lines 29 to 33 allow the template engine to be a ``mini-make''. Unless the --force option is given on the command line, the output file has to be newer than the input file or else we'll process the file. You could use the template engine with a static list of source/destination pairs this way, and the engine would perform minimal work to update the files. However, since I'm letting make determine the out-of-date files, this code is skipped in my use.
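
The timestamp test can be sketched on its own as follows. The needs_rebuild helper and the file names are hypothetical, and utime is used to fake the modification times so the example does not have to sleep.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True when the output should be regenerated from the input.
sub needs_rebuild {
    my ($outname, $inname, $force) = @_;
    return 1 if $force;                         # caller overrides timestamps
    my @instat = stat($inname)
        or die "can't stat $inname: $!\n";
    my @outstat = stat($outname)
        or return 1;                            # no output yet
    return $outstat[9] > $instat[9] ? 0 : 1;    # element 9 is the mtime
}

my $dir = tempdir(CLEANUP => 1);
my ($in, $out) = ("$dir/page.tmpl", "$dir/page");
for my $file ($in, $out) {
    open my $fh, '>', $file or die "can't create $file: $!\n";
    close $fh;
}

my $now = time;
utime $now, $now - 120, $in;     # input modified two minutes ago
utime $now, $now - 180, $out;    # output is older than the input
my $stale = needs_rebuild($out, $in, 0);

utime $now, $now - 60, $out;     # output is now newer than the input
my $fresh   = needs_rebuild($out, $in, 0);
my $forced  = needs_rebuild($out, $in, 1);
my $missing = needs_rebuild("$dir/absent", $in, 0);

print "stale=$stale fresh=$fresh forced=$forced missing=$missing\n";
```

This prints stale=1 fresh=0 forced=1 missing=1: only an output that is strictly newer than its input is left alone.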

If we make it to line 35, it's time to run the template. The call to the process method of the Template object does the job. The middle parameter defines the predefined variables available to the individual templates. In this case, we're passing the environment variables as the name env. Individual environment variable names are available as env.PATH or env.SHELL, and so on. This is the primary means by which the Makefile can parameterize the templates, including overriding the values for a particular build.

If the processing fails, line 36 displays that, along with the Template error message. On success, the processing is noted in line 37.

Lines 39 to 42 copy the ownership and permissions from the source file to the destination file. Failures are noted as an advisory, although execution continues.
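
Here is a self-contained sketch of the same idea on two throwaway files. The copy_owner_and_mode helper is a hypothetical name, and masking the mode with 07777 is this sketch's own choice, made so that only the permission bits are handed to chmod.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub copy_owner_and_mode {
    my ($inname, $outname) = @_;
    my @instat = stat($inname)
        or die "can't stat $inname: $!\n";

    # Advisory only: changing ownership usually needs privileges.
    chown $instat[4], $instat[5], $outname
        or warn "Cannot chown @instat[4,5] $outname: $!\n";

    my $perms = $instat[2] & 07777;    # drop the file-type bits
    chmod $perms, $outname
        or warn sprintf "Cannot chmod %04o %s: %s\n", $perms, $outname, $!;
    return $perms;
}

my $dir = tempdir(CLEANUP => 1);
my ($in, $out) = ("$dir/source", "$dir/dest");
for my $file ($in, $out) {
    open my $fh, '>', $file or die "can't create $file: $!\n";
    close $fh;
}
chmod 0640, $in  or die "chmod $in: $!\n";
chmod 0600, $out or die "chmod $out: $!\n";

copy_owner_and_mode($in, $out);
my $result = (stat $out)[2] & 07777;
printf "dest is now %04o\n", $result;
```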

So that's the template processor. When executed, it looks for lines on standard input like:

    /web/stonehenge/etc/httpd.conf etc/httpd.conf.tmpl

to process the conf file from a relative-path-named local source file. And the httpd.conf.tmpl file contains mostly constant text, except for things that need to vary based on the installation directory or other local parameters, like:

    ServerName [* env.SERVERNAME *]
    Listen [* env.LISTEN_AT *]
    DocumentRoot [* env.PREFIX *]/htdocs
    PIDFile [* env.PREFIX *]/var/run/httpd.pid
    ScoreBoardFile [* env.PREFIX *]/var/run/httpd.scoreboard
    LockFile [* env.PREFIX *]/var/run/httpd.lock
    <Directory [* env.PREFIX *]/htdocs>
    ....
    </Directory>

And these are replaced en route to the httpd.conf file. Like magic. Additionally, repetitive items or conditional items can be captured as Template blocks or macros. (I'm just starting to scratch the surface of this now. Perhaps I'll cover that in greater depth in a future column.)

Of course, this doesn't make sense without the Makefile depositing the right items into that control file, or making the other directories and copying the other files over. So, let's take a look at how that's done.

The trickiest part of the Makefile design was ensuring that the template engine would get run once at the end of the pass. In BSD Make, this can be achieved with a .END target, but GNU Make didn't have such a feature. With the help of fellow Perl hacker Uri Guttman, we came up with a weird hack that's rather cool once you get your head around it.

Nearly the entire Makefile is split into two pieces. On a normal invocation, only the first few lines (from line 8 to line 12) are executed, recursively invoking make on the same Makefile looking to build the same targets, but adding the FINAL target as well. To note that we're the recursed version, an additional variable is added (RECURSED). That in turn skips over lines 8 through 12 (thanks to the conditional on line 7), and we run the rest of the file.

There's almost certainly a more clever way of doing this, but I couldn't find it after half a day of searching the net and asking my friends, until Uri stumbled through something close to this solution.

Lines 17 to 21 define the configuration parameters used by the templates, and by the Makefile itself.

PREFIX is the execution top-level directory. INSTALLPREFIX is usually the same as PREFIX, except when you want to tar up the files for an RPM or other distribution bundler, or want to ``stage in'' the live data. For example, if your live site is running off /web/stonehenge, you can build and install a new website from scratch with a minimum of downtime with:

  $ make INSTALLPREFIX=/web/NEW
  $ /web/stonehenge/sbin/apachectl stop
  $ mv /web/stonehenge /web/stonehenge.OLD
  $ mv /web/NEW /web/stonehenge
  $ /web/stonehenge/sbin/apachectl start

By making INSTALLPREFIX separate from PREFIX, we can stage the files into that temporary directory for the fast switch.

APACHE_PREFIX defines the prefix that Apache was built with. The etc and sbin directory would be immediately below this, for example.

SERVERNAME and LISTEN_AT define the server information. Again, the point of this configuration setup is to be able to run a development version of the server at a different location, perhaps even on a different box, so these must be configurable.

Lines 25 and 26 define variables that should not be overridden from the command line. In particular, note the use of I, which permits $I to be written in rules easily.

Line 30 defines a macro to crawl through a given subdirectory, looking for any files (that aren't Emacs editor backups), and returns their equivalent names in the INSTALLPREFIX hierarchy. An optional .tmpl is also automatically removed. This macro is used in the various rules to avoid explicitly naming all the files in the directories.
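
For readers more at home in Perl than in make functions, the same mapping can be sketched with File::Find. This is an illustration of what the macro computes, not part of the build; installs_from_subdir and the file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

sub installs_from_subdir {
    my ($prefix, $subdir) = @_;
    my @targets;
    find({
        no_chdir => 1,                         # keep $_ as the path from $subdir down
        wanted   => sub {
            return unless -f $_;               # plain files only
            return if /~\z/;                   # skip Emacs backup files
            (my $name = $_) =~ s/\.tmpl\z//;   # drop an optional .tmpl
            push @targets, "$prefix/$name";
        },
    }, $subdir);
    my @sorted = sort @targets;
    return @sorted;
}

# Throwaway source tree with hypothetical file names.
my $start = getcwd();
my $tree  = tempdir(CLEANUP => 1);
chdir $tree or die "can't chdir to $tree: $!\n";
mkdir 'etc' or die "can't mkdir etc: $!\n";
for my $file (qw(etc/httpd.conf.tmpl etc/mime.types etc/mime.types~)) {
    open my $fh, '>', $file or die "can't create $file: $!\n";
    close $fh;
}

my @installs = installs_from_subdir('/web/NEW', 'etc');
chdir $start or die "can't chdir back to $start: $!\n";

print "$_\n" for @installs;
```

The backup file drops out, the template loses its suffix, and both surviving names come back under the install prefix.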

Lines 32 to 52 define the rules for building each of the subdirectories, including a group install target to build just a portion of the data. There's a lot of repetition and repetition here, but I couldn't find a way to reduce that. Note that the pattern is similar: the top-level install target depends on a particular install-foo target. A similarly-named variable is loaded up by calling the macro defined earlier, and then the install-foo target is made to depend on those filenames.

But where do the rules get selected to either copy those files or run them through the templating engine? Ahh, that's the magic down in lines 54 to 62. If a file wanted under the INSTALLPREFIX directory has a corresponding file relative to the local current directory, then we simply copy it over (if it's out of date), after first making its parent directory if needed.

However, if the file wanted in the INSTALLPREFIX directory has a corresponding .tmpl file, then we note that for the template engine to process, by writing the destination and source into the run-template.in file. Note that all template files are also dependent on the templater itself, and the GNUmakefile. That way, edits to either of these files cause the templates to be re-run.
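
The choice between the two pattern rules can be modelled in a few lines of Perl. This is only a sketch of the decision, under the assumption that a given name exists either as a plain file or as a .tmpl file but not both; action_for and the file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Describe what would happen for one wanted target under $prefix.
sub action_for {
    my ($prefix, $target) = @_;
    my $source = $target;
    $source =~ s{\A\Q$prefix\E/}{}
        or die "$target is not under $prefix\n";

    return "copy $source to $target"      if -f $source;
    return "queue: $target $source.tmpl"  if -f "$source.tmpl";
    return "no rule for $target";
}

# Throwaway source tree: one plain file, one template.
my $start = getcwd();
my $tree  = tempdir(CLEANUP => 1);
chdir $tree or die "can't chdir to $tree: $!\n";
mkdir 'etc' or die "can't mkdir etc: $!\n";
for my $file (qw(etc/mime.types etc/httpd.conf.tmpl)) {
    open my $fh, '>', $file or die "can't create $file: $!\n";
    close $fh;
}

my $plain     = action_for('/web/NEW', '/web/NEW/etc/mime.types');
my $templated = action_for('/web/NEW', '/web/NEW/etc/httpd.conf');
chdir $start or die "can't chdir back to $start: $!\n";

print "$plain\n$templated\n";
```

The queued form mirrors the control-file layout: destination first, then the template source.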

So the typical install step copies a bunch of text files directly from the source directories to the destination, and notes the template files that also have to be processed. But where do the templates get processed? Recall that the recursive invocation also wants FINAL to be built, after building the designated targets. Ahh, lines 64 to 72 define the rules for that. First, FINAL depends on run-template.out, so we need to bring that up to date. It's up to date only when newer than run-template.in. But if it doesn't exist, or it's not newer, we'll run the commands in lines 69 to 71. The templater processes the control file (scribbled into by line 62), then the control file is emptied out (line 70), and the output file is then touched (in line 71) to make it newer than the input. If for some reason, the input file never got created, an empty one is created in line 72. I'm not sure if I still need this step: but it certainly didn't hurt to leave it in.

And that's it. In these two core structures, you've got the means to build a hierarchy of files, some of which are run through a templating engine, with a minimal amount of copying around as you edit stuff. And that's the guts of my new web-site building engine. Until next time, enjoy!

Listings

        =0=     #################### LISTING ONE ####################
        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use Template;
        =6=     use Getopt::Long;
        =7=     
        =8=     GetOptions(
        =9=                'preprocess=s' => \ (my $preprocess),
        =10=               'force!' => \ (my $force = 0),
        =11=    ) or die "see code for usage\n";
        =12=    
        =13=    my $t = Template->new
        =14=      ({
        =15=        RELATIVE => 1,
        =16=        PRE_PROCESS => $preprocess,
        =17=        INCLUDE_PATH => ['.'],
        =18=        TAG_STYLE => 'star',
        =19=       });
        =20=    
        =21=    my %seen;
        =22=    while (<>) {
        =23=      next if $seen{$_}++;
        =24=    
        =25=      my($outname, $inname) = split;
        =26=      my @instat = stat($inname) or
        =27=        print("  - $inname (can't stat)\n"), next;
        =28=    
        =29=      unless ($force) {
        =30=          my @outstat = stat($outname);
        =31=          @outstat and $outstat[9] > $instat[9] and
        =32=              print("  - $inname (not newer)\n"), next;
        =33=      }
        =34=    
        =35=      $t->process($inname, {env => \%ENV}, $outname) or
        =36=        print("  ! ", $t->error(), "\n"), next;
        =37=      print("  + $inname => $outname\n");
        =38=    
        =39=      chown $instat[4], $instat[5], $outname
        =40=        or warn "Cannot chown @instat[4,5] $outname: $!";
        =41=      chmod $instat[2] & 07777, $outname
        =42=        or warn "Cannot chmod $instat[2] $outname: $!";
        =43=    }
        =0=     #################### LISTING TWO ####################
        =1=     ### mandatory
        =2=     SHELL = /bin/sh
        =3=     .SUFFIXES:
        =4=     
        =5=     ### ensure FINAL
        =6=     
        =7=     ifndef RECURSED
        =8=     MAKECMDGOALS ?= install
        =9=     
        =10=    $(MAKECMDGOALS):
        =11=            @$(MAKE) --no-print-directory RECURSED=1 $(MAKECMDGOALS) FINAL
        =12=    
        =13=    else # endif is at end of file
        =14=    
        =15=    ### external configuration variables (from env or make-line)
        =16=    
        =17=    export PREFIX ?= /web/stonehenge
        =18=    export INSTALLPREFIX ?= $(PREFIX)
        =19=    export APACHE_PREFIX ?= /opt/apache/1.3.23
        =20=    export SERVERNAME ?= www.stonehenge.com
        =21=    export LISTEN_AT ?= www.stonehenge.com:80
        =22=    
        =23=    ### internal variables (should require no change)
        =24=    
        =25=    I = $(INSTALLPREFIX)
        =26=    TEMPLATER = ./run-template
        =27=    
        =28=    ### macros
        =29=    
        =30=    get_installs_from_subdir = $(patsubst %,$I/%,$(patsubst %.tmpl,%,$(shell find $1 -type f ! -name '*~' -print)))
        =31=    
        =32=    ### subdirectories
        =33=    
        =34=    ## etc
        =35=    install: install-etc
        =36=    install_etc_files := $(call get_installs_from_subdir, etc)
        =37=    install-etc: $(install_etc_files)
        =38=    
        =39=    ## htdocs
        =40=    install: install-htdocs
        =41=    install_htdocs_files := $(call get_installs_from_subdir, htdocs)
        =42=    install-htdocs: $(install_htdocs_files)
        =43=    
        =44=    ## sbin
        =45=    install: install-sbin
        =46=    install_sbin_files := $(call get_installs_from_subdir, sbin)
        =47=    install-sbin: $(install_sbin_files)
        =48=    
        =49=    ## var
        =50=    install: install-var
        =51=    install_var_files := $(call get_installs_from_subdir, var)
        =52=    install-var: $(install_var_files)
        =53=    
        =54=    ### pattern rules
        =55=    
        =56=    $I/%: %
        =57=            mkdir -p $(dir $@)
        =58=            cp $< $@
        =59=    
        =60=    $I/%: %.tmpl $(TEMPLATER) GNUmakefile
        =61=            @echo want: $< '=>' $@
        =62=            @echo $@ $< >>$(TEMPLATER).in
        =63=    
        =64=    ### handle FINAL step
        =65=    
        =66=    FINAL: $(TEMPLATER).out
        =67=    
        =68=    $(TEMPLATER).out: $(TEMPLATER).in
        =69=            $(TEMPLATER) --force $<
        =70=            -@cp /dev/null $<
        =71=            -@touch $@
        =72=    $(TEMPLATER).in:; touch $@
        =73=    
        =74=    endif # matches ifndef/else at top of file

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.