Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 45 (Jan 2003)
[suggested title: ``The Duct Tape of the Internet'']
When you're a Perl programmer, you never fret about those little ugly tasks that creep up. Perl can deal with file wrangling, text manipulation, and process management in a way unequaled by any other single language, whether open-source or proprietary.
For example, let's take a simple file and text wrangling task, and see how I solved it with Perl. Having been a system administrator for many years, I'd say that this task is representative of those niggling little things that I would face, typically daily, in the course of my job.
Nearly all Perl modules contain embedded documentation, called ``POD'' (described by perldoc perlpod). When I install a module from the Comprehensive Perl Archive Network (the ``CPAN'': see www.cpan.org for further information), the module is usually installed into a place where my Perl binary can find it (along Perl's @INC path). By default, the installation process also creates an nroff -man page, so that the man command can display a nicely formatted version (presuming you extend your MANPATH or equivalent). Thus, for most modules, you can say either perldoc Some::Module (to convert the embedded POD into text), or man Some::Module (to display the preprocessed man page).
However, the server that runs www.stonehenge.com runs OpenBSD (mostly so I can sleep at night knowing that security is a key concern of the OpenBSD developers). The default Perl installation of OpenBSD is configured in such a way that the man pages are not generated for non-core Perl modules. I'm expected to type perldoc Some::Module to get the documentation for such a module, rather than my more familiar man Some::Module, even though man still works for the core modules. As I find this rather confusing, I faced two alternatives:
- I could hack the core installation of Perl so that it would install man pages, thereby risking breakage if the Perl installation was upgraded during a minor or major release.
- I could write a simple tool to take all the embedded POD and generate man pages into my private area.
I decided to write a simple tool, mostly because I'm opposed to touching anything in the core distribution, since I have no idea if someone at OpenBSD headquarters is likely to change things out from under me.
And a simple tool it is, although it's about 80 lines of Perl code. So, looking at a few lines at a time, let's see what I wrote, in about the order that I created the lines.
First, I started with my normal header:
    #!/usr/bin/perl -w
    use strict;
    $|++;
With these three lines, I've turned on warnings, enabled the common compiler restrictions (undeclared variables, soft references, and barewords are all disabled), and turned off the buffering for STDOUT.
Next, I put a few configuration lines that I might change, based on where I was running the program:
    ## BEGIN configuration

    my $MAN3DIR = "/home/merlyn/man/man3";
    my $MAN3EXT = "3p";

    ## END configuration
Here I've defined a location below my home directory where I've placed other personal manpages, and an extension for the specific Perl module pages. Traditionally, Perl modules have the 3p extension and are placed in section 3 of the Unix manual. I've added /home/merlyn/man to my MANPATH, so the man command finds this directory just fine.
    use Pod::Man;
    use File::Find;
    use Config;
Following that, I bring in the three modules (all in the Perl core distribution) that I'll need to wander through the installed directories and find the POD files. The Pod::Man module can convert POD into manpages. The File::Find module recurses through subdirectories. The Config module provides a hash interface to the configuration parameters for the installed Perl. In fact, the next two lines use that hash to locate two specific directories:
    my $SITELIB = $Config{sitelib};
    my $SITEARCH = $Config{sitearch};
The value for $SITELIB gives the path in which local Perl modules are installed. $SITEARCH provides a similar path for architecture-specific modules: those which contain binary files resulting from compiling C (or other languages). Generally, the $SITEARCH directory will be within the $SITELIB directory, and this program presumes that.
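Since the program presumes that $SITEARCH sits inside $SITELIB, it may be worth confirming that on a given installation before relying on it. The following is only a minimal sketch and not part of the original program; the helper name is_within is made up for illustration.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper (not in the original program): true if $child is
# $parent itself or lives somewhere below it, comparing whole path components.
sub is_within {
  my ($child, $parent) = @_;
  $parent =~ s{/+$}{};    # ignore any trailing slash on the parent
  return 1 if $child eq $parent;
  return index($child, "$parent/") == 0 ? 1 : 0;
}

my ($sitelib, $sitearch) = @Config{qw(sitelib sitearch)};
print "sitelib:  $sitelib\n";
print "sitearch: $sitearch\n";
print is_within($sitearch, $sitelib)
  ? "sitearch is inside sitelib, so one find from sitelib covers both\n"
  : "sitearch is outside sitelib, so it would need its own find\n";
```

If the second message appears, the single-directory search used in this program would miss the architecture-specific modules, and both directories would have to be handed to find.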
Next, we'll create a Pod::Man object configured for the task:
    my $podmanparser = Pod::Man->new(section => $MAN3EXT);
The section value gives the section name appearing in the page header banner. It's mostly cosmetic, but nice to get right.
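To see where the section value ends up, a tiny POD document can be pushed through a Pod::Man object and the resulting .TH header line inspected. This is a sketch rather than part of the original program. The name option, output_string, and parse_string_document assume a Pod::Man from the 2.x series or later (the versions built on Pod::Simple), and Some::Module is a made-up module name.

```perl
use strict;
use warnings;
use Pod::Man;

# A tiny made-up POD document, assembled from a list so that POD tools
# reading this script do not mistake it for the script's own documentation.
my $pod = join "\n",
  '=head1 NAME', '', 'Some::Module - an imaginary module', '',
  '=head1 SYNOPSIS', '', '  use Some::Module;', '', '=cut', '';

# Hypothetical helper: convert POD text to nroff text in memory.
sub pod_to_man {
  my ($pod_text, %options) = @_;
  my $parser = Pod::Man->new(%options);
  my $man = '';
  $parser->output_string(\$man);    # collect the nroff output in $man
  $parser->parse_string_document($pod_text);
  return $man;
}

my $man = pod_to_man($pod, section => '3p', name => 'Some::Module');

# The .TH line is the page header; it carries the name and the section.
my ($th_line) = $man =~ /^(\.TH .*)$/m;
print defined $th_line ? "$th_line\n" : "no .TH line found\n";
```

The exact quoting and date format on the .TH line differ between Pod::Man releases, so the sketch prints the line rather than assuming its layout.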
Now comes the task of finding the existing POD documentation. So, after a few tries, I came up with the following loop with File::Find:
    my %pods;
    find sub {
      return unless /\.p(m|od)$/;
      my $package = $File::Find::name;
      for ($package) {
        s{^\Q$SITEARCH/}{} or s{^\Q$SITELIB/}{}
          or die "Cannot remove $SITEARCH or $SITELIB from $File::Find::name\n";
        s/\.p(m|od)$// or die "What happened to the ext in $package?\n";
        s{/}{::}g;
      }
      push @{$pods{$package}}, $File::Find::name;
    }, $SITELIB;
There's a lot going on here, and it's best to work from the outside in. The find subroutine has been imported from File::Find, and is presented with a subroutine reference (here, an anonymous subroutine) and a starting path, $SITELIB. The find routine starts at the top directory, recursing down, calling the subroutine for each found entry (even ones in which we're not interested). The line
      return unless /\.p(m|od)$/;
rejects the filenames that aren't either Perl modules or Perl POD files by looking at $_, which contains the basename (no directory part) of the file or directory being examined. The next few lines extract the package name from the filename into $package. First, we take the full path from $File::Find::name, then remove either the $SITEARCH or $SITELIB prefix from the path. If neither of these succeeds, then something has gone terribly wrong, so we'll abort. Next, the lines:
        s/\.p(m|od)$// or die "What happened to the ext in $package?\n";
        s{/}{::}g;
turn the remainder of the name into a module name, by stripping off the extension and replacing the slashes with double-colon package delimiters. Finally, the subroutine adds this file name to an arrayref contained within the %pods hash, indexed by the package name. Why a list? Because many modules have a separate POD file, so we'll see both Some/Module.pm and Some/Module.pod. We'll sort out later which of these to use for the manpage, but we'll record them all for now.
When this loop has completed, we have a hash %pods, keyed by package name, with each entry being a list of one or more files that may contain the documentation for that module.
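The path-to-package conversion can also be tried on its own, away from File::Find. The following is a minimal sketch under stated assumptions: path_to_package is a hypothetical helper, the directory names are invented, and unlike the loop above it returns undef instead of dying when a path doesn't fit.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an installed file's path into a package name by
# stripping a library prefix and the extension, then converting / to ::.
# Returns undef if the path is not a .pm or .pod file below a given prefix.
sub path_to_package {
  my ($path, @prefixes) = @_;
  # Try the longest prefix first, so an arch directory nested inside the
  # site library directory is removed in preference to its parent.
  for my $prefix (sort { length($b) <=> length($a) } @prefixes) {
    my $package = $path;
    next unless $package =~ s{^\Q$prefix\E/}{};
    return undef unless $package =~ s/\.p(?:m|od)$//;
    $package =~ s{/}{::}g;
    return $package;
  }
  return undef;
}

# Made-up directories and files, for illustration only.
my @prefixes = (
  '/usr/local/lib/site_perl',
  '/usr/local/lib/site_perl/i386-openbsd',
);
for my $path (
  '/usr/local/lib/site_perl/Some/Module.pm',
  '/usr/local/lib/site_perl/Some/Module.pod',
  '/usr/local/lib/site_perl/i386-openbsd/Other/Thing.pm',
) {
  printf "%s => %s\n", $path, path_to_package($path, @prefixes);
}
```

Both Some/Module.pm and Some/Module.pod map to the same package name, which is why each hash entry has to hold a list of files.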
When I showed this program to one of my friends, they commented (only after I had toiled over this part), ``Why didn't you just use Pod::Find?''. Ah, yes. If I'd only known, I could have reduced this part of the program to a few lines of code. I'll have to file that away for use in a future program. The lesson here is ``always check the CPAN first, because the solution to any interesting task has likely already been written''.
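For comparison, the Pod::Find route might look something like the following rough sketch. It is not the code this program uses. It assumes Pod::Find is installed (it belongs to the Pod-Parser distribution, which recent Perl releases no longer bundle), so the module is loaded inside an eval, and pods_by_name is a hypothetical wrapper name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical wrapper: map POD names to files below the given directories.
# Returns undef when Pod::Find is not available on this Perl.
sub pods_by_name {
  my @dirs = grep { -d $_ } @_;    # skip directories that do not exist
  return undef unless eval { require Pod::Find; 1 };
  # pod_find returns a hash of file path => POD name. Setting -inc to 0
  # keeps the search to the directories passed in, leaving @INC out of it.
  my %name_for = Pod::Find::pod_find({ -verbose => 0, -inc => 0 }, @dirs);
  my %file_for = reverse %name_for;
  return \%file_for;
}

my $pods = pods_by_name($Config{sitelib});
if ($pods) {
  print "$_ => $pods->{$_}\n" for sort keys %$pods;
}
else {
  print "Pod::Find is not installed here\n";
}
```

One difference worth noting: pod_find keeps a single file per POD name, so the .pod versus .pm choice that this program makes by sorting would be left to Pod::Find's own handling of duplicates.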
The next step is to wander through the hash, and do whatever it takes to update the manpages if needed. We'll start with a loop like this:
    POD: for my $pod (sort keys %pods) {
      my @files = @{$pods{$pod}};
      ... more code here ...
    }
I had to name the loop because we'll see a point later where I want to execute a next against this loop even though I'm in a nested loop. So, $pod contains a package name, and @files contains one or more source files for that package. Next, we need to figure out which one of the source files to use if there's more than one:
      if (@files > 1) {           # more than one? must sort
        @files = sort {
          ## primary: prefer arch-specific over non-arch-specific
          to_boolean($b =~ m{^\Q$SITEARCH}) <=> to_boolean($a =~ m{^\Q$SITEARCH})
            ## secondary: prefer .pod to .pm
            or to_boolean($b =~ /\.pod$/) <=> to_boolean($a =~ /\.pod$/);
        } @files;
      }
      my $file = shift @files;    # first one is always best now
Again, a lot of stuff going on here. If there's more than one file, we'll sort the list, preferring architecture-specific files over generic files, and .pod files over .pm files. The first entry in the list after sorting (or the only entry in the list if there was only one to start with) is now the most likely candidate for our manpage. The to_boolean routine maps false to 0 and true to 1 so that we can sort nicely:
    sub to_boolean {
      $_[0] ? 1 : 0;
    }
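To watch the two-level sort at work, the comparison can be wrapped in a small function and fed a few invented paths. This is a sketch for illustration only: rank_candidates and the paths are assumptions, not part of the original program.

```perl
use strict;
use warnings;

sub to_boolean { $_[0] ? 1 : 0 }

# Hypothetical wrapper around the same two-level comparison: arch-specific
# files first, and within each group .pod before .pm.
sub rank_candidates {
  my ($sitearch, @files) = @_;
  my @ranked = sort {
    to_boolean($b =~ m{^\Q$sitearch\E/}) <=> to_boolean($a =~ m{^\Q$sitearch\E/})
      or
    to_boolean($b =~ /\.pod$/) <=> to_boolean($a =~ /\.pod$/)
  } @files;
  return @ranked;
}

# Made-up candidate files for a single package.
my @ranked = rank_candidates(
  '/lib/site/arch',
  '/lib/site/Foo.pm',
  '/lib/site/Foo.pod',
  '/lib/site/arch/Foo.pm',
);
print "$_\n" for @ranked;
```

With these inputs the arch-specific Foo.pm comes out first, then the generic Foo.pod, then the generic Foo.pm. Each comparison puts $b on the left, so a file that passes a test sorts ahead of one that doesn't.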
Next, we'll figure out the name of the manfile, and whether or not we have any work to do:
      my $manfile = "$MAN3DIR/$pod.$MAN3EXT";
      next if -e $manfile and -M $manfile < -M $file; # skip if exists and newer
If the manpage file exists, and is newer than our source file, we've got nothing to do, so we go on to the next entry.
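The freshness test relies on -M returning a file's age in days, so the smaller value belongs to the newer file. Here is a minimal sketch of the same test as a standalone function, exercised against temporary files whose timestamps are set with utime; needs_update and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if $target is missing or is not newer than
# $source. -M gives age in days, so a smaller value means a newer file.
sub needs_update {
  my ($source, $target) = @_;
  return 1 unless -e $target;
  my $target_age = -M $target;
  my $source_age = -M $source;
  return $target_age >= $source_age ? 1 : 0;
}

my $dir = tempdir(CLEANUP => 1);
my ($source, $target) = ("$dir/Module.pm", "$dir/Module.3p");
for my $file ($source, $target) {
  open my $fh, '>', $file or die "Cannot create $file: $!\n";
  close $fh;
}

my $now = time;
utime $now, $now - 3600, $target;    # pretend the manpage is an hour old
utime $now, $now,        $source;    # and the source was just modified
print "stale manpage needs update: ", needs_update($source, $target), "\n";

utime $now, $now,        $target;    # now the manpage is the newer file
utime $now, $now - 3600, $source;
print "fresh manpage needs update: ", needs_update($source, $target), "\n";
```

When the two timestamps are equal the function reports that an update is needed, matching the original test, which only skips a manpage that is strictly newer than its source.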
At this point, we have a source file (either POD or Perl file) which has not yet been updated into a manpage. However, the file may still contain no POD directives. We need to look for some POD in the file. The easiest way is to look for =head at the beginning of a line. This isn't entirely accurate, but it's the same rule that the perldoc command uses, so I figure it's close enough. And that code came out like this (after a few tries):
      open IN, $file or warn("Cannot open $file, skipping\n"), next POD;
      while (<IN>) {
        if (/^=head/) {           # POD sign!
          print "pod2man $file $manfile\n";
          not -e $manfile or unlink $manfile
            or warn("Cannot remove $manfile: $!\n");
          open OUT, ">$manfile"
            or warn("Cannot create $manfile: $!\n"), next POD;
          seek IN, 0, 0;
          $podmanparser->parse_from_filehandle(\*IN, \*OUT);
          close OUT;
          next POD;
        }
      }
The meat is in the middle: once we've determined we have a decent POD file, we seek the file back to the beginning, and then call parse_from_filehandle to generate the manpage.
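The detection step can also be pulled out into a small function, which makes the ``does this file have any POD?'' rule easy to try in isolation. The sketch below is an illustration, not the program's code: has_pod is a hypothetical name, and it swaps the bareword IN handle for a lexical filehandle and three-argument open.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if the file has a line starting with =head,
# the same rough test the program applies before converting.
sub has_pod {
  my ($file) = @_;
  open my $in, '<', $file
    or do { warn "Cannot open $file: $!\n"; return 0 };
  while (my $line = <$in>) {
    return 1 if $line =~ /^=head/;
  }
  return 0;
}

# Two made-up files: one with embedded POD and one without.
my $dir = tempdir(CLEANUP => 1);
my %content = (
  "$dir/WithPod.pm"    => "package WithPod;\n1;\n\n=head1 NAME\n\nWithPod\n\n=cut\n",
  "$dir/WithoutPod.pm" => "package WithoutPod;\n# mentions =head1 mid-line only\n1;\n",
);
for my $file (sort keys %content) {
  open my $out, '>', $file or die "Cannot create $file: $!\n";
  print $out $content{$file};
  close $out;
  printf "%s: %s\n", $file, has_pod($file) ? "has POD" : "no POD";
}
```

Because the pattern is anchored with ^, a mention of =head1 in the middle of a comment line does not count, which is why the second file is reported as having no POD.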
So, any time I suspect that there's been a new module added to my local install, I can run this program, and my local manpage collection is updated, with minimal effort.
A simple task, simply executed by Perl, but handling an important issue of letting me get at Perl's documentation with either perldoc or man, working around a vendor limitation. Most of those ``gotta get it done now with no time to do it'' system administration tasks seem to be about this large, and as you can see, Perl fits the task nicely. So, until next time, enjoy!