Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 10 (September 1996)

In the last column, I recreated a portion of the common Unix utility, grep, while giving it additional functionality -- the ability to ignore non-text files. This time, I'm going to illustrate how to write a Perl ``wrapper'' that uses the existing grep command for the actual search, while still retaining the ``text-only'' nature of the function.

A wrapper generally replaces the invocation of a particular program: I'd invoke ``textgrep'' instead of ``grep'' in my scripts, for example. If I wanted to be particularly clever, I could call the wrapper ``grep'', and then put the real ``grep'' into a place not normally found in my path, known only to the wrapper.

In this case, the wrapper is invoked with the same arguments as the standard grep command. The wrapper will examine only the filename arguments (ignoring the others), and remove any filename argument that corresponds to a binary file. Any remaining filename arguments, along with the previously ignored arguments, are then passed to the real grep command transparently.

The first task of the program is to distinguish the filename arguments from the other arguments. Let's look at this part of the program:

        while ($ARGV[0] =~ /^-[a-z]$/) {
                push @OPTS, shift;
        }

Here, we're looking at the first entry of @ARGV (designated by $ARGV[0]), and if it looks like an option (a minus followed by a single letter), we transfer it from @ARGV to @OPTS. With no explicit argument, the ``shift'' operator removes and returns the first element of @ARGV.
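
To see the loop in action, here's a tiny standalone sketch; the option letters and names are made up purely for illustration:

        #!/usr/bin/perl
        # Pretend the wrapper was invoked as:  textgrep -i -n fred file1 file2
        @ARGV = ('-i', '-n', 'fred', 'file1', 'file2');
        while ($ARGV[0] =~ /^-[a-z]$/) {
                push @OPTS, shift;
        }
        print "OPTS: @OPTS\n";          # prints: OPTS: -i -n
        print "ARGV: @ARGV\n";          # prints: ARGV: fred file1 file2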

But this strategy fails on the ``-e'' and ``-f'' options, which take a following parameter. That parameter is unlikely to look like an option, so we need to add an additional step:

        while ($ARGV[0] =~ /^-[a-z]$/) {
                if ($ARGV[0] =~ /^-[ef]$/) {
                        push @OPTS, shift;
                }
                push @OPTS, shift;
        }

There. Now, if it's -e or -f, two args get moved over instead of one. As you can see, writing a wrapper requires a fairly complete understanding of the arguments to the wrapped program.
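
One more wrinkle worth noting: when neither ``-e'' nor ``-f'' is given, grep treats the first non-option argument as the pattern, and we don't want the text-file test coming up next to mistake that pattern for a filename. A sketch of one possible extra step, which would slot in right after the while loop:

        # If no -e or -f supplied the pattern, the next argument is the
        # pattern itself; move it into @OPTS so it isn't tested as a file.
        push @OPTS, shift
                unless !@ARGV or grep { /^-[ef]$/ } @OPTS;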

Next, we need to process whatever's left in @ARGV, ripping out the non-text filenames. This is pretty much the same code as in the last column:

        @ARGV = "-" unless @ARGV;
        @ARGV = grep { -T or $_ eq "-" } @ARGV;
        exit 0 unless @ARGV;

The first step inserts ``-'' if there are no arguments remaining. This won't change the meaning of @ARGV (grep reads standard input in either case), but gives us something to work with later. The second step removes from @ARGV any filename that is neither a ``text'' file (as determined by the -T operator) nor the ``-'' (meaning standard input). Standard input is always considered to be a text file. The third step causes us to exit with a zero exit status if there are no text files left to process.
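
Here's a quick standalone illustration of the filter, assuming a hypothetical plain-text file named ``notes.txt'' alongside the decidedly binary /bin/ls (and note that this is Perl's built-in grep operator doing the filtering, not the grep command we're wrapping):

        #!/usr/bin/perl
        # notes.txt is a hypothetical text file; /bin/ls is a binary.
        @ARGV = ('notes.txt', '/bin/ls', '-');
        @ARGV = grep { -T or $_ eq "-" } @ARGV;
        print "@ARGV\n";        # prints: notes.txt -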

Finally, we need to invoke grep with the thus-modified command line:

        exec "grep", @OPTS, @ARGV;

The ``exec'' operator invokes the ``grep'' command (found according to the current PATH environment variable), passing it the options (in @OPTS) if any, and then the list of filenames to process (in @ARGV).
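
Because exec is handed a list here rather than one long string, no shell gets involved: grep receives each option and filename exactly as the wrapper saw it, even arguments containing spaces or asterisks. A quick sketch of the distinction, using made-up arguments:

        # List form (what the wrapper uses): no shell, so grep would see
        # the literal string '*.c' as its filename argument.
        exec "grep", "-i", "fred", "*.c";

        # String form, for contrast: Perl would hand this to /bin/sh,
        # which would expand *.c before grep ever saw it.
        # exec "grep -i fred *.c";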

This gets us most of the way there. I could stick this program into ``textgrep'', and it'd work. However, let's build the wrapper I mentioned earlier: something to replace the ``grep'' command that I would still invoke as ``grep''. First, I'll need to move the real grep command out of the way (say, to ``/usr/bin/realgrep''), and tell this program where it is:

        $real_grep = "/usr/bin/realgrep";

Next, I'll want to invoke this realgrep, but convince it that it is being invoked as ``grep'' (in case it cares). This is done by lying to it about its name and location. I'll want the real grep to believe it is named the same as my textgrep script (which will now be called grep). Fortunately, this path is available in the $0 variable, and can be passed as the ``argv[0]'' to the new program using a funky feature of exec:

        exec $real_grep $0, @OPTS, @ARGV;

Notice that $real_grep sits in the indirect-object slot between the ``exec'' operator and the list of arguments, with no comma after it: that's the program actually executed, while the first element of the list ($0) becomes the name the new program believes it was invoked as.
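
This indirect-object trick is the same one the Perl documentation uses to make a shell believe it's a login shell:

        $shell = "/bin/csh";
        exec $shell '-sh';      # csh actually runs, but thinks its name is -sh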

There's now not much the real grep command can do to discover that it isn't really living in its old location. Putting it all together:

        #!/usr/bin/perl
        $real_grep = "/usr/bin/realgrep";
        # move the option arguments aside into @OPTS
        while ($ARGV[0] =~ /^-[a-z]$/) {
                if ($ARGV[0] =~ /^-[ef]$/) {
                        push @OPTS, shift;      # -e or -f itself...
                }
                push @OPTS, shift;              # ...and its parameter, or a lone option
        }
        # toss the non-text filenames
        @ARGV = "-" unless @ARGV;
        @ARGV = grep { -T or $_ eq "-" } @ARGV;
        exit 0 unless @ARGV;
        # become the real grep, posing under the original name
        exec $real_grep $0, @OPTS, @ARGV;
        die "cannot exec $real_grep: $!";

I've added a ``die'' here to report the problem if the $real_grep program cannot be executed.

Now, why ``exec'' instead of ``system''? Using exec causes the Perl script to ``exec'' the grep program, rather than launching grep as a child process. If this step is successful (as it mostly will be if we got the pathnames correct), then the Perl interpreter is gone, having become the grep command. The only code after an exec should be to handle a failed exec (such as the ``die'' here).
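
If the wrapper ever needed to do something after grep finished, ``system'' would be the right tool instead. A sketch of that variation:

        # Run the real grep as a child process instead of replacing ourselves.
        my $status = system $real_grep, @OPTS, @ARGV;
        # ... any post-processing would go here ...
        exit $status >> 8;      # hand grep's exit status along to our caller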

There are other uses for wrapper scripts. For example, suppose you had a database accounting package that needs to have a certain umask and current directory established, as well as a few important environment variables. While you could create a shell script to do all of this, let's see how it's done in Perl:

        #!/usr/bin/perl
        my $DB_HOME = "/home/merlyn/database";
        chdir $DB_HOME
                or die "cannot get to db dir: $!";
        umask 2;
        $ENV{'DB_HOME'} = $DB_HOME;
        $ENV{'PATH'} .= ":$DB_HOME";
        exec "accounting", @ARGV;
        die "Cannot exec accounting: $!";

Here, I've established a Perl variable $DB_HOME, representing the directory of this accounting package. Then, I ``cd'' there in the Perl process, and set my umask to ``002'' (so that newly created files and directories get only the ``other-write'' permission turned off). Next, I set the environment variable DB_HOME to the value of the Perl variable of the same name. Note that Perl variables are not automatically exported, so this is the equivalent of:

        setenv DB_HOME $DB_HOME

in the C-shell, or:

        DB_HOME=$DB_HOME; export DB_HOME

in the Bourne shell. Also, I've updated the PATH environment variable to include the DB_HOME directory as one of the directories searched for programs. Note that I am appending the new directory to the end of the list, not the beginning. If I wanted it at the beginning, I'd have to write it as:

        $ENV{'PATH'} = "$DB_HOME:$ENV{'PATH'}";

And finally, the invocation of the ``accounting'' program, followed by a diagnostic if the exec fails. Note that I'm passing the @ARGV list through to the accounting package, essentially uninterpreted. If there were some common arguments to pass, say ``-d $DB_HOME'', it'd look like this:

        exec "accounting", "-d",
                $DB_HOME, @ARGV;

(This is probably a pretty stupid accounting package that requires so many things to be set to DB_HOME, but hopefully you can see where I'm going here.)
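
If you want to convince yourself that the environment and current directory really do make it through to the wrapped program, you can temporarily aim the exec at a perl one-liner instead of the accounting package (a throwaway test, assuming perl itself is on the PATH):

        # Temporary stand-in for the accounting exec: show what the child sees.
        exec "perl", "-le",
                'print "DB_HOME=$ENV{DB_HOME}\nPATH=$ENV{PATH}"';
        die "Cannot exec perl: $!";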

Another kind of a wrapper is one that messes with the standard file descriptors to alter the behavior slightly. For example, the standard ``find'' command has the mostly useful feature of reporting un-scannable directories via a message to standard error. However, when we aren't interested in what find cannot do, these messages are just an annoyance.

Certainly, with the proper command-line I/O redirection, I can cause STDERR to be tossed away. However, let's create a wrapper instead that eliminates the STDERR messages entirely:

        #!/usr/bin/perl
        open STDERR, ">/dev/null";
        exec "find", @ARGV;
        die "Cannot exec find: $!";

There. Pretty simple. I could stick this into a file named ``qfind'' (for ``quiet find''), and then invoke it with:

        qfind / -name '*perl*' -ls

and now I won't get all those messages telling me about directories I cannot enter.

I could also do something similar in the accounting database wrapper to log all STDERR messages to a log file (adding this just before the ``exec''):

        open STDERR, ">>log";

Well, this about wraps up my discussion of wrappers. Hope you enjoyed it.


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.