Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Linux Magazine Column 05 (Oct 1999)

[suggested title: Launching Processes]

Perl has many ways of launching and managing child processes. That's good, because a lot of Perl's role as the ``duct-tape of the Internet'' is to glue programs together. So, let's look at some of the most common ways, including their limitations and potential security considerations.

The easiest way to launch a child process is with system:

        system "date";

The child process here is the date command. Anything that can be invoked from a shell command prompt can be used in this string.

The child process inherits Perl's standard input, output, and error output, so the output of this date command will show up wherever Perl's STDOUT was going.

The command can be arbitrarily complex, including everything that /bin/sh (or its nearby equivalent) can handle:

        system "for i in *; do echo == \$i ==; cat \$i; done";

Here, we're dumping out the contents of a directory, one file at a time. The $i variables are backslashed here because Perl would otherwise have interpolated its own current value of $i, and we want the shell to see its own $i instead. A simpler solution is to use single quotes instead of double quotes:

        system 'for i in *; do echo == $i ==; cat $i; done';

Or, you can just set the value of Perl's $i to '$i', but that's pretty twisted, and will probably drive the maintenance programmer who inherits your code crazy.

This might look better spread over multiple lines, so we can use a here-document to fix it:

        system <<'END';
        for i in *
        do
                echo == $i ==
                cat $i
        done
        END

Yeah, that cleans it up a bit.

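If the command string contains no shell metacharacters, Perl avoids the shell, splitting the string into words and running the program directly. You may wish to adjust $ENV{PATH} before calling system so that the program is found in the right place. Anything more complicated (redirection, pipes, wildcards, quotes, semicolons, and so on) forces a shell, though.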

That shell can get in the way a bit, though. Imagine invoking grep on a few files to search for a string held in a scalar variable:

        system "grep $look_for brief1 brief2 brief3";

Now if $look_for is a nice easy string like "Monica", no big deal. But if it's complicated like "White House", we now have a problem, because that'll interpolate like this:

        system "grep White House brief1 brief2 brief3";

which tells grep to look for White in four files: House, brief1, brief2, and brief3. That's broken. Badly. So, perhaps we can fix it by including some quotes:

        system "grep '$look_for' brief1 brief2 brief3";

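This works for White House, but fails on "Don't lie!", because the apostrophe ends the single-quoted shell string early. And if we change the shell single quotes to double quotes, that will just mess up when $look_for contains double quotes (or dollar signs, or backquotes)!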

Luckily, we can avoid the shell entirely, using the multiple-argument version of system:

        system "grep", $look_for, "brief1", "brief2", "brief3";

When system is given more than one argument, the first argument is the program to run, looked up along the PATH (unless it contains a slash). The remaining arguments are handed, uninterpreted by any shell, directly to the program. That is, if it were another Perl script, the elements of @ARGV in the called program would match the remaining elements of this list exactly, one for one.
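As a quick check of that one-for-one claim, here is a minimal sketch. It uses $^X (the path of the perl binary currently running) as a stand-in for grep, so it doesn't depend on any particular program being installed; the child one-liner is invented for this illustration. The child prints each argument in brackets and exits with the number of arguments it received, which the parent recovers from the high byte of the status.

```perl
use strict;
use warnings;

# Strings that would give a shell trouble: a space, an apostrophe,
# and a semicolon followed by a second command.
my @args = ("White House", "Don't lie!", 'x; echo gotcha');

# $^X stands in for grep here. The child prints each argument in
# brackets, then exits with the number of arguments it received.
my $status = system $^X, '-e',
    'print "[$_]\n" for @ARGV; exit scalar @ARGV',
    @args;

my $argc = $status >> 8;    # the exit value lives in the high byte
print "child saw $argc arguments\n";
```

If a shell were involved, "White House" alone would have arrived as two arguments, and the semicolon would have started a second command.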

Because we now no longer call a shell, things like I/O redirection no longer work. So there are tradeoffs to this method, but it sure comes in handy. It's also a bit more secure--no chance that a nefarious user will come along and sneak a newline or semicolon in there. Some very popular CGI scripts didn't get this matter right, and ended up triggering a CERT notification as a security hole.

While the child process is executing, Perl waits for it. So if a command takes 35 seconds to run, Perl is stopped for 35 seconds. You can run a command in the background the same way you'd do it in the shell:

        system "long_running_command and the parameters &";

Beware, however, that you'll have no easy way to interact with this command, nor even to learn its PID so you can kill it or see if it's still alive.
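If you do need the PID, one way is to do the fork and exec yourself rather than asking the shell for a trailing &. The sketch below assumes a Unix-like system; start_background is a hypothetical helper name, and the "long-running command" is just perl sleeping for 30 seconds.

```perl
use strict;
use warnings;

# Hypothetical helper: start a command in the background and return
# its PID, using fork and exec instead of a trailing "&".
sub start_background {
    my @command = @_;
    my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    if ($pid == 0) {                # child
        exec @command;              # list form: no shell involved
        die "Cannot exec $command[0]: $!";
    }
    return $pid;                    # parent gets the child's PID
}

# A stand-in long-running command: perl sleeping for 30 seconds.
my $pid = start_background($^X, '-e', 'sleep 30');

my $alive = kill 0, $pid;           # signal 0 only tests for existence
kill 'KILL', $pid;                  # stop it early (KILL can't be ignored)
waitpid $pid, 0;                    # reap the child; its status lands in $?
my $signal = $? & 127;              # low 7 bits: the signal that killed it
print "child $pid alive: $alive, killed by signal $signal\n";
```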

The return value of the system operator is the status from the wait (or waitpid) system call. That is, if the child process exited with a zero value (everything went OK), the return value from system is zero too. A non-zero exit value is shifted left 8 bits (or multiplied by 256, if you prefer). If a signal killed the process, the signal number is bitwise-or'ed into the low 7 bits, and 128 is added if there's a core file cluttering up the directory now. If the program couldn't be started at all, system returns -1.

If you don't grab the result from system, the same number is available in the special $? variable. That is, until another process is waited for, because $? records only the most recently waited-for process status. So, to get the specs on the most recent exit, it's something like:

        $status = ($? >> 8);
        $core_dumped = ($? & 128) > 0;
        $signal = ($? & 127);

Because ``zero if everything is OK'' is backwards from most of the rest of Perl, you can't just tack or die onto a system call. Instead, the easiest fix is to invert the return value of system with a logical ``not'' operation:

        !system "some_maybe_failing_command"
                or die "we broke it";
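Putting the bit layout and the inverted test together, a small decoding helper might look like the sketch below. describe_status is a hypothetical name, and the commands run are throwaway perl one-liners chosen only because their exit values are predictable. The last statement shows another common way to write the check: compare the return value of system with zero explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a status from system (or $?) into text,
# following the bit layout described above.
sub describe_status {
    my ($status) = @_;
    return "failed to start: $!" if $status == -1;
    my $signal = $status & 127;                      # low 7 bits
    my $core   = ($status & 128) ? " (core dumped)" : "";
    return "killed by signal $signal$core" if $signal;
    return "exited with value " . ($status >> 8);    # high byte
}

# Run a command whose exit value we control, then report on it.
my $status = system $^X, '-e', 'exit 3';
print describe_status($status), "\n";

# Equivalent to "!system ... or die", written as a comparison with 0.
system($^X, '-e', 'exit 0') == 0
    or die "command failed: ", describe_status($?);
```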

The exec operator works like system with respect to everything above about arguments and the shell. However, instead of creating a child process to run the selected command, the Perl process becomes the selected command. Think of this as a goto instead of a subroutine call. For example:

        exec "date";

Once this date command begins executing, there's no Perl to come back to. The only reason to put Perl code after an exec is to explain that date was not found along the command path:

        exec "date";
        die "date not found in $ENV{PATH}";

In fact, if you turn on warnings and follow an exec with anything other than die, warn, or exit, Perl will notify you when it compiles your program.

One use of exec is to use Perl to set up the operating environment for a long-running command:

        $ENV{DATABASE} = "MyDataBase";
        $ENV{PATH} = "/usr/bin:/bin:/opt/DataBase";
        chdir "/usr/lib/my.data" or die "Cannot chdir: $!";
        exec "data_mangler";
        die "data_mangler not found";

Replacing exec with system here would have still invoked data_mangler, but then we'd have a mostly useless Perl program sitting around just waiting for data_mangler to exit.
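The environment half of that setup works because every child inherits Perl's %ENV, whether it's started with exec, system, or backquotes. Here is a minimal sketch that checks this with system, so the test program survives; the use of local (which restores the old value at the end of the block) is an addition for this illustration, and isn't needed in the example above, since exec never returns.

```perl
use strict;
use warnings;

my $status;
{
    # Set for this block only; the old value comes back afterwards.
    local $ENV{DATABASE} = "MyDataBase";

    # The child (a perl one-liner) exits 0 only if it sees our value.
    $status = system $^X, '-e',
        'exit($ENV{DATABASE} eq "MyDataBase" ? 0 : 1)';
}
print "child saw DATABASE: ", ($status == 0 ? "yes" : "no"), "\n";
```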

The processes started with system and exec can be interactive, since they've inherited Perl's input and output. And, to aid in the interaction with complicated programs like vi, Perl ignores SIGINT (and SIGQUIT) while waiting for a system command to finish, so that hitting control-C doesn't abort Perl early.

Sometimes, you'll be invoking commands to capture their output as a string in the program. The simplest way to do this is with backquotes, which work much like the shell's backquotes:

        $now = `date`;

Here, the standard output of date is a 30-ish character string followed by a newline. Everything sent to standard output is captured as a string value, returned by the backquotes, and here saved into $now. If the value contains multiple lines, we may want to split it on newline to get each line. But it's probably easier to use backquotes in a list context, which does this for us:

        @logins = `who`;

Here, @logins will have one element for each line of who's output. We can parse that in a loop like this:

        for (`who`) {
                ($user, $tty, $when_where) =
                        /^(\S+)\s+(\S+)\s+(.*)/;
                $logins{$user}{$tty} = $when_where;
        }

Each iteration through the loop gathers a different user's login, shoving it into a two-level hash keyed by user name and then terminal. When we're all done, we can dump it out ordered by user:

        for $user (sort keys %logins) {
                for $tty (sort keys %{$logins{$user}}) {
                        print "$user is on $tty from $logins{$user}{$tty}\n";
                }
        }
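Because who's output differs from system to system, the loop above is easiest to test when the parsing is pulled into a subroutine and fed canned lines. In this sketch, parse_who is a hypothetical name and the sample lines are invented; in a real program you'd pass it the result of `who` directly. It also declares its variables with my so it runs under use strict.

```perl
use strict;
use warnings;

# Hypothetical helper: parse who-style lines into a two-level hash
# keyed by user name and then terminal.
sub parse_who {
    my %logins;
    for (@_) {
        my ($user, $tty, $when_where) = /^(\S+)\s+(\S+)\s+(.*)/ or next;
        $logins{$user}{$tty} = $when_where;   # (.*) stops before "\n"
    }
    return \%logins;
}

# Made-up sample lines; a real program would use parse_who(`who`).
my @sample = (
    "alice    pts/0    Oct 14 09:12 (example.com)\n",
    "alice    pts/3    Oct 14 10:40 (example.com)\n",
    "bob      tty1     Oct 13 22:05\n",
);
my $logins = parse_who(@sample);

for my $user (sort keys %$logins) {
    for my $tty (sort keys %{ $logins->{$user} }) {
        print "$user is on $tty from $logins->{$user}{$tty}\n";
    }
}
```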

Standard input for a backquoted command is inherited from Perl's standard input, making it possible to have an external command suck down all of STDIN and return a modified version:

        @sorted_input = `sort`;

Here, the sort command (not the built-in Perl operator) is reading all of standard input, sorting it, and then returning it to Perl. Because the backquotes are in list context, @sorted_input gets one element per sorted line.
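To see the difference between scalar and list context for backquotes without depending on standard input, here is a sketch that runs the same command both ways. The command is a perl one-liner standing in for who or sort, and the sketch assumes the path in $^X contains no spaces or other shell metacharacters, since the command string goes through the shell.

```perl
use strict;
use warnings;

# A command that prints three lines (-l adds a newline to each print).
my $command = qq{$^X -le "print for qw(one two three)"};

my $all   = `$command`;     # scalar context: one string, all lines
my @lines = `$command`;     # list context: one element per line

print "scalar: ", length($all), " characters\n";
print "list: ", scalar(@lines), " elements, first is $lines[0]";
```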

The backquoted command is double-quote interpolated, meaning that we can use escapes like \n and \t, and also include Perl variables to build parts of the command:

        $checksum = `sum $file`;

However, everything I warned you about earlier for the single-argument system operator applies here as well: what if $file has embedded whitespace or other shell-significant characters?
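One common workaround, different from the approach this column takes next, is to quote the value for /bin/sh yourself: wrap it in single quotes and turn each embedded single quote into '\'' (close the quote, add an escaped quote, reopen the quote). The shell_quote helper below is a hypothetical sketch of that technique, and the awkward file name is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a string for a POSIX shell by wrapping it
# in single quotes and replacing each ' with '\''.
sub shell_quote {
    my ($string) = @_;
    $string =~ s/'/'\\''/g;
    return "'$string'";
}

my $file   = "Don't lie!.txt";          # a made-up awkward file name
my $quoted = shell_quote($file);
print "$quoted\n";                      # 'Don'\''t lie!.txt'

# Round trip through a real shell: printf should hand back the original.
my $echoed = `printf '%s' $quoted`;
print "round trip ok\n" if $echoed eq $file;
```

It works, but every command string now depends on getting the quoting right everywhere, which is exactly what the list forms let you avoid.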

One solution is to use yet another way of invoking a child process: the process-as-filehandle. Let's start with the easy form of that first, and return to this whitespace problem after getting the basics down.

If the second argument to open ends in a vertical bar (pipe symbol), Perl treats that as a command to launch rather than a filename:

        open DATE, "date|" or die "Cannot launch date: $!";

At this point, a date command is launched, with its standard output connected to the DATE filehandle open for reading. The rest of the program doesn't know, doesn't care, and would have to work pretty hard to figure out, that this is not a file but just another program. So, we'll read from the output using the normal filehandle operations:

        $now = <DATE>;

The process is running in parallel with Perl, with all the coordination provided by ordinary pipe reads and writes. So if the date command sent its output before Perl was ready, the output would just wait in the pipe, and if Perl read before date was ready to write, the Perl process would simply block until output was available, consuming no CPU.
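Here is a slightly fuller sketch of a pipe open, with $^X (the running perl) standing in for date so the output is predictable; it assumes that path contains no shell metacharacters. It also shows a detail worth knowing: closing a piped filehandle waits for the child and leaves its exit status in $?, so close returns false when the command fails.

```perl
use strict;
use warnings;

# Read a command's output line by line through a pipe open.
open my $fh, "$^X -le 'print for 1..3' |"
    or die "Cannot start command: $!";
my @got;
while (my $line = <$fh>) {
    chomp $line;
    push @got, $line;
}
# close on a piped handle waits for the child and sets $?.
close $fh or warn "command exited with status ", $? >> 8, "\n";
print "read: @got\n";

# A failing command shows up as a false close and a non-zero $?.
open my $bad, "$^X -e 'exit 3' |" or die "Cannot start command: $!";
my @ignored = <$bad>;
my $ok   = close $bad;
my $exit = $? >> 8;
print "close returned ", ($ok ? "true" : "false"), ", exit value $exit\n";
```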

So how does this solve our whitespace problem of earlier? Well, there's a special kind of command opening like this:

        my $pid = open CHILD, "-|";
        defined $pid or die "Cannot fork: $!";

which is a combination of a pipe-open and a fork. You may recall that a fork splits the current process into two processes: a parent process and a child process. Initially, both processes are running identical code, but we distinguish them by the return value from the fork call. The parent gets back the child's process ID number (PID), and the child gets a zero value.

This fork-and-pipe opening operates similarly: the Perl process forks, and the parent and child see differing results from the open just like a fork. However, the child's STDOUT is attached to the parent's CHILD filehandle automatically. This means that the child can act like the date command above, sending stuff to its standard output, and we can read from that in the parent process.

So, to finish out the date example, we could do it all within Perl like so:

        if ($pid) { # I'm the parent
                $now = <CHILD>; # read child
        } else { # I'm the child
                print scalar localtime, "\n";
                exit 0;
        }

And now $now is set to the output of the child process. We can also exec in the child, like so:

        if ($pid) {
                $checksum = <CHILD>;
        } else {
                exec 'sum', $file;
                die "sum not found: $!";
        }

And now we have used the list form of exec, so there are never any shell-character worries!
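Putting the pieces together, here is a sketch of a reusable helper built on the fork-and-pipe open. read_command_output is a hypothetical name, cat stands in for sum so the expected output is obvious, and the scratch directory comes from the core File::Temp module. The file name deliberately contains a space, an apostrophe, a semicolon, and parentheses; because exec gets a list, none of them matter.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: run a command given as a list and return its
# standard output, using the fork-and-pipe open. No shell is involved.
sub read_command_output {
    my @command = @_;
    my $pid = open my $child, "-|";
    die "Cannot fork: $!" unless defined $pid;
    if ($pid == 0) {                  # child: our STDOUT feeds $child
        exec @command;
        die "Cannot exec $command[0]: $!";
    }
    local $/;                         # parent: slurp all the output
    my $output = <$child>;
    close $child;                     # waits for the child, sets $?
    return $output;
}

# A file name full of shell-significant characters, in a scratch dir.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/White House's brief; (draft)";
open my $out, ">", $file or die "Cannot create $file: $!";
print $out "hello\n";
close $out;

my $contents = read_command_output("cat", $file);
print "cat returned: $contents";
```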

Perl also offers direct access to fork, waitpid, pipe, and file-descriptor shuffling, permitting full use of the underlying Unix system calls, but I've run out of space to talk about them. Until next time, have fun launching processes!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
