Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Linux Magazine Column 05 (Oct 1999)
[suggested title: Launching Processes]
Perl has many ways of launching and managing child processes. That's good, because a lot of Perl's role as the ``duct-tape of the Internet'' is to glue programs together. So, let's look at some of the most common ways, including the limitations and potential security considerations.
The easiest way to launch a child process is with system:
system "date";
The child process here is the date command. Anything that can be invoked from a shell command prompt can be used in this string. The child process inherits Perl's standard input, output, and error output, so the output of this date command will show up wherever Perl's STDOUT was going.
The command can be arbitrarily complex, including everything that /bin/sh (or its nearby equivalent) can handle:
system "for i in *; do echo == \$i ==; cat \$i; done";
Here, we're dumping out the contents of a directory, one file at a time. The $i vars are backslashed here because Perl would otherwise have expanded them to their current Perl values, and we want the shell to see its own $i instead. A quick solution here is to use single quotes instead of double quotes:
system 'for i in *; do echo == $i ==; cat $i; done';
Or, you can just set the value of Perl's $i to '$i', but that's pretty twisted, and will probably drive the maintenance programmer who inherits your code crazy.
This might look better spaced over multiple lines, so we can use a here-document to fix it:
system <<'END';
for i in *
do
  echo == $i ==
  cat $i
done
END
Yeah, that cleans it up a bit.
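The quoting difference described above can be seen without running anything, just by looking at the strings Perl would hand to the shell. This is a small sketch; the value given to $i is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical Perl variable that happens to share the shell variable's name.
my $i = "perl-value";

# Double quotes: Perl interpolates $i unless it is backslashed.
my $interpolated = "echo == $i ==";
my $escaped      = "echo == \$i ==";

# Single quotes: Perl leaves $i alone, so the shell would see its own $i.
my $single       = 'echo == $i ==';

# Show the three command strings exactly as the shell would receive them.
print "$_\n" for $interpolated, $escaped, $single;
```

The backslashed and single-quoted versions come out as the same string, which is why either one works in the loop above.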
If the argument is simple enough (no shell metacharacters), Perl avoids the shell, finding the program directly. You may wish to adjust $ENV{PATH} before calling system so that the program is found in the right place. Anything complicated forces a shell though.
That shell can get in the way a bit though. Imagine invoking grep on a few files based on a string in a scalar variable:
system "grep $look_for brief1 brief2 brief3";
Now if $look_for is a nice easy string like "Monica", no big deal. But if it's complicated like "White House", we now have a problem, because that'll interpolate like this:
system "grep White House brief1 brief2 brief3";
which is looking for White in the other four names, including a file named House. That's broken. Badly. So, perhaps we can fix it by including some quotes:
system "grep '$look_for' brief1 brief2 brief3";
This works for "White House", but fails on "Don't lie!". And if we change the shell single quotes to double quotes, that will just mess up when $look_for contains double quotes!
Luckily, we can avoid the shell entirely, using the multiple argument version of system:
system "grep", $look_for, "brief1", "brief2", "brief3";
When system is given more than one argument, the first argument must be a program found along the PATH. The remaining arguments are handed, uninterpreted by any shell, directly to the program. That is, if it were another Perl script, the elements of @ARGV in the called program would be exactly one-for-one the same elements as the rest of this list.
Because we now no longer call a shell, things like I/O redirection no longer work. So there are tradeoffs to this method, but it sure comes in handy. It's also a bit more secure--no chance that a nefarious user will come along and sneak a newline or semicolon in there. Some very popular CGI scripts didn't get this matter right, and ended up triggering a CERT notification as a security hole.
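As a rough illustration of the list form, the sketch below passes a string full of shell-hostile characters to a child and checks that it arrives as a single argument. The child here is the running perl itself ($^X), used only as a stand-in so the example does not depend on grep or on any particular files; the sample string is made up.

```perl
use strict;
use warnings;

# A search string that defeats both single- and double-quote wrapping.
my $look_for = q{Don't say "lie"; White House};

# Stand-in child: exits 0 only if it received exactly one argument.
my $child_code = 'exit(@ARGV == 1 ? 0 : 1)';

# List form of system: no shell is involved, so $look_for is not re-parsed.
my $status = system $^X, '-e', $child_code, $look_for;

print $status == 0
    ? "argument arrived as one piece\n"
    : "argument was split or lost\n";
```

With the single-string form, the same value would have been split on whitespace and the semicolon would have started a second command.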
While the child process is executing, Perl is stopped. So if a command takes 35 seconds to run, Perl is stopped for 35 seconds. You can fork a child process in the background in the same way you'd do it in the shell:
system "long_running_command and the parameters &";
Beware, however, that you'll have no easy way to interact with this command, nor even know its PID to kill it or see if it's still alive.
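If the PID matters, one possible alternative to the trailing ampersand is to fork and exec by hand. The following is only a sketch under assumptions: the "long-running command" is replaced by the running perl ($^X) exiting at once, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Stand-in for the long-running command and its parameters.
my @command = ($^X, '-e', 'exit 0');

my $pid = fork;
defined $pid or die "Cannot fork: $!";

if ($pid == 0) {
    # Child: become the command. exec returns only if it fails.
    exec { $command[0] } @command;
    die "Cannot exec $command[0]: $!";
}

# Parent: carries on immediately, and now knows the child's PID.
print "started child $pid\n";

# Later, collect the child so it does not linger as a zombie.
my $reaped      = waitpid $pid, 0;
my $wait_status = $?;
print "child $reaped finished with wait status $wait_status\n";
```

Because the parent holds $pid, it could also send the child a signal with kill, or poll it with a non-blocking waitpid, instead of waiting right away.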
The return value of the system operator is the value from the wait (or waitpid) system call. That is, if the child process exited with a zero value (everything went OK), so too will the return value from system be zero. A non-zero value is shifted left 8 bits (or multiplied by 256, if you prefer). If a signal killed the process, that's bitwise-or'ed into the number, and a 128 is added if there's a core file cluttering up the directory now.
If you don't grab the result from system, the same number is available in the special $? variable. That is, until another process is waited for, because $? records only the most recently waited-for process status. So, to get the specs on the most recent exit, it's something like:
$status = ($? >> 8);
$core_dumped = ($? & 128) > 0;
$signal = ($? & 127);
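The same three-way split can be wrapped in a small helper. This is a sketch only: the sub name decode_wait_status is invented for the example, and the child is the running perl ($^X) exiting with a known code. If the command could not be started at all, $? is -1 and the decoded fields are meaningless, so a real program would check for that first.

```perl
use strict;
use warnings;

# Split a wait status (the value of $?, or the return value of system)
# into its three parts. The sub name is illustrative only.
sub decode_wait_status {
    my ($raw) = @_;
    return (
        exit_code   => $raw >> 8,
        signal      => $raw & 127,
        core_dumped => ($raw & 128) ? 1 : 0,
    );
}

# Stand-in child: the running perl exiting with code 3.
system $^X, '-e', 'exit 3';
my %result = decode_wait_status($?);

print "exit=$result{exit_code} signal=$result{signal} ",
      "core=$result{core_dumped}\n";
```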
Because the ``zero if everything is OK'' is backwards from most of the rest of Perl, you shouldn't use or die directly. Instead, the easiest fix is to invert the output of system with a logical ``not'' operation:
!system "some_maybe_failing_command" or die "we broke it";
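An equivalent spelling that some people find easier to read compares the result with zero explicitly. The sketch below shows both a passing and a failing child; the children are the running perl ($^X) with made-up exit codes, and the eval is there only so the example can show the error text without stopping.

```perl
use strict;
use warnings;

# Passing case: the child exits 0, so the "or die" never fires.
system($^X, '-e', 'exit 0') == 0
    or die "child failed with wait status $?\n";

# Failing case: the child exits 2, and the message reports the exit code.
my $error = '';
eval {
    system($^X, '-e', 'exit 2') == 0
        or die "child failed, exit code " . ($? >> 8) . "\n";
    1;
} or $error = $@;

print "caught: $error";
```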
The exec operator works like system, with respect to everything above. However, instead of creating a child process to run the selected command, the Perl process becomes the selected command. Think of this as a goto instead of a subroutine call. For example:
exec "date";
Once this date command begins executing, there's no Perl to come back to. The only reason to put Perl code after an exec is to explain that date was not found along the command path:
exec "date";
die "date not found in $ENV{PATH}";
In fact, if you turn on compile-time warnings and have anything but a die after exec, you'll get notified.
One use of exec is to use Perl to set up the operating environment for a long-running command:
$ENV{DATABASE} = "MyDataBase";
$ENV{PATH} = "/usr/bin:/bin:/opt/DataBase";
chdir "/usr/lib/my.data" or die "Cannot chdir: $!";
exec "data_mangler";
die "data_mangler not found";
Replacing exec with system here would have still invoked data_mangler, but then we'd have a mostly useless Perl program sitting around just waiting for data_mangler to exit.
The processes started with system and exec can be interactive, since they've inherited Perl's input and output. And, to aid in the interaction with complicated programs like vi, Perl ignores SIGINT during the system invocation, so that hitting control-C doesn't abort Perl early.
Sometimes, you'll be invoking commands to capture their output as a string in the program. The simplest way to do this is with backquotes, which function similarly to the shell's use of backquotes:
$now = `date`;
Here, the standard output of date is a 30-ish character string followed by a newline. Everything sent to standard output is captured as a string value, returned by the backquotes, and here saved into $now. If the value contains multiple lines, we may want to split it on newline to get each line. But it's probably easier to use backquotes in a list context, which does this for us:
@logins = `who`;
Here, @logins will have one element for each line of who's output. We can parse that in a loop like this:
for (`who`) {
  ($user, $tty, $when_where) = /^(\S+)\s+(\S+)\s+(.*)/;
  $logins{$user}{$tty} = $when_where;
}
Each iteration through the loop gathers a different user's login, shoving it into a two-level hash keyed by user name and then terminal. When we're all done, we can dump it out ordered by user:
for $user (sort keys %logins) {
  for $tty (sort keys %{$logins{$user}}) {
    print "$user is on $tty from $logins{$user}{$tty}\n";
  }
}
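The scalar and list behaviors of backquotes described earlier can be compared side by side. This sketch assumes a Unix-like shell with echo available; the command string is made up so that the output is predictable.

```perl
use strict;
use warnings;

# A made-up command that prints two lines.
my $command = 'echo first; echo second';

# Scalar context: everything comes back as one string, newlines included.
my $text = `$command`;

# List context: one element per line, each still ending in a newline.
my @lines = `$command`;

# chomp strips the trailing newline from every element at once.
chomp(my @clean = @lines);

print "scalar length: ", length($text), "\n";
print "list count:    ", scalar(@lines), "\n";
print "second line:   $clean[1]\n";
```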
Standard input for a backquoted command is inherited from Perl's standard input, making it possible to have an external command suck down all of STDIN, returning a modified version:
@sorted_input = `sort`;
Here, the sort command (not the built-in Perl operator) is reading all of standard input, sorting it, and then returning that to Perl, where the list context turns it into one element per line.
The backquoted command is double-quote interpolated, meaning that we can use escapes like \n and \t, but also include Perl variables to build parts of the command:
$checksum = `sum $file`;
However, everything that I warned you earlier about the single-argument system operator applies here as well--what if $file has embedded whitespace or other shell-significant characters?
One solution is to use yet another way of invoking a child process: the process-as-filehandle. Let's start with the easy form of that first, and return to this whitespace problem after getting the basics down.
If the second argument to open ends in a vertical bar (pipe symbol), Perl treats that as a command to launch rather than a filename:
open DATE, "date|";
At this point, a date command is launched, with its standard output connected to the DATE filehandle open for reading. The rest of the program doesn't know, doesn't care, and would have to work pretty hard to figure out, that this is not a file but just another program. So, we'll read from the output using the normal filehandle operations:
$now = <DATE>;
The process is running in parallel with Perl, with all the coordination provided for standard pipe read/writes. So if the date command sent its output before Perl was ready, it would just wait there, and if Perl read before date was ready to write, the Perl process would simply block until output was available, consuming no CPU.
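Putting the pieces of a pipe-open together, a reader usually loops over the handle and then closes it, at which point Perl waits for the child and sets $?. The sketch below assumes a Unix-like shell with echo; it uses a lexical filehandle ($reader) rather than a bareword like DATE, which is a stylistic choice of this example rather than something the text above requires.

```perl
use strict;
use warnings;

# Launch a made-up two-line command with its output piped to us.
open(my $reader, "echo one; echo two |")
    or die "Cannot launch command: $!";

# Read line by line; each read blocks until the child has written.
my @lines;
while (my $line = <$reader>) {
    chomp $line;
    push @lines, $line;
}

# close waits for the child; it returns false if the exit was non-zero.
my $closed_ok = close($reader);
warn "command exited with wait status $?\n" unless $closed_ok;

print "got: @lines\n";
```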
So how does this solve our whitespace problem from earlier? Well, there's a special kind of command opening like this:
my $pid = open CHILD, "-|";
which is a combination of a pipe-open and a fork. You may recall that a fork splits the current process into two processes: a parent process and a child process. Initially, both processes are running identical code, but we distinguish them by the return value from the fork call. The parent gets back the child's process ID number (PID), and the child gets a zero value.
This fork-and-pipe opening operates similarly: the Perl process forks, and the parent and child see differing results from the open just like a fork. However, the child's STDOUT is attached to the parent's CHILD filehandle automatically. This means that the child can act like the date command above, sending stuff to its standard output, and we can read from that in the parent process.
So, to finish out the date example, we could do it all within Perl like so:
if ($pid) { # I'm the parent
  $now = <CHILD>; # read child
} else { # I'm the child
  print scalar localtime, "\n";
  exit 0;
}
And now $now is set to the output of the child process. We can also exec in the child, like so:
if ($pid) {
  $checksum = <CHILD>;
} else {
  exec 'sum', $myfile;
  die "sum not found: $!";
}
And now we have used the multiple-argument form of exec, so there's never any shell-character worries!
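The fork-open and the multiple-argument exec can be bundled into a reusable routine that behaves like backquotes without a shell. What follows is a sketch under assumptions: the sub name capture_without_shell is invented, a lexical filehandle replaces CHILD, and the child command is the running perl ($^X) echoing a made-up argument, standing in for something like sum.

```perl
use strict;
use warnings;

# Run a command without a shell and return its output lines.
# The sub name is illustrative, not a Perl built-in.
sub capture_without_shell {
    my @command = @_;

    # Implicit fork: the child's STDOUT is wired to $from_child.
    my $pid = open(my $from_child, "-|");
    defined $pid or die "Cannot fork: $!";

    if ($pid == 0) {
        # Child: naming the program in the block means no shell is
        # used, even if @command has only one element.
        exec { $command[0] } @command;
        die "Cannot exec $command[0]: $!";
    }

    # Parent: read everything, then reap the child via close.
    my @output = <$from_child>;
    close $from_child;
    return @output;
}

# Stand-in child: prints its single argument followed by a newline.
my $awkward = q{White House; don't "lie"};
my @lines = capture_without_shell($^X, '-e', 'print "$ARGV[0]\n"', $awkward);

print @lines;
```

Perl 5.8 and later also accept a list form of pipe-open, open(my $fh, "-|", @command), which does the fork and exec in one step; the longer version is shown here because it follows the approach in the text.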
Perl also offers arbitrary invocations of fork, waitpid, pipe and file-descriptor shuffling, permitting full access to the underlying Unix system calls, but I've run out of space to talk about them. Until next time, have fun launching processes!