Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 41 (May 2002)
[suggested title: ``Doing many things, like pings'']
As a Unix system administrator, I'm often faced with those little mundane tasks that seem so trivial to me but so important to the community I'm supporting. Little things like ``hey, is that host up and responding to pings?''. Such tasks generally have a very repetitive nature to them, and scripting them seems to be the only way to have time to concentrate on the tasks that really need my attention.
Let's look at the specific task of pinging a number of hosts on a
subnet. Now, there are tools to do this quickly (like nmap
), and
there are even Perl modules to perform the ping
(as in
Net::Ping
), but I wanted to focus on something familar that can be
launched from Perl as an external process, and the system ping
command seems mighty appropriate for that.
First, let's look at how to ping one host, on my BSD-ish system:
sub ping_a_host { my $host = shift; `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1; }
Here, I'm firing up a subshell to execute the ping -i 1 -c 1
command, which on my system requests ping have a 1-second timeout,
and selects (as Sean Connery's character said in The Hunt for Red
October so eloquently) ``one ping only''. Your ping parameters may
vary: check your manpage.
The output is scanned for the string 0 packets rec
, which if absent
means we got a good ping. So if the match is found, we return 0 (the
ping was bad), otherwise we'll return 1. The ping
command spits
out some diagnostics on standard error, which we'll toss using
Bourne-shell syntax.
Note that the value of $host
is not checked here for sanity. We
certainly wouldn't want to accept a random command-line parameter or
(gasp) a web form value here without some serious validation.
However, as we use this in our program, all of the values will be
internally generated, so we've got some degree of safety.
So to scan a particular subnet, looking for hosts that are alive, we would add to that subroutine something like:
print "ping $_ is ", ping_a_host($_), "\n" for map "10.0.1.$_", 1..254;
Now, this routine completes very quickly for hosts that are alive, but is slow-as-molasses for hosts that aren't present, because the TCP protocol demands that the host have a chance to respond.
So how can we speed that up? This is not a CPU-intensive loop:
practically the entire time is waiting for some remote host to
respond. We'll leave the ping_a_host
subroutine alone, because
that's not where we have a problem: it's doing its job as fast as it
can. What we need to do is to do more of them at a time.
One first approach is to fork a separate process for each host we want to ping. We'll then sit back in a wait loop. As each child process completes, we'll note its exit status, and when there are no more kids, we'll spit out a report.
So, first, we'll define the host list for the task:
my @hosts = map "10.0.1.$_", "001".."010";
The numbers here are padded to three digits so that they sort as strings in a numeric sequence, a cheap but effective trick. Note also that I'm only selecting the first 10 hosts this time. I'll explain that shortly.
Next, we'll want a hash to keep track of the kids:
my %pid_to_host;
The keys of this hash will be the child process ID (PID), and the value will be the corresponding host that the child is processing. Next, we'll want to loop over the host list, firing up a child for each:
for (@hosts) { if (my $pid = fork) { ## parent does... $pid_to_host{$pid} = $_; warn "$pid is processing $_\n"; } else { # child does ## child does... exit !ping_a_host($_); }
As each host is placed into $_
, we'll fork. The result of fork is
a child process running in parallel with the parent process. These
processes are distinguished only by the return value of fork
, which
is 0 in the child, but the child's PID in the parent. So, if we get
back a non-zero value, we're the parent, and we'll store the PID into
the hash, along with the host that particular child is processing. If
we're the child, then we'll call the ping_a_host
routine, and
arrange for our exit status to be good (0) if that routine gives a
thumbs up.
The warn
in the loop is merely for diagnostic purposes so that you
can see what's happening. In a production program, I'd certainly
remove that.
At the end of this loop, we'll have a number of processes. Far too many, in fact. For each host to check, we'll have two processes running: the shell forked by the backquotes, and the ping process itself. Perl has to fork a shell because I needed that child to have its standard error output redirected. If I could have gotten the redirection out of those backquotes somehow, we'd have only one child process per host, not two.
Launching 20 processes to check 10 hosts will start pushing us up against the typical per-user process limit. And now you can see why I didn't do all 254 hosts at once!
Now it's time to wait for the results. A simple ``wait'' loop will reap the children as fast as they complete their task. First, we'll declare a hash to hold the results:
my %host_result;
The key will be the host, and the value will be 1 if the child said it was pingable, otherwise 0.
while (keys %pid_to_host) { my $pid = wait; last if $pid < 0; my $host = delete $pid_to_host{$pid} or warn("Why did I see $pid ($?)\n"), next; warn "reaping $pid for $host\n"; $host_result{$host} = $? ? 0 : 1; }
As long as we've got kids (indicated by the ever decreasing size of
the %pid_to_host
hash), we'll wait for them. The child process ID
comes back from wait
, which we'll stick into $pid
. At this
point, the exit status of that particular child is in $?
. If the
return value of wait
is negative, then we don't have any more kids.
This is an unexpected result, which we could check later by noticing
that %pid_to_host
is not yet empty, or we could have simply died
here.
Next, we'll use the %pid_to_host
hash to map the PID into the host
for which it was processing. Again, we might have accidentally reaped
a completed child which wasn't one of ours, so defensive programming
requires checking for that. This won't happen unless other parts of
this program are also forking children somehow, but I'm a cautious
programmer most of the time.
Finally, we'll take the exit status in $?
, and map it into the
appropriate good/bad value for the result hash.
When this loop completes, we have no more kids performing tasks, and it's time to show the result:
for (sort keys %host_result) { print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n"; }
For each key of the result table, we'll say whether the result was good or bad.
Putting this all together makes a nice little demo of forking 20 kids to check 10 hosts, but it won't scale to 254 hosts, because that would require more process slots than we typically have (or want to use, actually). What we need to do is perform the forking gradually, so that we never have more than 20 kids at a time. One naive approach is to chunk the data into bite-size bits:
my @all_hosts = ...; my %host_results; while (my @hosts = splice @all_hosts, 0, 10) { ... process @hosts, adding into %host_results ... } ... show results ...
Here, most of the code above gets wrapped into an outer loop which
hands 10 hosts at a time to be processed, using splice
to peel them
off of the master list. While this strategy certainly solves the ``no
more than 10 at a time'' condition, each batch of 10 has to wait for
the slowest of the 10 to complete.
A better way would be to fork until we hit the limit of active children, then wait for any one child to finish before we need to fork again. First, we'll need to factor out ``waiting for a kid'' into a subroutine so we can call it in two different places: while forking a new task, and at the end to reap all the remaining children:
sub wait_for_a_kid { my $pid = wait; return 0 if $pid < 0; my $host = delete $pid_to_host{$pid} or warn("Why did I see $pid ($?)\n"), next; warn "reaping $pid for $host\n"; $host_result{$host} = $? ? 0 : 1; 1; }
Note that we're accessing %pid_to_host
and %host_result
directly
here, so those variables must be in scope before the subroutine
definition. The subroutine now returns 1 if a kid was reaped, and 0
otherwise. The final reap loop now becomes:
## final reap: 1 while wait_for_a_kid();
At this point, the program functions identically to the prior one,
except that I've refactored the kid reaping. The magic happens next.
We'll put wait_for_a_kid
in the middle of the forking loop as well,
just before we're about to fork, conditionally if the number of kids is
already at the maximum we chose:
for (@hosts) { wait_for_a_kid() if keys %pid_to_host >= 10; ...
Ahh. That does it. We can now crank @hosts
back up to our 254
items. As we fire off the first 10, this new statement has no effect.
But when it comes time for the 11th, we'll wait until at least one of
the other 10 to complete first. So, at no time do we have more than
10 hosts active (using 20 child processes for reasons explained
earlier). The entire program is given here in case you want to see
it all in context:
sub ping_a_host { my $host = shift; `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1; }
my %pid_to_host; my %host_result;
sub wait_for_a_kid { my $pid = wait; return 0 if $pid < 0; my $host = delete $pid_to_host{$pid} or warn("Why did I see $pid ($?)\n"), next; warn "reaping $pid for $host\n"; $host_result{$host} = $? ? 0 : 1; 1; }
my @hosts = map "10.0.1.$_", "001".."254";
for (@hosts) { wait_for_a_kid() if keys %pid_to_host > 10; if (my $pid = fork) { ## parent does... $pid_to_host{$pid} = $_; warn "$pid is processing $_\n"; } else { # child does ## child does... exit !ping_a_host($_); } }
## final reap: 1 while wait_for_a_kid();
for (sort keys %host_result) { print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n"; }
As a working program, this does pretty good, although it could be made
a bit more robust, and is very specific to the particular ping
program on my machine. If you don't want to write this pattern of
code into each program that wants to do parallel things, look at
Parallel::ForkManager
in the CPAN, which does pretty much the same
thing with a friendly interface.
One improvement to this program might be to pre-fork and re-use the children, using some sort of IPC (pipes or sockets) to communicate additional tasks to perform as each task completes, but I've run out of space to talk about that here. Until next time, enjoy!