Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 41 (May 2002)

[suggested title: ``Doing many things, like pings'']

As a Unix system administrator, I'm often faced with those little mundane tasks that seem so trivial to me but so important to the community I'm supporting. Little things like ``hey, is that host up and responding to pings?''. Such tasks generally have a very repetitive nature to them, and scripting them seems to be the only way to have time to concentrate on the tasks that really need my attention.

Let's look at the specific task of pinging a number of hosts on a subnet. Now, there are tools to do this quickly (like nmap), and there are even Perl modules to perform the ping (as in Net::Ping), but I wanted to focus on something familiar that can be launched from Perl as an external process, and the system ping command seems mighty appropriate for that.
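
For comparison, a minimal Net::Ping sketch might look something like the following (this is an aside of mine, not part of the program we'll build; the icmp mode typically requires root privileges):

    use Net::Ping;

    # one ICMP ping with a 1-second timeout
    my $p = Net::Ping->new("icmp", 1);
    print "10.0.1.1 is ", ($p->ping("10.0.1.1") ? "up" : "down"), "\n";
    $p->close;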

First, let's look at how to ping one host, on my BSD-ish system:

    sub ping_a_host {
      my $host = shift;
      `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
    }

Here, I'm firing up a subshell to execute the ping -i 1 -c 1 command, which on my system requests ping have a 1-second timeout, and selects (as Sean Connery's character said in The Hunt for Red October so eloquently) ``one ping only''. Your ping parameters may vary: check your manpage.

The output is scanned for the string 0 packets rec, which if absent means we got a good ping. So if the match is found, we return 0 (the ping was bad), otherwise we'll return 1. The ping command spits out some diagnostics on standard error, which we'll toss using Bourne-shell syntax.

Note that the value of $host is not checked here for sanity. We certainly wouldn't want to accept a random command-line parameter or (gasp) a web form value here without some serious validation. However, as we use this in our program, all of the values will be internally generated, so we've got some degree of safety.
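
If you did need to accept an outside value, a minimal sanity check might look like this sketch (the wrapper name is mine, purely for illustration):

    sub ping_a_host_checked {
      my $host = shift;
      # accept only a dotted-quad address before it goes anywhere near a shell
      die "suspicious host: $host"
        unless $host =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
      return ping_a_host($host);
    }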

So to scan a particular subnet, looking for hosts that are alive, we would add, after that subroutine, something like:

    print "ping $_ is ", ping_a_host($_), "\n"
      for map "10.0.1.$_", 1..254;

Now, this routine completes very quickly for hosts that are alive, but is slow-as-molasses for hosts that aren't present, because ping has to wait out its full timeout before deciding that a silent host isn't going to answer (ping speaks ICMP, so there's no TCP-style ``connection refused'' to cut the wait short).

So how can we speed that up? This is not a CPU-intensive loop: practically the entire time is waiting for some remote host to respond. We'll leave the ping_a_host subroutine alone, because that's not where we have a problem: it's doing its job as fast as it can. What we need to do is to do more of them at a time.

One first approach is to fork a separate process for each host we want to ping. We'll then sit back in a wait loop. As each child process completes, we'll note its exit status, and when there are no more kids, we'll spit out a report.

So, first, we'll define the host list for the task:

    my @hosts = map "10.0.1.$_", "001".."010";

The numbers here are padded to three digits so that they sort as strings in a numeric sequence, a cheap but effective trick. Note also that I'm only selecting the first 10 hosts this time. I'll explain that shortly.
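
The same list could equally be built with sprintf, if you find that more readable than the string-range trick:

    my @hosts = map sprintf("10.0.1.%03d", $_), 1 .. 10;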

Next, we'll want a hash to keep track of the kids:

    my %pid_to_host;

The keys of this hash will be the child process ID (PID), and the value will be the corresponding host that the child is processing. Next, we'll want to loop over the host list, firing up a child for each:

    for (@hosts) {
      if (my $pid = fork) {
        ## parent does...
        $pid_to_host{$pid} = $_;
        warn "$pid is processing $_\n";
      } else { # child does
        ## child does...
        exit !ping_a_host($_);
      }
    }

As each host is placed into $_, we'll fork. The result of fork is a child process running in parallel with the parent process. These processes are distinguished only by the return value of fork, which is 0 in the child, but the child's PID in the parent. So, if we get back a non-zero value, we're the parent, and we'll store the PID into the hash, along with the host that particular child is processing. If we're the child, then we'll call the ping_a_host routine, and arrange for our exit status to be good (0) if that routine gives a thumbs up.

The warn in the loop is merely for diagnostic purposes so that you can see what's happening. In a production program, I'd certainly remove that.
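
One more aside (my own, not part of the column's program): fork returns undef when the system can't create another process, and with the code above such a failure would silently fall into the ``child'' branch. A more defensive version of the same loop might read:

    for (@hosts) {
      my $pid = fork;
      die "fork failed for $_: $!" unless defined $pid;  # could not create a child
      if ($pid) {                 # parent: remember which host this kid handles
        $pid_to_host{$pid} = $_;
      } else {                    # child: ping, then report via the exit status
        exit !ping_a_host($_);
      }
    }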

At the end of this loop, we'll have a number of processes. Far too many, in fact. For each host to check, we'll have two processes running: the shell forked by the backquotes, and the ping process itself. Perl has to fork a shell because I needed that child to have its standard error output redirected. If I could have gotten the redirection out of those backquotes somehow, we'd have only one child process per host, not two.

Launching 20 processes to check 10 hosts will start pushing us up against the typical per-user process limit. And now you can see why I didn't do all 254 hosts at once!
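
For the curious, one way to lose the extra shell (a sketch of mine, not what the rest of this column uses) is to do the piped fork ourselves, redirect STDERR inside the child, and exec ping directly:

    sub ping_a_host_noshell {
      my $host = shift;
      defined(my $pid = open my $ping_fh, "-|") or die "cannot fork: $!";
      unless ($pid) {                       # child: run ping with STDERR tossed
        open STDERR, ">", "/dev/null";
        exec "ping", "-i", "1", "-c", "1", $host;
        exit 1;                             # exec failed
      }
      my $output = do { local $/; <$ping_fh> };  # parent: slurp ping's report
      close $ping_fh;
      return $output =~ /0 packets rec/ ? 0 : 1;
    }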

Now it's time to wait for the results. A simple ``wait'' loop will reap the children as fast as they complete their task. First, we'll declare a hash to hold the results:

    my %host_result;

The key will be the host, and the value will be 1 if the child said it was pingable, otherwise 0.

    while (keys %pid_to_host) {
      my $pid = wait;
      last if $pid < 0;
      my $host = delete $pid_to_host{$pid}
        or warn("Why did I see $pid ($?)\n"), next;
      warn "reaping $pid for $host\n";
      $host_result{$host} = $? ? 0 : 1;
    }

As long as we've got kids (indicated by the ever decreasing size of the %pid_to_host hash), we'll wait for them. The child process ID comes back from wait, which we'll stick into $pid. At this point, the exit status of that particular child is in $?. If the return value of wait is negative, then we don't have any more kids. This is an unexpected result, which we could check later by noticing that %pid_to_host is not yet empty, or we could have simply died here.

Next, we'll use the %pid_to_host hash to map the PID back to the host that child was processing. Again, we might have accidentally reaped a completed child which wasn't one of ours, so defensive programming requires checking for that. This won't happen unless other parts of this program are also forking children somehow, but I'm a cautious programmer most of the time.

Finally, we'll take the exit status in $?, and map it into the appropriate good/bad value for the result hash.
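
(For reference, $? packs several pieces of information into one number; the usual way to pull it apart looks like this.)

    my $exit_code = $? >> 8;     # the value the child handed to exit()
    my $signal    = $? & 127;    # non-zero if the child died from a signal
    my $dumped    = $? & 128;    # set if the child dumped core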

When this loop completes, we have no more kids performing tasks, and it's time to show the result:

    for (sort keys %host_result) {
      print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
    }

For each key of the result table, we'll say whether the result was good or bad.

Putting this all together makes a nice little demo of forking 20 kids to check 10 hosts, but it won't scale to 254 hosts, because that would require more process slots than we typically have (or want to use, actually). What we need to do is perform the forking gradually, so that we never have more than 20 kids at a time. One naive approach is to chunk the data into bite-size bits:

    my @all_hosts = ...;
    my %host_results;
    while (my @hosts = splice @all_hosts, 0, 10) {
      ... process @hosts, adding into %host_results ...
    }
    ... show results ...

Here, most of the code above gets wrapped into an outer loop which hands 10 hosts at a time to be processed, using splice to peel them off of the master list. While this strategy certainly solves the ``no more than 10 at a time'' condition, each batch of 10 has to wait for the slowest of the 10 to complete.
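
Fleshed out, that naive chunked approach might look something like this (my sketch, reusing ping_a_host from above; it is not the version we'll end up with):

    my @all_hosts = map "10.0.1.$_", "001" .. "254";
    my %host_results;
    while (my @batch = splice @all_hosts, 0, 10) {
      my %pid_to_host;
      for my $host (@batch) {
        if (my $pid = fork) {
          $pid_to_host{$pid} = $host;      # parent: remember the kid
        } else {
          exit !ping_a_host($host);        # child: ping and report
        }
      }
      while (keys %pid_to_host) {          # reap this whole batch before moving on
        my $pid = wait;
        last if $pid < 0;
        my $host = delete $pid_to_host{$pid} or next;
        $host_results{$host} = $? ? 0 : 1;
      }
    }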

A better way would be to fork until we hit the limit of active children, then wait for any one child to finish before we need to fork again. First, we'll need to factor out ``waiting for a kid'' into a subroutine so we can call it in two different places: while forking a new task, and at the end to reap all the remaining children:

    sub wait_for_a_kid {
      my $pid = wait;
      return 0 if $pid < 0;
      my $host = delete $pid_to_host{$pid}
        or warn("Why did I see $pid ($?)\n"), return 1;
      warn "reaping $pid for $host\n";
      $host_result{$host} = $? ? 0 : 1;
      1;
    }

Note that we're accessing %pid_to_host and %host_result directly here, so those variables must be in scope before the subroutine definition. The subroutine now returns 1 if a kid was reaped, and 0 otherwise. The final reap loop now becomes:

    ## final reap:
    1 while wait_for_a_kid();

At this point, the program functions identically to the prior one, except that I've refactored the kid reaping. The magic happens next. We'll put wait_for_a_kid in the middle of the forking loop as well, just before we're about to fork, calling it only when the number of kids is already at the maximum we chose:

    for (@hosts) {
      wait_for_a_kid() if keys %pid_to_host >= 10;
      ...

Ahh. That does it. We can now crank @hosts back up to our 254 items. As we fire off the first 10, this new statement has no effect. But when it comes time for the 11th, we'll first wait for at least one of the original 10 to complete. So, at no time do we have more than 10 hosts active (using 20 child processes for reasons explained earlier). The entire program is given here in case you want to see it all in context:

    sub ping_a_host {
      my $host = shift;
      `ping -i 1 -c 1 $host 2>/dev/null` =~ /0 packets rec/ ? 0 : 1;
    }
    my %pid_to_host;
    my %host_result;
    sub wait_for_a_kid {
      my $pid = wait;
      return 0 if $pid < 0;
      my $host = delete $pid_to_host{$pid}
        or warn("Why did I see $pid ($?)\n"), return 1;
      warn "reaping $pid for $host\n";
      $host_result{$host} = $? ? 0 : 1;
      1;
    }
    my @hosts = map "10.0.1.$_", "001".."254";
    for (@hosts) {
      wait_for_a_kid() if keys %pid_to_host >= 10;
      if (my $pid = fork) {
        ## parent does...
        $pid_to_host{$pid} = $_;
        warn "$pid is processing $_\n";
      } else { # child does
        ## child does...
        exit !ping_a_host($_);
      }
    }
    ## final reap:
    1 while wait_for_a_kid();
    for (sort keys %host_result) {
      print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n";
    }

As a working program, this works pretty well, although it could be made a bit more robust, and it is very specific to the particular ping program on my machine. If you don't want to write this pattern of code into each program that wants to do parallel things, look at Parallel::ForkManager in the CPAN, which does pretty much the same thing with a friendly interface.
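
To give a flavor of that, a Parallel::ForkManager version of this task might look roughly like the following sketch (reusing ping_a_host from above):

    use Parallel::ForkManager;

    my %host_result;
    my $pm = Parallel::ForkManager->new(10);    # at most 10 kids at once
    $pm->run_on_finish(sub {
      my ($pid, $exit_status, $host) = @_;      # $host is the ident passed to start()
      $host_result{$host} = $exit_status ? 0 : 1;
    });
    for my $host (map "10.0.1.$_", "001" .. "254") {
      $pm->start($host) and next;               # parent moves on to the next host
      $pm->finish(ping_a_host($host) ? 0 : 1);  # child exits with the ping result
    }
    $pm->wait_all_children;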

One improvement to this program might be to pre-fork and re-use the children, using some sort of IPC (pipes or sockets) to communicate additional tasks to perform as each task completes, but I've run out of space to talk about that here. Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.