Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 17 (November 1997)

In the past few months, I've watched with excitement as the Mars Pathfinder project has sent back its wonderful images. Even more interesting is this project's tie-in with Perl: Larry Wall spent a few years at JPL (the Pathfinder project coordinators), and much of the early work on Perl was done there. You can probably bet that a lot of these images are being maneuvered, manipulated, and manifest by Perl programs.

In honor of this occasion, I started thinking ``pathfinder'', ``pathfinder'', and came up with a typical task that relates to another kind of path, your shell command path. (In fact, while I was testing this program, I called it ``pathfinder'', which seems to fit.)

If you're like me, you add stuff to your path every time someone tells you that ``oh, put this directory in your path to get to these tools''. The trouble with that is that the search path is linear. If a program named ``clipper'' exists in a directory early in the list, a later ``clipper'' will not be seen, because of the shell's ``first come, first invoked'' policy.

But how can you tell if you're hiding programs this way? The shell won't tell you. But a Perl program can!

Let's take a look at the task of noting duplicates in the PATH. First, we need to grab the elements of the path itself:

    my @path = split /:/, $ENV{PATH};

Here, the PATH environment variable is accessed via the special %ENV hash, and then split on colons. The result is a list in @path. If there were any null elements in the PATH (meaning that the current directory is to be searched), those will end up as empty strings.
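For example, a PATH with a leading colon yields an empty leading element. (The PATH value here is invented, just to show the behavior.)

    #!/usr/bin/perl
    # Hypothetical PATH value, just to show the empty-element behavior:
    local $ENV{PATH} = ":/usr/bin:/bin";
    my @path = split /:/, $ENV{PATH};
    # @path is now ("", "/usr/bin", "/bin") -- the leading colon
    # became an empty string, standing for the current directory.
    print scalar(@path), " elements\n";

Note that split discards only trailing empty fields, so a leading or embedded colon survives as an empty string, while a trailing colon would not.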

Next, we should remove any duplicate directories from this path. How do duplicates get in there? The most common thing is that you have something like:

    set path = (/strange/tool/bin $path);

in some configuration file. But you can also get duplicates when you copy one good .cshrc or .profile to another system. For example, my favorite .cshrc has both /bin and /usr/bin in the path, but on some systems, this is actually the same directory thanks to a symlink.
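A quick sketch of that symlink situation, assuming a system where /bin is a symlink to /usr/bin (adjust the names for your own machine):

    #!/usr/bin/perl
    # Compare two directory names by device and inode number.
    # stat follows symlinks, so a symlinked name and its target match.
    my ($dev1, $ino1) = stat "/bin";
    my ($dev2, $ino2) = stat "/usr/bin";
    if ($dev1 == $dev2 and $ino1 == $ino2) {
      print "/bin and /usr/bin are the same directory\n";
    } else {
      print "/bin and /usr/bin are distinct\n";
    }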

So, we can't just look at the name of the directory -- we need to look at the actual directory. Fortunately, a stat() call can give us the device/inode value, which can uniquely identify each entry. Here's the start of that:

    for (@path) {
      ...
    }

Now, within the body of this loop, $_ is one of the directories from the original path. We must first reject any entry that is not an absolute path. That's easy:

      next unless m#^/#;

Here, the regular expression ^/ queries each element (in $_) to see if it begins with a /. If you're more comfortable with substr(), you can do something like this instead:

      next unless substr($_,0,1) eq "/";

But that looks like more typing to me. I like the regular expression version better.

Next, the device and inode number are fetched via stat(). We'll need that to see if two names point to the same directory. The device and inode number uniquely identify each entry in the Unix filesystem. If stat() shows them to be the same, then they really are the same. That looks like this:

      my ($dev,$ino) = stat;
      next unless defined $dev;

If the stat returns an empty list, the original entry in $_ doesn't exist (or can't be reached). In that case $dev is undef, so we bail. The stat call actually returns a 13 element list, and we're ignoring all but the first two elements.
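For reference, here's the whole 13-element list, with the conventional variable names from the perlfunc documentation:

    my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
        $atime, $mtime, $ctime, $blksize, $blocks) = stat;

Since we want only the first two, assigning to a two-element list quietly throws the rest away.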

From there, we construct a string $key with a space between the two numbers. The actual format is not important... we just need any pair of numbers to look different from any other pair of numbers:

      my $key = "$dev $ino";

This key is next used as a key into the %path_inodes hash, looking for a duplicate. If it already exists, we've seen this particular directory in this path, and we report it as such. This is handled with the exists() function, which returns true if the indicated key is found in the hash.

      if (exists $path_inodes{$key}) {
        print "warning: $_ is linked to $path_inodes{$key}\n";
        next;
      }

Here, the next skips the remaining processing for this particular directory, since we already know it's a duplicate.

If not, we add it to @clean_path, and also note it in the %path_inodes hash. The key is the $key we just constructed, and the value is the original path entry -- that's the value that gets interpolated into the warning message when a duplicate turns up.

      $path_inodes{$key} = $_;
      push @clean_path, $_;

The result then is that @clean_path will be the path, without any duplicates. This is important because if there were any duplicates in the path, there would also be a huge number of false hits in the next step. Now, it's time to preen the duplicate names in that path.

One way to do this is to walk this path with another foreach loop:

    for my $dir (@clean_path) {
      ...
    }

Inside the loop, we need to get the contents of the indicated directory, $dir. There are a few ways to do that, but let's use a DirHandle for fun. To use that, we need to pull in the right module:

      use DirHandle;

Next, a new DirHandle is created on the target directory, and read as a list, like so:

      my @files =
        DirHandle->new($dir)->read;

Here, the result of calling new on DirHandle is immediately sent a read, causing it to dump out all of the names in that directory. What's cool about this is that the DirHandle is automatically closed at the end of this statement, because nothing saved it!

But that's not quite right... we don't want ``.'' or ``..'', and to make the report nice, it should be sorted. Easy enough -- just add a sort and grep operation:

      my @files =
        sort grep !/^\.\.?$/,
        DirHandle->new($dir)->read;

Here, only the elements that do not match the regex (which picks out only ``.'' and ``..'') are passed to sort.
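If DirHandle isn't your cup of tea, the same list can be had with plain opendir and readdir (this sketch uses a lexical directory handle, which requires a recent perl):

    opendir my $dh, $dir or die "cannot opendir $dir: $!";
    my @files = sort grep !/^\.\.?$/, readdir $dh;
    closedir $dh;

The DirHandle version saves us the explicit open and close, at the cost of loading a module.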

Now, it's time to walk through the @files list and pick out the ones that we've seen in some other directory. It's similar to the duplicate detection used in constructing @clean_path above:

      for my $file (@files) {
        if (exists $progs{$file}) {
          print "$file in $dir is shadowed by $progs{$file}\n";
          next;
        }
        $progs{$file} = $dir;
      }

Here, each filename is placed one at a time into $file. If it already exists in the %progs hash, then we've seen it in a previous directory, and we report that. If not, then we note this particular program name and its directory for future passes.

So, those are the pieces of the code. But we also need to glue in a few more lines to make it run cleanly under use strict, and that results in the following code:

    #!/usr/bin/perl -w
    use strict;
    my @path = split /:/, $ENV{PATH};
    my %path_inodes;
    my @clean_path;
    for (@path) {
      next unless m#^/#;
      my ($dev,$ino) = stat;
      next unless defined $dev;
      my $key = "$dev $ino";
      if (exists $path_inodes{$key}) {
        print "warning: $_ is linked to $path_inodes{$key}\n";
        next;
      }
      $path_inodes{$key} = $_;
      push @clean_path, $_;
    }
    my %progs;
    ## print "clean path is @clean_path\n";
    for my $dir (@clean_path) {
      use DirHandle;
      my @files =
        sort grep !/^\.\.?$/,
        DirHandle->new($dir)->read;
      ## print "$dir: @files\n";
      for my $file (@files) {
        if (exists $progs{$file}) {
          print "$file in $dir is shadowed by $progs{$file}\n";
          next;
        }
        $progs{$file} = $dir;
      }
    }

Stick this somewhere in your path (as ``pathfinder'' if you wish), and invoke it, and you'll see all the programs you can never reach, because they are hidden. Enjoy.
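A run might produce output along these lines (the directory and program names here are invented, just to show the two kinds of report):

    $ pathfinder
    warning: /usr/bin is linked to /bin
    perl in /usr/local/bin is shadowed by /bin

The first line comes from the duplicate-directory pass; the second from the shadowed-name pass.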


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.