Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 27 (Aug 1999)
Symbolic links were not present in the first version of Unix that I used. That would be Unix V6, back in 1977, when the Unix kernel size was under 32K. It's hard to imagine anything being under 32K associated with Unix these days.
But somewhere in the bowels of the University of California at Berkeley, in the early 80s, the boys working on BSD concocted a scheme to rectify two of the biggest problems with hard links: they couldn't be made to a directory, and they couldn't point to another mounted filesystem. And their solution was that now common feature, a symbolic link.
A symbolic link is essentially a text string that sits in place of a file. When the symbolic link's filename is accessed, the Unix kernel replaces the filename with its text value instead, like a macro expansion. This all happens transparently to the executing program (unlike some other popular operating systems).
From the shell, symbolic links are easy enough to create:
ln -s /usr/lib/perl5 ./Lib
which makes a reference to Lib in the current directory hop over to /usr/lib/perl5/. From Perl, this same step is:
symlink("/usr/lib/perl5", "./Lib") or die "$!";
And we can see this is so with:
ls -l
which will show something like:
..... Lib -> /usr/lib/perl5
indicating this redirection is going on. And that same fact is apparent to Perl like so:
my $where = readlink("Lib");
print "Lib => $where\n";
But what if /usr/lib itself is also a symbolic link, say to /lib? Well, the system nicely picks that up when it's looking down the steps from /usr to /usr/lib, and redirects that to /lib, and continues from there to look for perl5.
Thus, following a symlink may involve multiple expansions. There's a limit to the number of expansions in a path to prevent runaway loops, but generally it's enough that you won't worry about it.
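To watch both effects without touching a real system directory, here's a small sketch that builds a two-hop chain and a two-link loop in a scratch directory. The names hop1, hop2, target, loop_a, and loop_b are made up for the illustration, and I'm assuming a Unix-like system where a runaway expansion is reported as ELOOP.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Scratch directory; abs_path removes any symlinks in the temp path itself.
my $chain_dir = abs_path(tempdir(CLEANUP => 1));

mkdir "$chain_dir/target"             or die "mkdir: $!";
symlink "target", "$chain_dir/hop2"   or die "symlink: $!";
symlink "hop2",   "$chain_dir/hop1"   or die "symlink: $!";

# readlink reports a single level of expansion...
print "hop1 -> ", readlink("$chain_dir/hop1"), "\n";
# ...while abs_path keeps expanding until nothing is left to expand.
print "hop1 ends at ", abs_path("$chain_dir/hop1"), "\n";

# Two links that name each other can never be fully expanded.
symlink "loop_b", "$chain_dir/loop_a" or die "symlink: $!";
symlink "loop_a", "$chain_dir/loop_b" or die "symlink: $!";
if (open my $fh, "<", "$chain_dir/loop_a") {
    print "unexpectedly opened the loop\n";
} else {
    print "open failed: $!\n";    # ELOOP on typical Unix systems
}
```

The first print shows only hop2, which is exactly why a single readlink isn't enough to learn where a link finally lands.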
What's the easiest way to really know where the symlink ends up then?
Well, you could keep typing a lot of ls -l invocations, and take careful notes, or just write a Perl program to do the expansion for you.
And while we're at it, let's also make this work recursively from a starting directory in a filetree, dumping out all the symlinks and their ultimate expansions in all directories contained within. Cool.
So, here's a program that does just that, presented a few lines at a time.
#!/usr/bin/perl -w
use strict;
$|++;
These first three lines tell us where to find Perl, and enable warnings and the usually good compiler restrictions. We'll also disable buffering on STDOUT, so I can see how far the program has gotten during a long run.
use File::Find;
use Cwd;
Next, we'll pull in two modules from the standard Perl distribution library. File::Find helps us recurse through a directory hierarchy without thinking too hard about it, and Cwd gets the current working directory, usually without forking off a child process.
my $dir = cwd;
Now, we'll get the current directory via cwd (imported from Cwd). We'll need this to properly expand relative names into absolute names.
find sub {
  ##### contents here presented below
}, @ARGV;
Next, the outer part of the body of the program. We'll call find (imported from the File::Find module), passing it an anonymous subroutine reference, and the command-line argument array @ARGV. The subroutine (whose contents are defined below) will be called for each file or directory in all directories and subdirectories starting at the top-level directories named in @ARGV.
Now for the guts of the subroutine. In the real program, these are located where ##### is marked above.
return unless -l;
When this subroutine is called, $_ is set to the name of the file or directory of interest, and the current directory is set to the directory that contains this item. Here, we'll end up returning if the item is not a symbolic link.
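If you'd like to see just this much in action before the rest of the subroutine arrives, here's a minimal sketch that collects the symlinks under a scratch tree. The names docs, manual, and self are invented for the example, and it relies on the default File::Find behavior of not following symlinks.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny tree containing one directory and two symlinks.
my $tree = tempdir(CLEANUP => 1);
mkdir "$tree/docs"                    or die "mkdir: $!";
symlink "docs",    "$tree/manual"     or die "symlink: $!";
symlink "../docs", "$tree/docs/self"  or die "symlink: $!";

my @found;
find sub {
    return unless -l;                  # -l with no operand tests $_
    push @found, $File::Find::name;    # path starting from the name we passed in
}, $tree;

print "$_\n" for sort @found;
```

Because find has already changed into the containing directory, the bare -l works on the short name in $_, while $File::Find::name keeps the longer path for reporting.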
The next two lines set up the core of the routine. I'm gonna have a @left and an @right variable. Think of @left as "where in the filetree am I at so far?" and @right as "where else am I being told to go?". The basic task is to take one element at a time from the front of @right, and try to glue it onto the end of @left, until we have no more @right to go. If at any step the path of @left is a symlink, however, we'll have to expand it and start again. Also, if the element being examined from @right is a dot, we'll skip it, and if it's dot-dot, we'll need to back up on @left instead.
my @right = split /\//, $File::Find::name;
The variable $File::Find::name has the full pathname starting from the kind of name we gave on the command line. If that was a relative name, this will also be a relative name to the original working directory (now saved in $dir). Here, I'm splitting the name apart into individual elements.
my @left = do {
  @right && ($right[0] eq "") ?
    shift @right :              # quick way
    split /\//, $dir;
};    # first element always null
This is a bit more complicated, so I'll take it slowly. We're setting up @left to be the value of this expression coming from a do block. If the first element of @right is empty, then the original string began with a slash, and we need to be relative to the root directory. That's handled by moving that empty element from the beginning of @right to become the only element of @left. Otherwise, we had a relative name, and we'll preload @left with a split-apart version of the initial working directory.
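To make the two starting cases concrete, here's the same setup wrapped in a little function so we can feed it sample names. The function name start_lists and the working directory /home/merlyn are made up for the illustration; nothing here touches the filesystem.

```perl
use strict;
use warnings;

# start_lists is a hypothetical helper name, not part of the column's program.
sub start_lists {
    my ($name, $dir) = @_;
    my @right = split /\//, $name;
    my @left  = (@right && $right[0] eq "")
        ? (shift @right)          # absolute: @left is just the empty root marker
        : split(/\//, $dir);      # relative: start from the working directory
    return (\@left, \@right);
}

for my $name ("/usr/lib/perl5", "Lib/strict.pm") {
    my ($left, $right) = start_lists($name, "/home/merlyn");
    print "${name}: left=(",  join(", ", map { "'$_'" } @$left),
          ") right=(", join(", ", map { "'$_'" } @$right), ")\n";
}
```

Either way, @left begins with an empty string, so joining it with slashes later always produces a name that starts at the root.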
while (@right) {
Now, as long as we have items to keep walking, we'll do this...
my $item = shift @right;
next if $item eq "." or $item eq "";
This grabs the next step, and discards it if it's just an empty string or a single dot, meaning that we would have stayed at the current directory.
if ($item eq "..") {
  pop @left if @left > 1;
  next;
}
And if it's dot-dot, we'll have to pop up a level on our current position (unless it would have us back up over the top).
my $link = readlink (join "/", @left, $item);
Now, if the path of @left, together with the next step, form a symbolic link, the value of $link will be defined to be what we need to replace $item with. Otherwise, we can just slide along.
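This step leans on readlink returning undef for anything that isn't a symbolic link. Here's a quick sketch that checks that on a scratch directory; the names plain, link, and missing are invented for the test.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $probe_dir = tempdir(CLEANUP => 1);
mkdir "$probe_dir/plain"            or die "mkdir: $!";
symlink "plain", "$probe_dir/link"  or die "symlink: $!";

# One real symlink, one ordinary directory, one name that doesn't exist.
for my $name (qw(link plain missing)) {
    my $value = readlink "$probe_dir/$name";
    if (defined $value) {
        print "$name is a symlink to $value\n";
    } else {
        print "$name is not a symlink ($!)\n";
    }
}
```

Both the ordinary directory and the missing name come back undefined, so the loop treats a nonexistent step just like a plain one and keeps walking.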
if (defined $link) {
  my @parts = split /\//, $link;
  if (@parts && ($parts[0] eq "")) { # absolute
    @left = shift @parts;   # quick way
  }
  unshift @right, @parts;
  next;
So, if it's a symbolic link, we'll split it apart. If it's absolute, @left gets reset to the top. Otherwise, @left stays as is. We'll also push whatever we got in front of the remainder of @right, as it will influence the interpretation of that remaining path.
} else {
  push @left, $item;
  next;
}
If it wasn't a symbolic link at this step, it's simple; we just move along to that point in @left.
}
print "$File::Find::name is ", join("/", @left), "\n";
When the loop is over, we'll dump out the resulting path of @left.
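For trying the walk on its own, away from File::Find, here's a sketch that gathers the steps above into one function and runs it over a scratch tree. The function name resolve_path, the directory names, and the cap of 40 expansions are my own assumptions for this illustration; the program in this column has no such cap and would spin forever on a link that points at itself.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# resolve_path is a hypothetical wrapper around the loop described above.
sub resolve_path {
    my ($name, $dir) = @_;
    my @right = split /\//, $name;
    my @left  = (@right && $right[0] eq "") ? (shift @right) : split(/\//, $dir);
    my $hops  = 0;
    while (@right) {
        my $item = shift @right;
        next if $item eq "." or $item eq "";
        if ($item eq "..") {
            pop @left if @left > 1;        # never back up over the root
            next;
        }
        my $link = readlink join "/", @left, $item;
        if (defined $link) {
            # Assumed safety cap, not in the original program.
            die "too many symlink expansions in $name\n" if ++$hops > 40;
            my @parts = split /\//, $link;
            @left = (shift @parts) if @parts && $parts[0] eq "";   # absolute link
            unshift @right, @parts;
        } else {
            push @left, $item;
        }
    }
    return join "/", @left;
}

# Scratch tree: a relative link, and an absolute link that passes through it.
my $res_dir = abs_path(tempdir(CLEANUP => 1));
mkdir "$res_dir/real"                          or die "mkdir: $!";
mkdir "$res_dir/real/sub"                      or die "mkdir: $!";
symlink "real", "$res_dir/alias"               or die "symlink: $!";
symlink "$res_dir/alias/sub", "$res_dir/deep"  or die "symlink: $!";

for my $name ("deep", "deep/..", "alias/./sub") {
    print "$name is ", resolve_path($name, $res_dir), "\n";
}
```

Notice that deep/.. lands in real, the parent of the link's target, rather than in the directory holding the link itself, since the dot-dot is applied only after the link has been expanded.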
And there you have it. It's a bit tricky, since the macro expansion of a symbolic link is somewhat recursive, but Perl's data structures and full access to the right system calls give us a straightforward way of interpreting symbolic links.
Now you'll never have to wonder where those links point again. Until next time, enjoy!