Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 44 (Jan 2003)

[suggested title: What Perl got right]

As I type this month's column, we're just pulling away from Ocho Rios, Jamaica, on the latest Geek Cruise (www.geekcruises.com) called ``Linux Lunacy 2''. Earlier today, some of the speakers on this conference/cruise, including Linus Torvalds and Eric Raymond, held a meeting with the Jamaican Linux Users Group. We're out at sea, en route to Holland America's private island, ``Half Moon Cay'', so I'm using the satellite link to upload and review this column (for a mere 30 cents a minute).

Earlier this week Eric Raymond gave one of his many visionary presentations. This one in particular mentioned Perl for a section on ``What Perl Got Right''. The message surprised me, because Eric prefers that other popular ``P'' language over Perl for his personal and professional work. The one thing that Eric says that Perl got right is one of the many things that I think Perl got right: Perl's easy access to low-level operating system functionality.

Let's take a look at what this means. Perl gives you unlink() and rename() to remove and rename files. These calls pass nearly directly to the underlying ``section 2'' Unix system calls, without hiding the call behind a confusing abstraction layer. In fact, the name ``unlink'' is a direct reflection of that. Many beginners hunt for a ``file delete'' operation and never stumble across ``unlink'' because of its peculiar name.
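To see how thin that layer is, here's a minimal sketch that creates a scratch file, renames it, and then unlinks it. The scratch directory comes from File::Temp, and the names ``fred'' and ``barney'' are just placeholders of mine:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory that is removed at exit.
my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/fred";
my $new = "$dir/barney";

# Create an empty file to play with.
open my $fh, '>', $old or die "Cannot create $old: $!";
close $fh or die "Cannot close $old: $!";

# rename() maps onto rename(2): true on success, false with $! set.
rename $old, $new or die "Cannot rename $old to $new: $!";
my $renamed = (-e $new && !-e $old) ? 1 : 0;

# unlink() maps onto unlink(2) and returns the number of files removed.
my $removed = unlink $new;
die "Cannot unlink $new: $!" unless $removed;
```

Because unlink() returns a count of the files it removed, a true value here means exactly one name is gone.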

But the matchup doesn't stop there. Perl's file and directory operations include such entries as chdir(), chmod(), chown(), chroot(), fcntl(), ioctl(), link(), mkdir(), readlink(), rmdir(), stat(), symlink(), umask(), and utime(). All of these are mapped nearly directly to the corresponding system call. This means that file-manipulating programs don't have to call out to a shell just to perform the heavy lifting.
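As a rough sketch of a few of these working together, the following assumes a Unix-like system where both hard and symbolic links are available. The file names are made up for the illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# mkdir(2): the mode is still filtered through the current umask.
mkdir "$dir/sub", 0755 or die "Cannot mkdir: $!";

my $file = "$dir/sub/data";
open my $fh, '>', $file or die "Cannot create $file: $!";
close $fh or die "Cannot close $file: $!";

# chmod(2), then stat(2) to read the permission bits back.
chmod 0640, $file or die "Cannot chmod: $!";
my $mode = (stat $file)[2] & 07777;

# link(2) adds a second name, so the link count becomes 2.
link $file, "$dir/sub/hard" or die "Cannot link: $!";
my $nlink = (stat $file)[3];

# symlink(2) and readlink(2).
symlink $file, "$dir/sub/soft" or die "Cannot symlink: $!";
my $target = readlink "$dir/sub/soft";

# utime(2): set access and modification times to the epoch.
utime 0, 0, $file or die "Cannot utime: $!";
my $mtime = (stat $file)[9];
```

Each call either returns true or leaves the reason for failure in $!, just as the C versions leave it in errno.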

And if you want process control, Perl gives you alarm(), exec(), fork(), get/setpgrp(), getppid(), get/setpriority(), kill(), pipe(), sleep(), wait(), and waitpid(). With fork and pipe, you can create any feasible piping configuration, again not limited to a particular process abstraction provided by a more limited scripting language. And you can manage and modify those processes directly as well.
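Here's a minimal sketch of the fork-and-pipe pattern, assuming a platform with a real (or well-emulated) fork. The child writes one line into the pipe and exits with a status of 3, a number I picked arbitrarily so that it's easy to spot on the parent side:

```perl
use strict;
use warnings;

# pipe(2): anything written to $writer can be read from $reader.
pipe(my $reader, my $writer) or die "Cannot create pipe: $!";

# fork(2): returns 0 in the child, the child's PID in the parent.
my $pid = fork();
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: keep only the writing end, say something, and leave.
    close $reader;
    print {$writer} "hello from child $$\n";
    close $writer;
    exit 3;
}

# Parent: keep only the reading end.
close $writer;
my $line = <$reader>;
close $reader;

# waitpid(2) reaps the child; the exit status is in the high byte of $?.
waitpid($pid, 0);
my $exit_code = $? >> 8;
```

Closing the unused end in each process matters: the parent would never see end-of-file on the pipe if it kept its own copy of the writing end open.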

Let's not forget those socket functions, like accept(), bind(), connect(), getpeername(), getsockname(), get/setsockopt(), listen(), recv(), send(), shutdown(), socket(), and socketpair(). Although most people usually end up using the higher level modules that wrap around these calls (like LWP or Net::SMTP), they in turn can call these operations to set up the interprocess communication. And if a protocol isn't provided by a readily accessible library, you can get down near the metal and tweak to your heart's content.
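For a taste of the raw calls without needing a network, here's a small sketch using socketpair(), send(), recv(), and shutdown() within a single process. It assumes a system that supports Unix-domain sockets, and the ``ping'' and ``pong'' payloads are mine:

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# socketpair(2): two connected, bidirectional endpoints.
socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "Cannot create socket pair: $!";

# send(2) on one end, recv(2) on the other.
defined send($left, "ping", 0) or die "Cannot send: $!";
my $request = '';
defined recv($right, $request, 4, 0) or die "Cannot recv: $!";

# And back the other way.
defined send($right, "pong", 0) or die "Cannot send: $!";
my $reply = '';
defined recv($left, $reply, 4, 0) or die "Cannot recv: $!";

# shutdown(2) with 2 closes both directions before we drop the handles.
shutdown($left, 2)  or die "Cannot shutdown: $!";
shutdown($right, 2) or die "Cannot shutdown: $!";
close $left;
close $right;
```

In a real program the two ends would usually live in different processes after a fork, but the calls are the same.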

Speaking of interprocess communication, you've also got the ``System V'' interprocess communications, like msgctl(), msgget(), msgrcv(), msgsnd(), semctl(), semget(), semop(), shmctl(), shmget(), shmread() and shmwrite(). Again, each of these calls maps nearly directly to the underlying system call rather than hiding behind a higher-level abstraction layer, which makes existing C-based literature a ready source of examples and explanation. Then again, if you don't want to deal with the low-level interfaces, common CPAN modules will hide the details for you.
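As a hedged sketch of the shared-memory calls, the following assumes a system with System V IPC enabled and uses the constants from the IPC::SysV module. The segment size and the message are arbitrary choices of mine:

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);

# shmget(2): ask for a private 64-byte segment, readable and writable by us.
my $size = 64;
my $id = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
defined $id or die "Cannot shmget: $!";

# shmwrite() copies 5 bytes into the segment at offset 0.
my $message = "hello";
shmwrite($id, $message, 0, length $message) or die "Cannot shmwrite: $!";

# shmread() copies them back out into a Perl scalar.
my $buffer = '';
shmread($id, $buffer, 0, length $message) or die "Cannot shmread: $!";

# shmctl(2) with IPC_RMID removes the segment so it doesn't linger.
shmctl($id, IPC_RMID, 0) or die "Cannot remove segment: $!";
```

Unlike files in a temporary directory, a System V segment outlives the process that created it, so the final shmctl() is the cleanup step.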

And then there's the user and group info (getpwuid() and friends) and network info (like gethostbyname()). Even opening a file can be fine-tuned through sysopen(), using all of the flags directly available to the open system call, like O_NONBLOCK, O_CREAT, or O_EXCL.
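For instance, O_CREAT combined with O_EXCL gives ``create this file only if it doesn't already exist'' as a single atomic step. Here's a minimal sketch using sysopen() with the constants from the Fcntl module. The file name ``lockfile'' is just my placeholder:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/lockfile";

# First attempt: the file does not exist yet, so open(2) creates it.
my $first = sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0600) ? 1 : 0;
close $fh if $first;

# Second attempt: O_EXCL makes open(2) refuse, setting errno to EEXIST.
my $second = sysopen(my $fh2, $path, O_WRONLY | O_CREAT | O_EXCL, 0600) ? 1 : 0;
my $refused_because_exists = (!$second && $! == EEXIST) ? 1 : 0;
```

The flags and the errno value are the same ones described in the open(2) manpage, so the C documentation applies as written.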

Hopefully, you can see from these lists that Perl provides a rich set of interfaces to low-level operating system details. Why is this ``what Perl got right''?

It means that while Perl provides a decent high-level language for text wrangling and object-oriented programming, we can still get ``down in the dirt'' to precisely control, create, modify, manage, and maintain our systems and data. For example, if our application requires a ``write to temp file, then close and rename atomically'' to keep other applications from seeing a partially written file, we can spell it out as if we were in a systems implementation language like C:

        open TMP, ">ourfile.$$" or die "...";      # $$ is our process ID
        print TMP @our_new_data;
        close TMP or die "...";                    # buffered write errors show up here
        chmod 0444, "ourfile.$$" or die "...";
        rename "ourfile.$$", "ourfile" or die "...";   # the atomic step

By keeping the system call names the same (or similar), we can leverage existing examples, documentation, and knowledge.

In a scripting language without these low-level operations, we're forced to accept a world as presented by the language designer, not the world in which we live as a practicality. Eric Raymond gave as examples an old LISP system which provided many layers of abstraction (sometimes buggy) before you got to actual file input/output system calls, and the classic Smalltalk image, which provides a world unto itself, but very few hooks out into the real world. As a modern example, Java seems to make ``real world'' connections somewhat painful, preferring that its users build its ideal world for it rather than adapting itself to the world it runs in.

And in this, I agree. I've personally written probably a thousand system admin utilities over the 13 years that I've been playing with Perl, and many of those involved those mundane tasks of opening a file precisely the way I wanted, moving it around, and watching processes and files to make sure they weren't getting out of hand. It may not be sexy, but it's where the work actually is -- where the work gets done.

So while I encourage everyone to rush out and play with Squeak Smalltalk (www.squeak.org) to learn real object-oriented programming, at the end of the day it's still gonna be Perl (OO or not) that monitors my website and pages me when the system goes down.

One interesting side-effect of Perl having so many low-level functions is that it forced those who ported Perl from Unix to other operating systems to think about how to perform those functions portably. Thus, the ``Unix API'' provides a ``virtual'' operating system interface for Perl programmers, regardless of the platform.

And since I'm familiar with Unix, I can actually code up portable Perl programs that run on MacOS and Windows and VMS without having to be very smart about their oddities, or relearn a different API, even for apparently low-level operations. I remember squealing with delight when a program I had written for Unix that dealt with forking and sockets ran without any code changes on a Windows box at a customer site. I actually had not expected it to work, especially not as-is.

But what if something in section 2 of my Unix manual isn't supported directly by Perl? Well, on those platforms that support it, the syscall() interface provides a nifty escape hatch. Given the right parameters, the syscall function can call nearly any single-value-return system call.

For example, suppose the rename() function weren't provided directly by Perl. We could simply look it up in /usr/include/sys/syscall.h, apply the proper parameters as indicated by the rename(2) page, and we're up and running anyway. The code might look something like this:

        sub my_rename {
                my $from = shift;
                my $to = shift;
                $! = 0;
                syscall(128, $from, $to);
                return ! $!;
        }

        my_rename("fred", "barney")
                or die "Cannot rename: $!";

The magic ``128'' came from hunting around in my /usr/include directory until I could find the system call number of rename. That's the highly non-portable part of this operation, so your mileage and number will vary.

Once we have that number, we can issue a syscall. The value of $! is set to 0 before the call, and checked for a non-zero value after the call. If the operator returned anything of interest, we could also check that at the call itself. If the call fails, the normal die with $! in the text string gives us a reasonable error message.

So, if syscall works, we can wrap anything in Unix manual section 2 that isn't already provided, all without leaving Perl.

But what if syscall didn't work? Well, even all the way back to Perl version 4, we had a documented way of ``extending'' a Perl interpreter using the C-level Perl interfaces. And it all got much easier with the release of Perl version 5, using the XS interface. With XS, we can write dynamically loaded object code for our low-level interface (or statically linked on some of the more limited systems), and then use it at will.

But this XS interface was still a stumbling block for many people. Many consider it arcane, requiring too much background knowledge to be approachable. So, thankfully, last year Brian Ingerson (``ingy'') came along and wrote the beginnings of the Inline architecture. In particular, Inline::C allows me to define arbitrary subroutines in C, and they simply appear as callable Perl subroutines. Behind the scenes, an MD5 hash of the C code is created, and used to maintain a cache of to-be-compiled or pre-compiled loadable object files. At this point, renaming a file would be as simple as copying the syntax nearly directly from the example of the rename(2) manpage:

    use Inline C => <<'END';

    #include <stdio.h>

    int my_rename(char *from, char *to) {
      return rename(from, to) >= 0; /* -1 is bad, 0 is good */
    }

END

    my_rename("fred", "barney")
      or die "Cannot rename fred to barney: $!";

Here I'm providing the definition for my_rename as a C function. The arguments are specified exactly as they would be in a C program, and the rename system call gets called in the middle, massaging the return value slightly.

The Inline structure creates the proper glue to hook the snippet into the Perl-to-C code, and arranges for the C compiler to process that code. The results are cached: the first time this program is run, it takes about a second or so, but every invocation following that is lightning fast.
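The caching idea is easy to picture. I haven't reproduced Inline's actual bookkeeping here. This is only a sketch of the general technique, with cache_key() a hypothetical helper of my own: fingerprint the source text with MD5, and treat a changed fingerprint as a signal to recompile.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not part of Inline): derive a cache key
# from a chunk of source text.
sub cache_key {
    my ($source) = @_;
    return md5_hex($source);
}

my $c_code = <<'END';
int my_rename(char *from, char *to) {
  return rename(from, to) >= 0;
}
END

# The same source always yields the same key, so a cached build can be reused.
my $key_first = cache_key($c_code);
my $key_again = cache_key($c_code);

# Any edit to the source yields a different key, so a rebuild is triggered.
my $key_edited = cache_key($c_code . "/* edited */\n");
```

That would explain the behavior described above: the slow first run pays for the compile, and later runs with unchanged C code find a matching key and skip straight to loading.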

So, as you can see, Perl can easily get ``down to C level'' (just like this cruise ship I'm on). And Eric Raymond says this is the one thing that Perl got right. I tend to think it's a bit more than that. By the way, if you want to hack Perl with experts, be sure to check out the upcoming Perl Geek Cruise on the web site. I'll be there, coding on the high seas. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
