Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 61 (Nov 2005)

[suggested title: ``Common network protocols'']

When Larry Wall added the socket function to Perl version 3 over a dozen years ago, we Perl programmers rejoiced, because we no longer needed to write separate C programs to connect over the (then tiny) network. We could write programs that could send mail, or fetch news articles! Little did anyone foresee the role that TCP/IP would play in the lives of everyone within a few years. Let's take a look at what Perl has learned to do in the past decade.

For example, in [this column in July 2003], I illustrated an easy way to connect to a socket (in this case, the daytime service at the NIST) and dump out the entire response:

  use IO::Socket::INET;
  my $client = IO::Socket::INET->new('time.nist.gov:13')
    or die "new: $@";
  print for <$client>;

In this case, there's no interactive protocol. We connect, take everything handed to us, and dump it out. But nearly always, the net protocols require an interaction, such as providing authentication credentials, selecting the various tasks, and providing needed data.

Higher level protocols for clients

Luckily, Perl has been the ``duct tape of the Internet'' for long enough that nearly every core net protocol already has a module in the CPAN providing some level of support. In nearly all cases, these modules are named with a Net:: prefix, followed immediately by the traditional name for the protocol (like SMTP or DNS). This means that we don't have to spend time ``reinventing the wheel'' to talk the protocol. We can concentrate on our application instead.

The protocols used on the Internet are worked out with careful design and discussion. Too often, I've seen a network programming newbie rush in and try to invent their own protocol. Apparently, they don't realize the pitfalls. A protocol has to clearly specify who is talking, and when, defining all possible state transitions. A protocol also must include specifications for data, such as how to escape that odd data that might be confused for delimiters or special operations. A protocol that uses UDP also has to understand how to deal with MTUs (the maximum size for a UDP packet on a given link, which varies with the network topology) and network delays, drops, and resequencing.

Luckily, the Net:: modules hide all of this from us. We simply use the provided API, and relax, knowing that the module author has followed the specification accurately. Hopefully.

For example, to send mail to a server, we can use something like:

  use Net::SMTP;
  my $smtp = Net::SMTP->new('localhost');
  $smtp->mail('merlyn@example.com');
  $smtp->to('postmaster@example.com');
  # split /^/ keeps each line's newline; data() joins the pieces as-is
  $smtp->data([split /^/, <<'END_OF_MESSAGE']);
  To: postmaster@example.com

  This message was sent using Net::SMTP!
  END_OF_MESSAGE
  $smtp->quit;

Here, we've connected to the local SMTP server on our box, and established a sender and a recipient. This is followed by the data payload of the message (from a here document wrapped into an arrayref), and finally a disconnect.

Although the exact text to be sent to the SMTP server is not much more difficult than the program as you see it here, the method calls are automatically taking care of things like timeouts and escaping. For example, a mail-message line consisting of a single dot needs to be escaped as a doubled-up dot before sending, because a single dot indicates the end of the message. The data method handles that for us automatically, so it just does the right thing.
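To make that escaping rule concrete, here's a rough sketch of dot-doubling in plain Perl. This is only an illustration of the rule, not how Net::SMTP does it internally, and the subroutine names dot_stuff and dot_unstuff are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Net::SMTP) illustrating SMTP "dot-stuffing".

# Sender side: any line that starts with a dot gets one extra leading dot,
# so a lone "." inside the message can't be mistaken for end-of-data.
sub dot_stuff {
  my @lines = @_;
  return map { /^\./ ? ".$_" : $_ } @lines;
}

# Receiver side: strip that one extra leading dot again.
sub dot_unstuff {
  my @lines = @_;      # work on copies, not the caller's data
  s/^\.// for @lines;
  return @lines;
}

my @message = ("Hello,", ".", "..hidden", "Bye.");
my @wire    = dot_stuff(@message);

# The final lone dot is the real end-of-data marker.
print "$_\n" for @wire, ".";
```

Only lines that begin with a dot are touched, so ``Bye.'' passes through unchanged, and the receiver can undo the escaping without any other knowledge of the message.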

Because SMTP is such a common protocol, many layers and extensions have been added, such as MIME mail for attachments and alternate representations (plain text vs rich text). The MIME::Lite module can create such messages, but then uses Net::SMTP (or the sendmail command) to push the message on its way. Again, don't invent this stuff yourself: modules are already in place to do the work.
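As a minimal sketch of what that looks like (assuming MIME::Lite is installed from the CPAN), the following builds a two-part message and prints it rather than sending it. The addresses, subject, and attachment contents are placeholders invented for the example.

```perl
use strict;
use warnings;
use MIME::Lite;

# Addresses, subject, and data below are placeholders.
my $msg = MIME::Lite->new(
  From    => 'merlyn@example.com',
  To      => 'postmaster@example.com',
  Subject => 'Report attached',
  Type    => 'multipart/mixed',
);

# First part: the human-readable body.
$msg->attach(
  Type => 'TEXT',
  Data => "The report is attached.\n",
);

# Second part: an attachment built from in-memory data.
$msg->attach(
  Type        => 'text/csv',
  Filename    => 'report.csv',
  Disposition => 'attachment',
  Data        => "host,status\nblue,ok\n",
);

# Show the finished message instead of sending it.
print $msg->as_string;

# To hand it to Net::SMTP instead (needs a reachable SMTP server):
#   $msg->send('smtp', 'localhost');
```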

For fetching messages from a server, the Net::IMAP and Net::POP3 modules provide the low-level functions needed to fetch (and in some cases, send) email via the corresponding mail repository protocol.

Another common net protocol is ``the web'', usually in the form of HTTP or HTTPS requests. The LWP module suite (found in the CPAN) provides a very versatile ``virtual browser'' (using LWP::UserAgent), including support for cookies and HTML form parsing. For simple cases, a web page can be fetched as easily as:

  use LWP::Simple qw(get);
  my $content = get "http://www.stonehenge.com/perltraining/";
  print $content if defined $content;
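When we need more control than LWP::Simple offers, we can drive LWP::UserAgent directly. Here's a minimal sketch, assuming LWP is installed; the agent string and the fetch helper are invented for the example, and it only touches the network when a URL is given on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
  agent   => 'column61-example/0.1',  # made-up agent string
  timeout => 10,                      # seconds
);
$ua->cookie_jar({});                  # keep cookies in memory for this run

# Hypothetical helper: return the body of $url, or undef on failure.
sub fetch {
  my ($url) = @_;
  my $response = $ua->get($url);
  unless ($response->is_success) {
    warn "GET $url failed: ", $response->status_line, "\n";
    return;
  }
  return $response->decoded_content;
}

# Only go out to the network when a URL is given on the command line.
if (@ARGV) {
  my $content = fetch($ARGV[0]);
  print $content if defined $content;
}
```

Unlike the get function from LWP::Simple, the response object tells us why a request failed, and the cookie jar lets later requests reuse any session cookies the server hands back.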

Although Usenet isn't what it once was, NNTP provides a reliable means to share and transfer messages efficiently. The Net::NNTP module (and the independently implemented News::NNTPClient module) provides client-side access to an NNTP server. For example, to get all of the rec.humor.funny postings, it's as simple as:

  use Net::NNTP;
  my $nntp = Net::NNTP->new("nntp.example.com")
    or die "new: $@";
  my ($num, $low, $high) = $nntp->group("rec.humor.funny");
  for my $n ($low..$high) {
    my $art = $nntp->article($n) or next;  # skip expired or missing articles
    print for @$art;
  }

Another common protocol that's been around for a generation of Internet users is the File Transfer Protocol (FTP), which can be accessed with the Net::FTP module. Entire hierarchies can be transferred with Net::FTP::Recursive.
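A single-file download with Net::FTP might look something like the following sketch. The helper name fetch_by_ftp is made up, and the host, directory, and file name are whatever we pass on the command line; nothing happens unless all three are given.

```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: fetch one file by anonymous FTP into the
# current directory.
sub fetch_by_ftp {
  my ($host, $dir, $file) = @_;
  my $ftp = Net::FTP->new($host, Timeout => 30)
    or die "connect to $host: $@";
  $ftp->login('anonymous', 'merlyn@example.com')
    or die "login: ", $ftp->message;
  $ftp->cwd($dir)
    or die "cwd $dir: ", $ftp->message;
  $ftp->binary;                       # no line-ending translation
  $ftp->get($file)
    or die "get $file: ", $ftp->message;
  $ftp->quit;
  return $file;
}

# Usage: perl ftpget.pl HOST DIRECTORY FILE
fetch_by_ftp(@ARGV) if @ARGV == 3;
```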

Although Net::Telnet can be used to connect to Telnet servers, most experts agree that SSH is the right thing to use these days, supported by Net::SSH (for connections) and Net::SCP and Net::SFTP (for file transfer).

For DNS queries of all types (and dynamic DNS updates), Net::DNS provides a nice pure-Perl interface. For example, to get the MX records for stonehenge.com, we simply execute:

  use Net::DNS;
  for my $mx (mx("stonehenge.com")) {
    printf "%5d %s\n", $mx->preference, $mx->exchange;
  }

which prints (at the moment):

      5 blue.stonehenge.com
    666 spamtrap.stonehenge.com

Ahh yes, my wonderful high-MX spamtrap. But that's for another article.

There are also clients for SOAP (SOAP::Lite), and Jabber, Whois, Ping, LDAP, BEEP, CUPS, Ident, Gopher, NTP, and SNMP (named with the Net:: prefix) in the CPAN. In general, if you're looking things up in an RFC, you're probably wasting time unless you've already verified that the CPAN doesn't have it yet!

Beyond the core RFC'ed Internet Protocols, CPAN modules also provide connections to X11 servers (for graphical interfaces), AOL Instant Messenger, Yahoo! Messenger, IRC, Rendezvous (mDNS or Bonjour) and Traceroute. There are also interfaces to the Google and Amazon APIs for complex queries.

Servers

Most of the CPAN packages permit a Perl application to be a client to connect to an available server (which typically isn't written in Perl). However, the CPAN also includes server packages for such things as SMTP (Mail), FTP, TFTP, DNS, HTTP, and IDENT. With these packages, we can construct arbitrary servers that can be used with traditional (or Perl-based) clients.

These servers can perform traditional functions, perhaps with unusual characteristics. They might also be filters or proxies, connecting in turn using the same protocol to the next layer of servers. Or, they might even be gateways between protocols, turning a mail message into a news posting, or a DNS lookup into a web request.
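The CPAN server packages do the protocol work for us, but the underlying shape of any server is small. As a bare-bones sketch using only the core IO::Socket::INET module, here's a one-shot server for a made-up ``protocol'' that sends each line back in upper case. A forked child plays the client so the whole thing runs on one machine; the service, the helper name, and the choice of letting the OS pick the port are all assumptions for the demo.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on the loopback interface; port 0 asks the OS for any free port.
my $server = IO::Socket::INET->new(
  LocalAddr => '127.0.0.1',
  LocalPort => 0,
  Listen    => 5,
  Proto     => 'tcp',
  ReuseAddr => 1,
) or die "listen: $@";
my $port = $server->sockport;

# Hypothetical service: echo each line back in upper case until the
# client hangs up.
sub serve_one_client {
  my ($listener) = @_;
  my $conn = $listener->accept or die "accept: $!";
  while (my $line = <$conn>) {
    print {$conn} uc $line;
  }
  close $conn;
}

# For the demo, a forked child plays the client.
my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
  close $server;
  my $client = IO::Socket::INET->new("127.0.0.1:$port")
    or die "connect: $@";
  print {$client} "hello, server\n";
  my $reply = <$client>;
  print "client got: $reply";
  exit 0;
}

serve_one_client($server);
waitpid $pid, 0;
```

A filter or proxy has the same skeleton, except that the loop would pass each request along to another server instead of answering it directly.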

Complex servers

When a server is dealing with multiple clients, or is acting as both a client and a server, our program can run into situations where it must do more than one thing at a time. For example, the program might need to wait for input to arrive on any of four input handles, and compute some calculation in the meanwhile.

Traditionally, this is handled by having the process fork. However, in many cases, this only reduces the problem somewhat, as ultimately some part of the code must face two or more possible events simultaneously, because the various pieces that are forked must communicate.

On most modern operating systems, a process can request to be notified if any one of a set of handles becomes ready, possibly with a timeout to limit the waiting time. The Perl select function can handle this at a low level, but the IO::Select module (included with the Perl distribution) gives us a higher-level, easier-to-use view.
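Here's a small sketch of that idea with IO::Select. Two pipes stand in for two network connections so that the example runs without a network; the handle labels A and B are invented for the demo.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Two pipes stand in for two network connections.
pipe(my $read_a, my $write_a) or die "pipe: $!";
pipe(my $read_b, my $write_b) or die "pipe: $!";
$_->autoflush(1) for $write_a, $write_b;

my %name   = (fileno($read_a) => 'A', fileno($read_b) => 'B');
my $select = IO::Select->new($read_a, $read_b);

# Only B has data waiting, so only B should be reported as ready.
print {$write_b} "ping\n";

my @seen;
for my $fh ($select->can_read(2)) {     # wait at most 2 seconds
  # sysread avoids mixing buffered reads with select
  sysread($fh, my $buf, 1024) or next;
  my $who = $name{ fileno($fh) };
  push @seen, $who;
  print "handle $who sent: $buf";
}
print "nothing ready\n" unless @seen;
```

In a real server, this loop would sit inside an outer loop, and the timeout gives us a regular chance to do other work when no handle is ready.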

Once we bring signals into the mix, an additional level of difficulty can result (because Perl lacked safe signals before version 5.8), best handled by the Event module (in the CPAN), which can safely capture and deliver signal events at reasonable interruption points in the code.

An additional level of complexity is introduced when we add client-related operations. For example, if we had an IRC bot that also performed web operations, there might be times when we're in the middle of retrieving a web page (which takes time), but be called upon to service a timely IRC message or ping request. And if we also add a Curses screen-based interface, we might need to grab a line of text from the screen while doing everything else.

The best of breed for these kinds of tasks is the POE framework, which consists of the POE kernel plus a lot of plugins for various protocols. Using POE requires understanding event-driven programming (``when this happens, do that'') as well as possibly breaking up our computationally complex tasks into a series of events to ensure apparent simultaneity.
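To give a feel for the style, here's a minimal sketch (assuming POE is installed from the CPAN) in which two sessions each count to three. Rather than looping, each session does one step per event and then queues its next step, which gives the other session a turn. The spawn_counter helper and the ``irc'' and ``web'' labels are made up; no real IRC or web traffic is involved.

```perl
use strict;
use warnings;
use POE;

my @trace;   # records the order in which the steps ran

# Each session counts to $limit, one event per step, instead of looping.
sub spawn_counter {
  my ($name, $limit) = @_;
  POE::Session->create(
    inline_states => {
      _start => sub {
        $_[HEAP]{count} = 0;
        $_[KERNEL]->yield('step');
      },
      step => sub {
        my ($kernel, $heap) = @_[KERNEL, HEAP];
        my $n = ++$heap->{count};
        push @trace, "$name$n";
        print "$name: step $n\n";
        # Queue the next slice of work; other sessions get a turn first.
        $kernel->yield('step') if $n < $limit;
      },
    },
  );
}

spawn_counter('irc', 3);   # the names are just labels for the demo
spawn_counter('web', 3);
POE::Kernel->run;          # returns once no session has work left
```

A long computation gets the same treatment: do a slice of the work in one event handler, then yield to the next slice so that network events can be serviced in between.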

Lower-level access

Modern TCP/IP stacks provide both traditional socket interfaces (used by the modules already discussed) and a raw packet interface. In the raw packet interface, we can construct an arbitrary packet to be placed ``on the wire'', or pick an arbitrary packet ``off the wire''.

The most common use of the raw packet interface is to capture the entire packet for diagnostic purposes. For example, tools like tcpdump and Ethereal grab each packet in its raw form to analyze the sender, receiver, and payload in much more detail than is normally provided.

The most flexible package for packet inspection is Net::Pcap, which works with the freely available libpcap to set up inspection access and filters, and pull apart the resulting information. Net::Packet can be used to understand and create raw packets as well.

Well, I hope I've illustrated that Perl is still the reigning champion ``duct tape of the Internet''. Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.