Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 61 (Nov 2005)
[suggested title: ``Common network protocols'']
When Larry Wall added the socket function to Perl version 3 over a dozen years ago, we Perl programmers rejoiced, because we no longer needed to write separate C programs to connect over the (then tiny) network. We could write programs that could send mail, or fetch news articles! Little did anyone foresee the role that TCP/IP would play in the lives of everyone within a few years. Let's take a look at what Perl has learned to do in the past decade.
For example, in [this column in July 2003], I illustrated an easy
way to connect to a socket (in this case, the daytime service
at the NIST) and dump out the entire response:
    use IO::Socket::INET;
    my $client = IO::Socket::INET->new('time.nist.gov:13')
      or die "new: $@";
    print for <$client>;
In this case, there's no interactive protocol. We connect, take everything handed to us, and dump it out. But nearly always, the net protocols require an interaction, such as providing authentication credentials, selecting the various tasks, and providing needed data.
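To make "interaction" concrete, here is a minimal sketch of a client holding a line-by-line conversation with a server. Everything in it is an assumption for illustration: the HELLO/ECHO/QUIT protocol is invented, and the "server" is a forked child listening on a loopback port, so the sketch runs without touching any real service.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Toy server on an ephemeral loopback port, so nothing outside this
# machine is involved. The HELLO/ECHO/QUIT protocol is invented here.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen: $@";
my $port = $server->sockport;

defined(my $pid = fork) or die "fork: $!";
if ($pid == 0) {
    # Child: greet, then answer one line at a time until QUIT.
    my $conn = $server->accept or exit 1;
    $conn->autoflush(1);
    print $conn "HELLO\r\n";
    while (defined(my $line = <$conn>)) {
        $line =~ s/\r?\n\z//;
        if ($line eq 'QUIT') { print $conn "BYE\r\n"; last }
        print $conn "ECHO $line\r\n";
    }
    exit 0;
}
close $server;    # the parent is only a client from here on

my $client = IO::Socket::INET->new("127.0.0.1:$port")
  or die "connect: $@";
$client->autoflush(1);

my @transcript;

# Read exactly one reply line, strip the CRLF, and remember it.
sub read_reply {
    defined(my $reply = <$client>) or die "server hung up";
    $reply =~ s/\r?\n\z//;
    push @transcript, $reply;
    return $reply;
}

read_reply();                          # the greeting comes first
print $client "ping\r\n";  read_reply();
print $client "QUIT\r\n";  read_reply();
close $client;
waitpid $pid, 0;

print "$_\n" for @transcript;
```

Even in this toy, the client has to know who speaks first and when to stop reading, which is exactly the bookkeeping that a real protocol module does for us.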
Higher level protocols for clients
Luckily, Perl has been the ``duct tape of the Internet'' for long enough
that nearly every core net protocol already has a module in the CPAN
providing some level of support. In nearly all cases, these modules
are named with a Net:: prefix, followed immediately by the
traditional name for the protocol (like SMTP or DNS). This means that
we don't have to spend time ``reinventing the wheel'' to talk the
protocol. We can concentrate on our application instead.
The protocols used on the Internet are worked out with careful design and discussion. Too often, I've seen a network programming newbie rush in and try to invent their own protocol. Apparently, they don't realize the pitfalls. A protocol has to clearly specify who is talking, and when, defining all possible state transitions. A protocol also must include specifications for data, such as how to escape that odd data that might be confused for delimiters or special operations. A protocol that uses UDP also has to understand how to deal with MTUs (the maximum size for a UDP packet on a given link, which varies with the network topology) and network delays, drops, and resequencing.
Luckily, the Net:: modules hide all of this from us. We simply use
the provided API, and relax, knowing that the module author has
followed the specification accurately. Hopefully.
For example, to send mail to a server, we can use something like:
    use Net::SMTP;
    my $smtp = Net::SMTP->new('localhost');
    $smtp->mail('merlyn@example.com');
    $smtp->to('postmaster@example.com');
    # split /^/ keeps each line's newline; the list is joined as-is when sent
    $smtp->data([split /^/, <<'END_OF_MESSAGE']);
    To: postmaster@example.com

    This message was sent using Net::SMTP!
    END_OF_MESSAGE
    $smtp->quit;
Here, we've connected to the local SMTP server on our box, and established a sender and a recipient. This is followed by the data payload of the message (from a here document wrapped into an arrayref), and finally a disconnect.
Although the exact text to be sent to the SMTP server is not much more
difficult than the program as you see it here, the method calls are
automatically taking care of things like timeouts and escaping. For
example, a mail-message line consisting of a single dot needs to be
escaped as a doubled-up dot before sending, because a single dot
indicates the end of the message. The data method handles that for
us automatically, so it just does the right thing.
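To see what that escaping amounts to, here is a minimal sketch of the dot-doubling rule written by hand. The dot_stuff and dot_unstuff names are hypothetical helpers for illustration only; they are not part of Net::SMTP, which does this work internally.

```perl
use strict;
use warnings;

# Prepare a message body for the SMTP DATA phase (hypothetical helper).
sub dot_stuff {
    my ($body) = @_;
    $body =~ s/\r?\n/\r\n/g;                     # network line endings
    $body .= "\r\n" unless $body =~ /\r\n\z/;    # final line must be complete
    $body =~ s/^\./../mg;                        # double any leading dot
    return $body . ".\r\n";                      # a lone dot ends the message
}

# Undo the escaping on the receiving side (hypothetical helper).
sub dot_unstuff {
    my ($wire) = @_;
    $wire =~ s/^\.\r\n\z//m or die "missing end-of-message dot\n";
    $wire =~ s/^\.//mg;                          # drop one leading dot per line
    return $wire;
}

my $wire = dot_stuff("one\n.\n..two\n");
(my $visible = $wire) =~ s/\r\n/<CRLF>/g;
print "$visible\n";
```

The middle line of the sample body is a single dot, which would otherwise end the message early. After stuffing it travels as two dots, and the receiver strips one off again.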
Because SMTP is such a common protocol, many layers and extensions
have been added, such as MIME mail for attachments and alternate
representations (plain text vs rich text). The MIME::Lite module
can create such messages, but then uses Net::SMTP (or the
sendmail command) to push the message on its way. Again, don't
invent this stuff yourself: modules are already in place to do the
work.
For fetching messages from a server, the Net::IMAP and Net::POP3
modules provide the low-level functions needed to fetch (and in some
cases, send) email via the corresponding mail repository protocol.
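As a rough sketch of the fetching side, the following lists the Subject header of each message waiting in a POP3 mailbox. The list_subjects name is a hypothetical helper, and the host, user, and password are assumed to come from the command line; nothing is contacted unless all three are given.

```perl
use strict;
use warnings;
use Net::POP3;

# Print the Subject line of every message in the mailbox (hypothetical helper).
sub list_subjects {
    my ($host, $user, $pass) = @_;
    my $pop = Net::POP3->new($host, Timeout => 30)
      or die "cannot connect to $host\n";
    defined $pop->login($user, $pass)
      or die "login failed for $user\n";

    my $sizes = $pop->list || {};             # message number => size in bytes
    for my $msgnum (sort { $a <=> $b } keys %$sizes) {
        my $head = $pop->top($msgnum, 0) or next;   # headers only, no body lines
        print "$msgnum: $_" for grep { /^Subject:/i } @$head;
    }
    $pop->quit;                               # messages are left on the server
}

# Only talk to a server when host, user, and password are all supplied.
list_subjects(@ARGV) if @ARGV == 3;
```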
Another common net protocol is ``the web'', usually in the form of HTTP
or HTTPS requests. The LWP module suite (found in the CPAN)
provides a very versatile ``virtual browser'' (using LWP::UserAgent),
including support for cookies and HTML form parsing. A web page
can be accessed as easily as:
    use LWP::Simple qw(get);
    my $content = get "http://www.stonehenge.com/perltraining/";
    print $content if defined $content;
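When we need more than a bare fetch, the ``virtual browser'' itself is not much harder to drive. The following is a minimal sketch, assuming LWP is installed from the CPAN; the fetch_page name and the User-Agent string are made up for illustration, and the URL is taken from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch one page, dying with the HTTP status line on any failure
# (hypothetical helper).
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        agent      => 'column-sketch/0.1',   # made-up User-Agent string
        timeout    => 10,                    # seconds
        cookie_jar => {},                    # keep cookies in memory
    );
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
      unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a URL is given on the command line.
print fetch_page($ARGV[0]) if @ARGV;
```

Unlike the get function from LWP::Simple, the response object tells us why a request failed, not just that it did.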
Although Usenet isn't what it once was, the NNTP protocol provides a
reliable means to share and transfer messages efficiently. The
Net::NNTP module (and the independently implemented
News::NNTPClient module) provide client-side access to an NNTP
server. For example, to get all of the rec.humor.funny postings,
it's as simple as:
    use Net::NNTP;
    my $nntp = Net::NNTP->new("nntp.example.com");
    my ($num, $low, $high) = $nntp->group("rec.humor.funny");
    for my $n ($low..$high) {
      my $art = $nntp->article($n) or next;
      print for @$art;
    }
Another common protocol that's been around for a generation of
Internet users is the File Transfer Protocol (FTP), which can be
accessed with the Net::FTP module. Entire hierarchies can be
transferred with Net::FTP::Recursive.
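A single-file download with Net::FTP might look like the following sketch. The fetch_file name is a hypothetical helper, the anonymous login and its placeholder password are assumptions, and the host, directory, and filename come from the command line.

```perl
use strict;
use warnings;
use Net::FTP;

# Download one file by anonymous FTP into the current directory
# (hypothetical helper).
sub fetch_file {
    my ($host, $dir, $file) = @_;
    my $ftp = Net::FTP->new($host, Timeout => 30)
      or die "cannot connect to $host: $@\n";
    $ftp->login('anonymous', 'anonymous@example.com')
      or die "login failed: ", $ftp->message;
    $ftp->cwd($dir)
      or die "cannot change to $dir: ", $ftp->message;
    $ftp->binary;                     # avoid line-ending translation
    $ftp->get($file)
      or die "cannot get $file: ", $ftp->message;
    $ftp->quit;
}

# Only talk to a server when host, directory, and file are all supplied.
fetch_file(@ARGV) if @ARGV == 3;
```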
Although Net::Telnet can be used to connect to Telnet servers,
most experts agree that SSH is the right thing to use these days,
supported by Net::SSH (for connections) and Net::SCP and
Net::SFTP (for file transfer).
For DNS queries of all types (and dynamic DNS updates), Net::DNS
provides a nice pure-Perl interface. For example, to get the MX records
for stonehenge.com, we simply execute:
    use Net::DNS;
    for my $mx (mx("stonehenge.com")) {
      printf "%5d %s\n", $mx->preference, $mx->exchange;
    }
which prints (at the moment):
        5 blue.stonehenge.com
      666 spamtrap.stonehenge.com
Ahh yes, my wonderful high-MX spamtrap. But that's for another article.
There are also clients for SOAP (SOAP::Lite), and Jabber, Whois,
Ping, LDAP, BEEP, CUPS, Ident, Gopher, NTP, and SNMP (named with the
Net:: prefix) in the CPAN. In general, if you're looking things up
in an RFC, you're probably wasting time unless you've already verified
that the CPAN doesn't have it yet!
Beyond the core RFC'ed Internet Protocols, CPAN modules also provide connections to X11 servers (for graphical interfaces), AOL Instant Messenger, Yahoo! Messenger, IRC, Rendezvous (mDNS or Bonjour) and Traceroute. There are also interfaces to the Google and Amazon APIs for complex queries.
Servers
Most of the CPAN packages permit a Perl application to be a client to connect to an available server (which typically isn't written in Perl). However, the CPAN also includes server packages for such things as SMTP (Mail), FTP, TFTP, DNS, HTTP, and IDENT. With these packages, we can construct arbitrary servers that can be used with traditional (or Perl-based) clients.
These servers can perform traditional functions, perhaps with unusual characteristics. They might also be filters or proxies, connecting in turn using the same protocol to the next layer of servers. Or, they might even be gateways between protocols, turning a mail message into a news posting, or a DNS lookup into a web request.
Complex servers
When a server is dealing with multiple clients, or is acting as both a client and a server, our program can run into situations where it must do more than one thing at a time. For example, the program might need to wait for input to arrive on any of four input handles, and compute some calculation in the meanwhile.
Traditionally, this is handled by having the process fork. However, in many cases, this only reduces the problem somewhat, as ultimately some part of the code must face two or more possible events simultaneously, because the various pieces that are forked must communicate.
On most modern operating systems, a process can request to be notified
if any one of a set of handles becomes ready, possibly with a timeout
to limit the waiting time. The Perl select function can handle
this at a low level, but the IO::Select module (included with the
Perl distribution) gives us a higher-level, easier-to-use view.
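Here is a minimal sketch of waiting on several handles at once with IO::Select. As an assumption to keep it self-contained, two local pipes stand in for network connections, and only one of them is given any data.

```perl
use strict;
use warnings;
use IO::Select;

# Two pipes stand in for two network connections.
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
my %name = (fileno($r1), 'first', fileno($r2), 'second');

# Only the second "connection" has anything to say.
syswrite $w2, "hello\n";

my $sel = IO::Select->new($r1, $r2);

# Wait up to half a second for any handle to become readable.
my @seen;
for my $fh ($sel->can_read(0.5)) {
    sysread($fh, my $buf, 4096);     # unbuffered read, to match select
    chomp $buf;
    push @seen, $name{ fileno($fh) } . ": $buf";
}

# Nothing is pending now, so this call gives up after the timeout.
my @idle = $sel->can_read(0.1);

print "$_\n" for @seen;
print "idle handles ready: ", scalar(@idle), "\n";
```

The timeout is what lets a program get on with other work (such as that calculation) instead of blocking forever on a quiet handle.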
Once we bring signals into the mix, an additional level of difficulty
can result (because of Perl's lack of safe signals), best handled by
the Event module (in the CPAN), which can safely capture and
deliver signal events at reasonable interruption points in the code.
An additional level of complexity is introduced when we add
client-related operations. For example, if we had an IRC bot that
also performed web operations, there might be times when we're in the
middle of retrieving a web page (which takes time), but be called upon
to service a timely IRC message or ping request. And if we also add a
Curses screen-based interface, we might need to grab a line of text
from the screen while doing everything else.
The best of breed for these kinds of tasks is the POE framework,
which consists of the POE kernel plus a lot of plugins for various
protocols. Using POE requires understanding event-driven programming
(``when this happens, do that'') as well as possibly breaking up our
computationally complex tasks into a series of events to ensure
apparent simultaneity.
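As a small sketch of that last point, assuming POE is installed from the CPAN, the following runs two sessions in one process. The "worker" breaks a sum into one event per number, which gives the "heartbeat" session a turn in between. The session roles and event names (step, beat) are invented for illustration.

```perl
use strict;
use warnings;
use POE;    # exports KERNEL, HEAP, and friends

my @log;

# Worker: add up 1..5, one number per event, instead of in a single loop.
POE::Session->create(
    inline_states => {
        _start => sub {
            $_[HEAP]{todo} = [ 1 .. 5 ];
            $_[HEAP]{sum}  = 0;
            $_[KERNEL]->yield('step');
        },
        step => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            my $n = shift @{ $heap->{todo} };
            return unless defined $n;     # nothing left, so the session ends
            $heap->{sum} += $n;
            push @log, "worker: sum=$heap->{sum}";
            $kernel->yield('step');       # queue the next slice of work
        },
    },
);

# Heartbeat: a second task that wants regular attention.
POE::Session->create(
    inline_states => {
        _start => sub {
            $_[HEAP]{left} = 3;
            $_[KERNEL]->yield('beat');
        },
        beat => sub {
            push @log, "heartbeat";
            $_[KERNEL]->yield('beat') if --$_[HEAP]{left} > 0;
        },
    },
);

POE::Kernel->run;    # returns once no session has anything left to do
print "$_\n" for @log;
```

Had the worker done its sum in one loop inside a single event handler, the heartbeat could not have run until the loop finished.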
Lower-level access
Modern TCP/IP stacks provide both traditional socket interfaces (used by the modules already discussed) and a raw packet interface. In the raw packet interface, we can construct an arbitrary packet to be placed ``on the wire'', or pick an arbitrary packet ``off the wire''.
The most common use of the raw packet interface is to capture the
entire packet for diagnostic purposes. For example, tools like
tcpdump and ethereal grab each packet in its raw form to analyze
the sender, receiver, and payload in much more detail than is normally
provided.
The most flexible package for packet inspection is Net::Pcap, which
works with the freely available libpcap to set up inspection access
and filters, and pull apart the resulting information. Net::Packet
can be used to understand and create raw packets as well.
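A capture loop with Net::Pcap can be sketched roughly as follows. This assumes Net::Pcap and libpcap are installed and that the program has enough privilege to open the interface (usually root). The sniff and show_packet names, the --run switch, and the filter text are all made up for illustration.

```perl
use strict;
use warnings;
use Net::Pcap;

# Callback: report when each packet arrived and how much of it was captured.
sub show_packet {
    my ($user_data, $header, $packet) = @_;
    printf "%d.%06d  captured %d of %d bytes\n",
        @{$header}{qw(tv_sec tv_usec caplen len)};
}

# Capture $count packets matching a tcpdump-style filter (hypothetical helper).
sub sniff {
    my ($count, $filter_text) = @_;
    my $err;
    my $dev = Net::Pcap::lookupdev(\$err)
      or die "no capture device: $err\n";
    my $pcap = Net::Pcap::open_live($dev, 1500, 0, 1000, \$err)
      or die "cannot open $dev: $err\n";

    # Compile and install the filter so that only matching packets arrive.
    my $filter;
    Net::Pcap::compile($pcap, \$filter, $filter_text, 1, 0) == 0
      or die "bad filter: ", Net::Pcap::geterr($pcap), "\n";
    Net::Pcap::setfilter($pcap, $filter);

    Net::Pcap::loop($pcap, $count, \&show_packet, '');
    Net::Pcap::close($pcap);
}

# Capturing needs privileges, so do nothing unless explicitly asked.
sniff(10, 'tcp port 80') if @ARGV && $ARGV[0] eq '--run';
```

The raw bytes arrive in the callback's third argument, ready to be pulled apart by hand or handed to a packet-decoding module.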
Well, I hope I've illustrated that Perl is still the reigning champion ``duct tape of the Internet''. Until next time, enjoy!