Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Unix Review Column 47 (Jul 2003)

[suggested title: ``The Simplicity of Sockets'']

The Internet connects people to other people and to services. These connections are made with applications talking to each other on separate machines, over various media such as Ethernet, dialup, and 802.11b wireless. These applications, in turn, are using a simple but powerful tool called a ``socket'' to provide the datastream between them. Let's look at sockets and how to describe and use them.

A socket is a connection from a process to another process. The two processes are typically on different machines, but they can also be on the same machine.

The most common use of a socket is to connect a client with a server, similar to placing a phone call to some phone service. The client creates a socket (picks up the phone), then connects it to the server (dials the phone, waits for an answer).

Once the sockets are connected, communication is generally bidirectional, with both sides speaking some agreed-upon protocol. You must have a protocol to know who is talking at any given time, and what it means. For example, the HyperText Transfer Protocol (HTTP) defines how to get a web page, including defining precisely what each side's role is in the conversation. Simple Mail Transfer Protocol (SMTP) defines the transmission of email, and so on. When the conversation is finished, both sides hang up.

To Perl, a socket is presented as a filehandle, and is read and written using ordinary filehandle operations.

Just as a phone might be identified by its phone number, a socket is identified using its internet (IP) address and port number.

The address is generally unique to a particular machine (although a machine might have multiple addresses). In the current IPv4 scheme, an address has a dotted quad value, like 10.1.2.3. Each number within the address represents a byte value, and must therefore range between 0 and 255. Because addresses can be difficult to remember, or might change from time to time, we generally also assign names as aliases, such as www.stonehenge.com. (One of the functions of the DNS system is to map between names and addresses, but let's not get distracted.)
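To make the dotted-quad rule concrete, here's a small sketch (the subroutine names are my own invention) that checks whether a string is four byte values separated by dots. It also uses inet_aton and inet_ntoa from the standard Socket module to turn a name or an address into dotted-quad form through the system's resolver:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# True (1) if $addr is four dot-separated numbers, each in 0..255.
sub is_dotted_quad {
  my ($addr) = @_;
  my @bytes = $addr =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
    or return 0;
  return (grep { $_ > 255 } @bytes) ? 0 : 1;
}

# Map a host name (or a dotted quad) to a dotted-quad string using
# the system resolver; returns undef if the lookup fails.
sub name_to_address {
  my ($name) = @_;
  my $packed = inet_aton($name) or return;   # packed 4-byte form
  return inet_ntoa($packed);                 # back to "a.b.c.d"
}

for my $addr (qw(10.1.2.3 10.1.2.300 10.1.2)) {
  printf "%-12s %s\n", $addr,
    is_dotted_quad($addr) ? 'valid' : 'not a dotted quad';
}
```

On most systems, name_to_address('localhost') should come back as 127.0.0.1, while a name the resolver doesn't know yields undef.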

Associated with a particular address is a range of ports, numbered from 1 to 65535, although usually only a small number of these ports are in use at any one time. Ports below 1024 are reserved for ``system processes'' (on Unix, only a privileged process may bind to them), and ports from 1024 up are available for ``user processes''.
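Here's a tiny classifier for those ranges, assuming the conventional Unix rule that only a privileged (root) process may bind a port below 1024. The name port_class is just for illustration:

```perl
use strict;
use warnings;

# Classify a port number, assuming the usual Unix convention that
# ports 1..1023 need privilege to bind and 1024..65535 do not.
sub port_class {
  my ($port) = @_;
  return 'invalid'    unless defined $port && $port =~ /^\d+$/;
  return 'invalid'    if $port < 1 || $port > 65535;
  return 'privileged' if $port < 1024;
  return 'user';
}

print "$_: ", port_class($_), "\n" for 25, 1023, 1024, 65535, 70000;
```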

Many port numbers have predefined, agreed-upon meanings, such as port 25 for email, port 119 for news, port 80 for web, and so on. These predefined port numbers are important for well-known services because knowing the port number is essential to establishing the connection. (A server can also listen on a transient port number, but then that number has to be communicated to its clients by other means.)

All connections have both an address and a port number at both ends. (You're always calling from a phone to a phone.) Generally, you let the operating system pick a temporary port number for the client end of the connection.

The easiest way to create a simple client socket is with the IO::Socket::INET module, part of the IO distribution that comes with Perl (and is also found in the CPAN). A sample socket creation looks like:

  use IO::Socket::INET;
  my $connection = IO::Socket::INET->new(@parameters)
    or die "Cannot connect: $@";

The @parameters will be described in a moment. Note that the error comes back in $@, not $!. (Some have argued that this is a misdesign, either on Perl's part or on the author's part, but we're stuck with the inconsistency either way.)
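Besides the single ``host:port'' string, IO::Socket::INET's new also accepts named parameters, which I find easier to read once there are more than one or two. Here's a sketch wrapping that form in a helper. The name open_client and its default 10-second timeout are my own choices; PeerAddr, PeerPort, Proto, and Timeout are documented options of the module:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Open a TCP client connection, spelling out the parameters by name.
# Returns the connected handle, or undef with the reason left in $@.
sub open_client {
  my ($host, $port, $timeout) = @_;
  return IO::Socket::INET->new(
    PeerAddr => $host,                    # name or dotted quad
    PeerPort => $port,                    # number, or a service name
    Proto    => 'tcp',
    Timeout  => defined $timeout ? $timeout : 10,   # seconds
  );
}
```

A typical call would be my $connection = open_client('localhost', 'daytime') or die "Cannot connect: $@"; with the same $@ convention as before.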

The resulting value in $connection acts like a filehandle: we can print to it to send data over the wire:

  print $connection "Some value\n";

Or read from it to accept the next data to be received:

  my $result = <$connection>;

By default, if data is not yet ready, the read will block the process, just like reading from an I/O device (like a terminal) or a pipe. For a typical connection, the data won't necessarily arrive in the same chunks as it was sent: you could print that first string as 11 separate characters, and yet the receiver will likely get just one full line instead.
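You can watch this reassembly happen without leaving your own machine. This sketch sets up a throwaway listener on the loopback address (LocalPort => 0 lets the operating system pick any free port) purely so there's something to connect to, writes the 11 characters one print at a time, and then reads a single line on the other end:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A throwaway listener on loopback, just so we have a peer.
my $listener = IO::Socket::INET->new(
  LocalAddr => '127.0.0.1',
  LocalPort => 0,            # let the OS choose a free port
  Listen    => 1,
) or die "listen: $@";

# Connect to it; the kernel completes the handshake before accept.
my $client = IO::Socket::INET->new(
  PeerAddr => '127.0.0.1',
  PeerPort => $listener->sockport,
) or die "connect: $@";
my $server_side = $listener->accept or die "accept: $!";

# Send "Some value\n" as 11 separate one-character prints.
# IO::Socket handles have autoflush on, so each print goes out at once.
print $client $_ for split //, "Some value\n";

# The line-reading operator keeps going until it sees "\n".
my $line = <$server_side>;
print "received: $line";
```

Everything is written before anything is read, which is why a single process suffices for this demonstration.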

Let's look at a simple connection to a known server. Many systems provide a ``daytime'' service. This service provides the time of day in a format similar to the output of the date command, and was originally used to help synchronize system clocks on various machines on a network. (These days, nearly all clock synchronization is performed with NTP, which permits much more detailed timing information to be accumulated and processed.)

To connect to the daytime service, we need two things: the IP address on which the service resides, and the port of the service. The port number is easy: by agreement, it's port 13 (although we don't need to know that, because Perl can figure it out). The IP address is a bit trickier, because many modern Unix systems have turned off this service, as suggested by the security principle of ``don't run anything unnecessary''.
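The ``Perl can figure it out'' part comes from the system's services database (usually /etc/services), which maps names like daytime to port numbers; IO::Socket::INET consults it when you give a service name instead of a number. The built-in getservbyname lets you ask directly. The service_port wrapper here is just for illustration:

```perl
use strict;
use warnings;

# Look up a service's TCP port in the system's services database.
# In scalar context getservbyname returns just the port number,
# or undef if the service isn't listed.
sub service_port {
  my ($name) = @_;
  my $port = getservbyname($name, 'tcp');
  return $port;
}

for my $name (qw(daytime smtp nntp http)) {
  my $port = service_port($name);
  print "$name => ", (defined $port ? $port : 'not listed'), "\n";
}
```

On a typical system this reports 13, 25, 119, and 80, though the exact answers depend on what your services file contains.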

When we connect to the daytime service, there's nothing to send. The server just notices the connection, and immediately sends the time string to our client, and drops the connection. The code to connect to our machine's daytime service looks like this:

    use strict;
    use warnings;
    use IO::Socket::INET;

    # connect to the daytime service (port 13) on this machine
    my $client = IO::Socket::INET->new('localhost:daytime')
      or die "new: $@";
    {
      local $/;          # slurp mode: read until the server hangs up
      $_ = <$client>;
    }
    if (defined $_) {
      print
        "response from ",
        $client->peerhost,
        " port ",
        $client->peerport,
        " was:\n$_";
    }

If you are experimenting with this code, and can't make a connection to localhost, then try the www.time.gov machine's port 13 instead. Don't abuse this address though: lots of people share this resource. The output of connecting to www.time.gov looks something like this:

    response from 132.163.4.203 port 13 was:

    52765 03-05-06 19:48:17 50 0 0 666.7 UTC(NIST) *

The format of the response from a given daytime port may vary from host to host. The value here is described at <http://www.boulder.nist.gov/timefreq/service/its.htm>.
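If you want to pick that NIST line apart, splitting on whitespace is enough. This is a sketch; the field names are my own labels for the layout described on that page (Modified Julian Date, date, time, daylight-saving code, leap-second flag, health, network-delay advance in milliseconds, the UTC(NIST) label, and an on-time marker), and other daytime servers will use other formats:

```perl
use strict;
use warnings;

# Split one NIST-style daytime line into named fields.
# Returns undef unless the line has nine fields starting with a number.
sub parse_nist_daytime {
  my ($line) = @_;
  my @f = split ' ', $line;   # ' ' also skips leading blank lines
  return unless @f == 9 && $f[0] =~ /^\d+$/;
  my %t;
  @t{qw(mjd date time dst leap health advance_ms label otm)} = @f;
  return \%t;
}

my $t = parse_nist_daytime(
  "\n52765 03-05-06 19:48:17 50 0 0 666.7 UTC(NIST) *\n");
print "$t->{date} $t->{time} $t->{label}\n";
```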

Notice the use of the peerhost and peerport method calls against the connection handle. The connection object's method calls are described in perldoc IO::Socket::INET. Here, we're asking for the IP address and port number of the other end of the connection. We can get our own address similarly by calling $client->sockhost and $client->sockport.
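Here's a small helper (the name describe_connection is mine) that reports both ends of any connected IO::Socket::INET handle. Note that peerhost and sockhost return printable dotted-quad strings, whereas peeraddr and sockaddr return the packed 4-byte form, which you'd have to run through inet_ntoa before printing:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Summarize both ends of a connected socket as
# "ourhost:ourport -> peerhost:peerport".
sub describe_connection {
  my ($sock) = @_;
  return sprintf "%s:%d -> %s:%d",
    $sock->sockhost, $sock->sockport,    # our end
    $sock->peerhost, $sock->peerport;    # the other end
}
```

Calling it on $client in the daytime program would show the temporary local port chosen by the operating system on the left, and port 13 on the right.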

Yes, that was a pretty simple connection. Most of the hard work of setting up the socket, specifying the protocol and port information, and making the connection, was all done within the IO::Socket::INET object methods. Also, for daytime, we didn't have to say anything, just listen.

Let's look at a slightly more difficult application: using HTTP to talk to a web server. At its simplest, we fetch a web page by connecting to the web server's port 80, sending it the right string, and then waiting for a response. To get the top-level page of a server, we have to send:

    GET / HTTP/1.0

followed by two return/linefeed pairs. Let's get the top-level page of the www.time.gov web server:

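    use strict;
    use IO::Socket::INET qw(CRLF);

    my $client = IO::Socket::INET->new('www.time.gov:http')
      or die "new: $@";
    # request line, then an empty line to end the request
    print $client "GET / HTTP/1.0", CRLF, CRLF;
    # copy everything the server sends until it hangs up
    print while <$client>;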

This program starts the same way, although notice the extra import for the CRLF constant, which is just a return/linefeed pair. (This is not necessarily the same as "\r\n" on some platforms, hence the constant.) Once the socket connection is open, we push the text of the HTTP request down the socket, and then start listening for the response. As the lines are returned, they're printed to STDOUT. When the remote server closes the connection, our side sees end-of-file, so the filehandle read returns undef, which drops us out of the loop (and out of the program).

In a real application, I wouldn't write code like this, because we're leaving a lot out, like the Host header on the request, parsing of the headers in the response, and dealing with web proxy servers if needed. But on top of code like this, others have come along to implement the full protocol. In particular, the HTTP protocol is well managed in the LWP library, so we merely have to write:

    use LWP::Simple;
    getprint("http://www.time.gov");

And that's a lot easier to remember. Hopefully though, you can see that there's not really a lot of mysterious stuff going on. We just have to identify the host and port, connect to it, send data, and receive data.
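For the curious, here's roughly what two of the pieces I skipped look like without LWP: adding a Host header to the request, and splitting a response into its status line, headers, and body. Both subroutine names are my own, the header parsing is deliberately naive (a repeated header simply overwrites the earlier one), and I've imported CRLF straight from the Socket module, which is where IO::Socket::INET gets it:

```perl
use strict;
use warnings;
use Socket qw(CRLF);

# Build a minimal HTTP/1.0 GET request that includes a Host header.
sub build_request {
  my ($host, $path) = @_;
  return join CRLF, "GET $path HTTP/1.0", "Host: $host", "", "";
}

# Read a response from any filehandle: the status line, then header
# lines up to the first blank line, then everything else as the body.
sub read_response {
  my ($fh) = @_;
  my $status = <$fh>;
  return unless defined $status;
  $status =~ s/\r?\n\z//;
  my ($code) = $status =~ m{^HTTP/\d+\.\d+\s+(\d{3})};
  my %headers;
  while (defined(my $line = <$fh>)) {
    $line =~ s/\r?\n\z//;
    last if $line eq '';                  # end of the header section
    my ($name, $value) = split /:\s*/, $line, 2;
    $headers{lc $name} = $value if defined $value;
  }
  my $body = do { local $/; <$fh> };      # slurp the rest
  return {
    status  => $status,
    code    => $code,
    headers => \%headers,
    body    => defined $body ? $body : '',
  };
}
```

With a connected $client, you'd send print $client build_request('www.time.gov', '/'); and then my $response = read_response($client); gives you $response->{code} and $response->{headers}{'content-type'}. Because read_response only needs a filehandle, you can also try it against a string opened as an in-memory file.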

Although I've said a lot this month, there's a lot I had to leave out for space. For example, with code very similar to this, you can set up a server that can listen for other client connections. You can also talk over UDP (datagram) protocol, rather than TCP (connection) protocol. In fact, Perl's networking ability gets very close to the flexibility provided by systems implementation languages, such as C. But that's a story for another time. Until then, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
