Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 47 (Jul 2003)
[suggested title: ``The Simplicity of Sockets'']
The Internet connects people to other people and to services. These connections are made with applications talking to each other on separate machines, over various media such as Ethernet, dialup, and 802.11b wireless. These applications, in turn, are using a simple but powerful tool called a ``socket'' to provide the datastream between them. Let's look at sockets and how to describe and use them.
A socket is a connection from a process to another process. The two processes are typically on different machines, but they can also be on the same machine.
The most common use of a socket is to connect a client with a server, similar to placing a phone call to some phone service. The client creates a socket (picks up the phone), then connects it to the server (dials the phone, waits for an answer).
Once the sockets are connected, communication is generally bidirectional, with both sides speaking some agreed-upon protocol. You must have a protocol to know who is talking at any given time, and what it means. For example, the HyperText Transfer Protocol (HTTP) defines how to get a web page, including defining precisely what each side's role is in the conversation. Simple Mail Transfer Protocol (SMTP) defines the transmission of email, and so on. When the connection is finished, both sides hang up.
To Perl, a socket is presented as a filehandle, and is read and written using ordinary filehandle operations.
Just as a phone might be identified by its phone number, a socket is identified using its internet (IP) address and port number.
The address is generally unique to a particular machine (although a
machine might have multiple addresses). In the current IPv4 scheme,
an address has a dotted quad value, like 10.1.2.3. Each number
within the address represents a byte value, and must therefore range
between 0 and 255. Because addresses can be difficult to remember, or
might change from time to time, we generally also assign names as
aliases, such as www.stonehenge.com. (One of the functions of the
DNS system is to map between names and addresses, but let's not get
distracted.)
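To make the ``each number is a byte'' point concrete, here's a minimal sketch using the inet_aton and inet_ntoa functions from the standard Socket module. The lookup of localhost at the end is an addition for illustration only, and assumes the machine can resolve that name.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Pack the dotted quad from the text into its 4-byte network form.
# A literal dotted quad needs no DNS lookup.
my $packed = inet_aton('10.1.2.3');

# Each of the four numbers occupies exactly one byte (0..255).
my @bytes = unpack 'C4', $packed;
print "bytes: @bytes\n";

# And back again to the human-readable form.
my $quad = inet_ntoa($packed);
print "dotted quad: $quad\n";

# A name goes through the same call, but needs a working resolver,
# so the result may be undef on a machine that cannot look it up.
my $by_name = inet_aton('localhost');
print "localhost is ",
    (defined $by_name ? inet_ntoa($by_name) : 'unresolved'), "\n";
```

The packed form is what the lower-level socket calls actually trade in; the dotted quad and the name are conveniences for people.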
Associated with a particular address is a range of ports, numbered from 1 to 65535, although usually only a small number of these ports are ever in use at one time. Ports below 1024 are reserved for ``system processes'' (on Unix systems, typically only root may bind to them). Ports from 1024 upward are available for ``user processes''.
Many port numbers have predefined agreed-upon meanings, such as port 25 for email, port 119 for news, port 80 for web, and so on. These pre-defined port numbers are important for well-known services because knowing the port number is essential to establishing the connection. (A server port can be established using a transient port number, but then the port number has to be communicated by other means.)
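Perl can ask the operating system for these agreed-upon numbers with the built-in getservbyname. Here's a small sketch; the port_for helper name and the fallback table are inventions for this example, with the fallback covering machines whose services database (often /etc/services) is missing or incomplete.

```perl
use strict;
use warnings;

# Port numbers quoted in the text, used only as a fallback when the
# local services database has no entry for the name.
my %fallback = (smtp => 25, nntp => 119, http => 80);

# port_for is a hypothetical helper name, not part of any module.
sub port_for {
    my ($service) = @_;
    # In scalar context, getservbyname returns the port number.
    my $port = getservbyname($service, 'tcp');
    return defined $port ? $port : $fallback{$service};
}

for my $service (qw(smtp nntp http)) {
    printf "%-5s => port %d\n", $service, port_for($service);
}
```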
All connections have both an address and a port number at both ends. (You're always calling from a phone to a phone.) Generally, you let the operating system pick a temporary port number for the client end of the connection.
The easiest way to create a simple client socket is with the
IO::Socket::INET module, part of the IO distribution that is bundled
with Perl itself (and is also available from the CPAN). A sample
socket creation looks like:
use IO::Socket::INET;
my $connection = IO::Socket::INET->new(@parameters)
  or die "Cannot connect: $@";
The @parameters will be described in a moment. Note that the error
comes back in $@, not $!. (Some have argued that this is a
misdesign, either on Perl's part or on the author's part, but we're
stuck with the inconsistency either way.)
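As one possible shape for @parameters, here's a sketch using the named options documented in perldoc IO::Socket::INET. The host, the service, and the five-second timeout are illustrative choices, not values the column prescribes, and the connection is expected to fail politely on a machine that doesn't offer the service.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Named-parameter form of a client connection. All values here are
# assumptions made for the sake of the example.
my @parameters = (
    PeerAddr => 'localhost',     # name or dotted quad of the far end
    PeerPort => 'daytime(13)',   # service name, with a numeric fallback
    Proto    => 'tcp',           # a connection-oriented stream
    Timeout  => 5,               # seconds to wait for the connect
);

my $connection = IO::Socket::INET->new(@parameters);
if ($connection) {
    print "connected to ", $connection->peerhost, "\n";
    close $connection;
}
else {
    # The reason for the failure is in $@, not $!.
    print "Cannot connect: $@\n";
}
```

Reporting the failure instead of calling die keeps the sketch usable on machines where nothing is listening.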
The resulting value in $connection acts like a filehandle: we can
print to it to send data over the wire:
print $connection "Some value\n";
Or read from it to accept the next data to be received:
my $result = <$connection>;
By default, if data is not yet ready, the read will block the process, just like reading from an I/O device (like a terminal) or a pipe. For a typical connection, the data won't necessarily arrive in the same chunks as it was sent: you could print that first string as 11 separate characters, and yet the receiver will likely get just one full line instead.
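Here's a small sketch of that ``11 separate characters'' experiment. To keep it self-contained, it uses the built-in socketpair to make two connected stream sockets inside one process, which is an assumption standing in for a real network connection; the line-oriented read behaves the same way in both cases.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# A connected pair of stream sockets within one process stands in for
# the two ends of a network connection, so no server is needed.
socketpair(my $sender, my $receiver, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Send the string as 11 separate one-character writes.
my $message = "Some value\n";
for my $char (split //, $message) {
    defined syswrite($sender, $char) or die "syswrite: $!";
}
close $sender;

# The reader asks for a line and gets the whole line, regardless of
# how the writer chopped it up.
my $line = <$receiver>;
print "received ", length($line), " characters: $line";
```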
Let's look at a simple connection to a known server. Many systems
provide a ``daytime'' service. This service provides the time of day in
a format similar to the output of the date command, and was
originally used to help synchronize system clocks on various machines
on a network. (These days, nearly all clock synchronization is
performed with NTP, which permits much more detailed timing
information to be accumulated and processed.)
To connect to the daytime service, we need two things: the IP address on which the service resides, and the port of the service. The port number is easy: by agreement, it's port 13 (although we don't need to know that because Perl can figure it out). The IP address is a bit trickier, because many modern Unix systems have turned off this service, as suggested by the security theory of ``don't run anything unnecessary''.
When we connect to the daytime service, there's nothing to send. The server just notices the connection, and immediately sends the time string to our client, and drops the connection. The code to connect to our machine's daytime service looks like this:
use strict;
use IO::Socket::INET;

my $client = IO::Socket::INET->new('localhost:daytime')
  or die "new: $@";
{ local $/; $_ = <$client>; }
if (defined $_) {
  print "response from ", $client->peerhost,
    " port ", $client->peerport, " was:\n$_";
}
If you are experimenting with this code, and can't make a connection
to localhost, then try the www.time.gov machine's port 13
instead. Don't abuse this address though: lots of people share this
resource. The output of connecting to www.time.gov looks something
like this:
response from 132.163.4.203 port 13 was:
52765 03-05-06 19:48:17 50 0 0 666.7 UTC(NIST) *
The format of the response from a given daytime port may vary from host to host. The value here is described at <http://www.boulder.nist.gov/timefreq/service/its.htm>.
Notice the use of the peerhost and peerport method calls against
the connection handle. The connection object's method calls are
described in perldoc IO::Socket::INET. Here, we're asking for the
IP address and port number of the other end of the connection. We can
get our own address similarly by calling $client->sockhost and
$client->sockport.
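To see all four values at once without depending on an outside service, here's a sketch that makes a throwaway listener on the loopback address and connects to it from the same process. The Listen, LocalAddr, and LocalPort options are documented in perldoc IO::Socket::INET but are not otherwise used in this column, so treat the listener half as scaffolding for the demonstration.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A throwaway listener on the loopback address. LocalPort 0 asks the
# operating system to choose a free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen: $@";

# The client names only the far end; its own port is assigned for it.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "connect: $@";

my $accepted = $server->accept or die "accept: $!";

printf "our end:   %s:%d\n", $client->sockhost, $client->sockport;
printf "other end: %s:%d\n", $client->peerhost, $client->peerport;

# Seen from the accepted socket, the roles are mirrored.
printf "listener sees us at %s:%d\n",
    $accepted->peerhost, $accepted->peerport;
```

The port on ``our end'' will differ from run to run, because the operating system picks it.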
Yes, that was a pretty simple connection. Most of the hard work of
setting up the socket, specifying the protocol and port information,
and making the connection, was all done within the IO::Socket::INET
object methods. Also, for daytime, we didn't have to say anything,
just listen.
Let's look at a slightly more difficult application: using HTTP to talk to a web server. At its simplest, we fetch a web page by connecting to the web server's port 80, sending it the right string, and then waiting for a response. To get the top-level page of a server, we have to send:
GET / HTTP/1.0
followed by two return/linefeed pairs. Let's get the top-level page
of the www.time.gov web server:
use strict;
use IO::Socket::INET qw(CRLF);

my $client = IO::Socket::INET->new('www.time.gov:http')
  or die "new: $@";
print $client "GET / HTTP/1.0", CRLF, CRLF;
print while <$client>;
This program starts the same way, although notice the extra import for
the CRLF constant, which is just a return/linefeed pair. (This is
not necessarily the same as "\r\n" on some platforms, hence the
constant.) Once the socket connection is open, we push the text of
the HTTP request down the socket, and then start listening for the
response. As the lines are returned, they're printed to STDOUT.
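If you're curious what actually goes down the wire, here's a sketch that builds the same request as a single string and shows its last four bytes in octal. It imports CRLF directly from the standard Socket module, which is where that constant is defined.

```perl
use strict;
use warnings;
use Socket qw(CRLF);

# The request exactly as it goes down the socket.
my $request = "GET / HTTP/1.0" . CRLF . CRLF;

# Show the line terminators as byte values rather than escapes.
my $tail = join ' ',
    map { sprintf('%03o', ord $_) } split //, substr($request, -4);

print "request is ", length($request), " bytes, ending in octal $tail\n";
```

The fixed byte values are the point of the constant: the request looks the same no matter which platform the program runs on.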
When the remote server closes the connection, that appears as an
end-of-file on the socket, causing the filehandle read to return
undef, which drops us out of the loop (and out of the program).
In a real application, I wouldn't write code like this, because we're
leaving a lot out, like the Host header on the request, parsing of
the headers in the response, and dealing with web proxy servers if
needed. But on top of code like this, others have come along to implement
the full protocol. In particular, the HTTP protocol is well managed
in the LWP library, so we merely have to write:
use LWP::Simple;
getprint("http://www.time.gov");
And that's a lot easier to remember. Hopefully though, you can see that there's not really a lot of mysterious stuff going on. We just have to identify the host and port, connect to it, send data, and receive data.
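To give a feel for what the hand-rolled version leaves out, here's a rough sketch of two of the missing pieces: a request that carries a Host header, and a naive split of a response into status, headers, and body. The build_request and parse_response names are made up for this example, and the response text is canned rather than captured from a real server. It ignores plenty (repeated headers, continuation lines, proxies), which is exactly why LWP is the better choice for real work.

```perl
use strict;
use warnings;
use Socket qw(CRLF);

# build_request and parse_response are hypothetical helper names.
sub build_request {
    my ($host, $path) = @_;
    # Request line, one header, then the blank line that ends the request.
    return join(CRLF, "GET $path HTTP/1.0", "Host: $host", '', '');
}

sub parse_response {
    my ($raw) = @_;
    # Headers end at the first blank line; everything after is the body.
    my ($head, $body) = split /\015?\012\015?\012/, $raw, 2;
    my ($status_line, @header_lines) = split /\015?\012/, $head;
    my ($version, $code, $reason) = split ' ', $status_line, 3;
    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;
    }
    return {
        code    => $code,
        reason  => $reason,
        headers => \%headers,
        body    => $body,
    };
}

# A canned response, invented for this sketch.
my $raw = join(CRLF,
    'HTTP/1.0 200 OK',
    'Content-Type: text/html',
    'Content-Length: 13',
    '',
    "<p>hello</p>\n",
);

my $response = parse_response($raw);
print "status $response->{code} ($response->{reason}), ",
    "type $response->{headers}{'content-type'}\n";
print $response->{body};
```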
Although I've said a lot this month, there's a lot I had to leave out for space. For example, with code very similar to this, you can set up a server that can listen for other client connections. You can also talk over UDP (datagram) protocol, rather than TCP (connection) protocol. In fact, Perl's networking ability gets very close to the flexibility provided by systems implementation languages, such as C. But that's a story for another time. Until then, enjoy!