Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Linux Magazine Column 07 (Dec 1999)

[suggested title: Watching your web server]

My web server for http://perltraining.stonehenge.com is on a nicely configured shared Linux box at a 24x7-manned co-location facility. While I'm not really in a system administrator role for this box, I still want to be sure that my web things aren't bogging the system down unnecessarily (or the other ecommerce users will start rallying to kick me off). This is especially true as I experiment more with dynamically generated pages and toys for columns like this one.

So, the other day, I found myself invoking the Linux standard top program, taking stabs at configuring it to watch my webserver. But since I'm not the only httpd-something on the box, I kept seeing other webservers there, and it messed up my view. Also, I couldn't tell if the child CGI scripts were expensive or cheap, since they would show up as some other process in the display.

I thought to myself that it'd be nice to have a Perl program that does just what I want top to really do: get the information about the processes that make up my Apache server, and show how CPU-bound and page-fault-bound they are, including any child CGI processes that got launched. To do this, I'd need:

  1. A way to draw things on the screen repeatedly with minimal refresh. No problem: Perl has the Curses.pm module to do this!

  2. Some access to the web server so that I can get the process IDs associated with this web server. Again, Apache has a mod_status module that can tell me this information.

  3. A way of getting top or ps information about CPU time and page faults. Upon a little investigation, I found the /proc file system which provided everything I needed, and no extraordinary privileges required for my program!

So, I decided to use Perl as (once again) the ``duct tape of the Internet'', gluing together three separate things: my xterm window, the web server information, and the system resource values. And that resulted in my little program that I hastily threw together in [Listing one, below]. Now, mind you, this is merely a proof-of-concept, thrown together in a couple of hours to solve a specific task. But hopefully, you can see how easy it is to use Perl to glue things together into entirely new system administration tools.

The program goes as follows: lines 1 through 3 provide the standard header for most of my programs, where I enable warnings, turn on compiler restrictions, and unbuffer standard output.

Line 5 pulls in the LWP::Simple module, part of the LWP library found in the CPAN (located at www.cpan.org, amongst other places). We'll be using the get routine from this module to fetch a given URL for its contents.

Line 6 pulls in the Curses module, also found in the CPAN. This module provides us with Perl access to the Curses library, letting me draw screens with minimal refresh on updates. As the Curses module is a little more persnickety about installation, be sure you issue the make test before the make install to make sure that everything is OK.

Lines 8 and 9 provide the only configuration constants for this program. Line 8 needs to map to the mod_status trigger URL for my web server (which I've deliberately mangled here, since I don't want you-all to be pinging my server). This URL needs to be specifically enabled, as discussed in the URL http://www.apache.org/docs/mod/mod_status.html. This also means that your server has to have mod_status compiled in. Line 9 is a simple count of seconds between refresh and update of the information.

Lines 11 to 17 give names to the /proc/nnn/stat whitespace-separated fields. Most of the names and definitions came from the proc(5) manpage, but I had to get a few by scruffing through the kernel source at /usr/src/linux/fs/proc/array.c, which seems to be more up-to-date than the manpage (fancy that). For the descriptions of these fields, see the corresponding manpage or source.

Lines 19 to 23 define how to show these fields. $SHOWLABELF gives a printf-acceptable field width definition. $SHOWLABEL generates the column headers on the first line of the screen, and @SHOWFIELDS defines the fields from which the data comes. Note that most of these names match entries in @FIELDS earlier, but a few are uppercase. Those fields are computed fields, some from the webserver status (like STATUS), and some from other fields (like CPU and PCPU).

Lines 25 to 30 fetch the boot time of this machine. We need this value because the start time of a process is defined in terms of ``jiffies'' (0.01 seconds each) after the boot time of the machine. To do this, we'll scruff through the contents of the virtual file called /proc/stat looking for the btime nnnnn entry.
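As a rough illustration of that lookup, here is a minimal sketch that pulls the btime value out of a chunk of text. The sample text and the boot_time_from helper are made up for this example; the listing itself reads the real /proc/stat directly.

```perl
use strict;
use warnings;

# Hypothetical excerpt in the style of /proc/stat; a real file has more lines.
my $sample = <<'END';
cpu  4705 150 1120 16250
ctxt 1990473
btime 943300000
processes 2915
END

# Return the boot time (seconds since the epoch) found in the text,
# or undef when there is no btime line.
sub boot_time_from {
  my $text = shift;
  return $text =~ /^btime\s+(\d+)/m ? $1 : undef;
}

my $boot = boot_time_from($sample);
print "booted at epoch second $boot\n";
```

Anchoring the pattern with ^ and the m modifier is a small extra precaution over the listing's version, so that some other line ending in ``btime'' could not match by accident.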

Line 32 declares the %cpu_history variable, keeping track of the prior total CPU usage for each process so that we can determine how much new CPU has been used, and therefore what percentage of total CPU a process has used.

Lines 34 to 90 form the main display loop of the program. The initscr call in line 34 comes from the Curses library, and sets up all the screen-related parameters, as well as erasing the screen. Lines 35 to 89 create an infinite loop: note the redo in line 88.

Lines 36 to 39 fetch the web server status. I'm passing in a ?notable query parameter attached to the server status URL. This causes mod_status to spit the data out in a slightly more parseable format, without a lot of table tags. Line 38 deletes the output up to the individual server details, while line 39 tosses everything after the hr tag. Note that I needed an s and m and i suffix on that substitution, and I realized that I could throw in a harmless o option, so that I could spell the word osmosis. Silly.
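To see what those two substitutions leave behind, here is a small sketch run against a made-up page. The sample text is only an assumption about the general shape of the ?notable output, not something captured from a real server, and server_lines is a hypothetical helper name. The second substitution uses plain ms modifiers instead of the osmosis spelling.

```perl
use strict;
use warnings;

# Made-up stand-in for a "?notable" status page.
my $page = join "\n",
  '<html><head><title>Apache Status</title></head><body>',
  '<h1>Apache Server Status</h1>',
  '<h2>Server Details</h2>',
  '<b>Server 0-0</b> (1234): 0|12|12 [Ready] u0.1 s0.0',
  '<b>Server 1-0</b> (1240): 0|3|3 [<b>DNS Lookup</b>] u0.0 s0.0',
  '<hr>',
  '<pre>legend goes here</pre></body></html>',
  '';

# Keep only the per-server lines between the heading and the rule.
sub server_lines {
  my $text = shift;
  $text =~ s/[\d\D]+Server Details.*\n//;   # drop everything through the heading line
  $text =~ s/^<hr>.*//ms;                   # drop the rule and all that follows it
  return $text;
}

print server_lines($page);
```

With this sample, only the two lines beginning with ``<b>Server'' should survive.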

Lines 41 and 42 declare two hashes that get cleared out on each iteration. The %info hash holds the information about each process, keyed by PID number. The %cpu hash is also keyed by PID number, and holds the total CPU usage, as well as the timestamp at which the usage was taken (for percentage calculations).

Lines 44 and 45 figure out how long it's been since the previous round of this loop. Of course, it'll be close to $SLEEPTIME, but rather than count on that, we can figure the numbers out accurately. Line 44 saves the current Unix time into $cpu{TIME}, while line 45 computes the difference between this value and the historical value saved on the previous pass. If this number is 0, no percentages are displayed (as noted later).

Line 47 takes the contents returned from the web server and extracts all the interesting PIDs and their respective web-server status, as a hash in %http_status. The ugly regular expression is needed because the server returns:

        Ready

for a ready process, but a bolded:

        <b>DNS Lookup</b>

for anything that's not just ``ready''. The bold tags would get in the way, so I sniff around inside of them if they are present.
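Here is a minimal sketch of that extraction, using the same pattern as line 47 of the listing against two made-up lines. The sample lines are an assumption about the general shape of the output, chosen only to exercise both the plain and the bolded case.

```perl
use strict;
use warnings;

# Made-up sample lines: one plain status, one wrapped in bold tags.
my $lines = <<'END';
<b>Server 0-0</b> (1234): 0|12|12 [Ready] u0.1 s0.0
<b>Server 1-0</b> (1240): 0|3|3 [<b>DNS Lookup</b>] u0.0 s0.0
END

# Each match returns a (PID, status) pair, so a global match in list
# context fills the hash directly.
my %http_status =
  $lines =~ /Server \d+-.*?\((\d+)\).*\[(?:<.*?>)?(.*?)(?:<.*?>)?\]/g;

for my $pid (sort { $a <=> $b } keys %http_status) {
  print "$pid => $http_status{$pid}\n";
}
```

Because the .* before the bracket is greedy, the pattern picks the last bracketed item on each line, which is fine as long as a line carries only one.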

Lines 49 to 69 fetch information about each process from the /proc filesystem. The PIDs of interest are the keys of %http_status, and we transform each of them into the right filename in line 49.

The process 123 has status information at pseudo-file /proc/123/stat. Lines 50 to 53 fetch the contents of that file into $_. Note that if we can't get to a file, we simply skip it, presuming that the process may have gone away between the time we looked at the webserver and the time we are looking at the /proc entry. No big deal to have lost this.

Lines 55 and 56 establish a %fields variable to hold all of the whitespace delimited values for a given process. The @rest variable is an artifact of debugging; if there's anything in there, it means my @FIELDS list is wrong, as it originally was when I was looking only at the manpage and not the kernel source.
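The hash-slice assignment can be tried in isolation. This sketch uses a shortened, hypothetical name list (@NAMES) and a made-up line with one value too many, so that the leftover lands in @rest.

```perl
use strict;
use warnings;

# Shortened, hypothetical field list; the listing's @FIELDS is much longer.
my @NAMES = qw(pid comm state ppid);

# Made-up stat-style line with one more value than there are names.
my $line = "1234 (httpd) S 1 999";

my %fields;
# The slice takes exactly as many values as there are names;
# anything beyond that falls through into @rest.
(@fields{@NAMES}, my @rest) = split ' ', $line;

print "pid=$fields{pid} state=$fields{state}\n";
print "unnamed leftovers: @rest\n" if @rest;
```

A non-empty @rest is the hint that the name list is shorter than what the kernel is actually emitting.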

Lines 57 and 58 extract the process ID number (PID), and store the data into the %info hash keyed by that PID. Line 59 creates the first non-/proc field for this PID as the web-server status.

Lines 60 through 68 compute the CPU usage (both absolute and percentage) for this process. First, lines 60 and 61 add together the user and system jiffies for both the process and its waited-for children into $cpu. Next, this value is stored into both the %cpu hash and the %info hash keyed by the PID, in line 62. Finally, lines 63 through 68 compute a percentage (with 1 digit after the decimal point) if possible, and set up the PCPU pseudo-field to hold that information.
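The percentage arithmetic can be sketched as a standalone function. The pcpu name and its argument order are invented for this example, and it assumes 100 jiffies per second as described above: a process that burns 50 jiffies in 10 seconds used 0.5 CPU seconds, or 5 percent, which is the same number as 50 divided by 10.

```perl
use strict;
use warnings;

# Assumes 100 jiffies per second, so delta jiffies divided by elapsed
# seconds is already a percentage.
sub pcpu {
  my ($now_jiffies, $prev_jiffies, $seconds) = @_;
  return "??????" unless $seconds and defined $prev_jiffies;
  return sprintf "%5.1f%%", ($now_jiffies - $prev_jiffies) / $seconds;
}

print pcpu(1250, 1200, 10), "\n";    # 50 jiffies over 10 seconds
print pcpu(1250, undef, 10), "\n";   # no baseline from a previous pass yet
```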

Once we've processed all the PIDs on this round, line 70 copies the CPU information into %cpu_history to serve as a baseline for the next iteration. If you had any other variables that needed a relative increment rather than an absolute value, you could save them likewise here.

Finally, lines 72 to 86 generate the screen, now that we have all the data. Line 72 is a Curses call to ``erase'' the screen. This doesn't really send out any characters, but it marks the in-memory version of the screen as all blank. There won't be any output until line 86, where Curses will compare the resulting in-memory version with the screen view of what was sent last time, and update just those portions that have changed.

Line 73 puts the label at the top of the display, using another Curses routine. As this is a constant string, I could have avoided adding it each time by erasing on the portion of the screen below the first line, but I didn't care to go through all that work. The 0,0 value here is the upper-left corner of the screen, indexed by rows and then columns.

Lines 74 to 84 dump out each of the rows of information, keeping track of each line's row via the $row variable. The sort expression in lines 76 to 78 select PIDs in their order by start time, with the PID itself the tie breaker if two start times were identical.
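The two-level sort is easy to check on its own. In this sketch the %info records are made up, with two processes sharing a start time so that the tie breaker comes into play.

```perl
use strict;
use warnings;

# Made-up process records keyed by PID; two share the same start time.
my %info = (
  1240 => { pid => 1240, starttime => 5000 },
  1234 => { pid => 1234, starttime => 5000 },
  1300 => { pid => 1300, starttime => 4200 },
);

# <=> returns 0 for equal start times, so "or" falls through to
# the PID comparison only when there is a tie.
my @ordered = sort {
  $info{$a}{starttime} <=> $info{$b}{starttime}
    or $info{$a}{pid} <=> $info{$b}{pid}
} keys %info;

print "@ordered\n";
```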

Lines 79 to 82 dump out the information for a particular process. The first element of @SHOWFIELDS is sent through time_convert (defined later), while the remaining elements are passed as-is to the sprintf operator.

Line 85 moves the virtual cursor to the upper left corner. Line 86 sends the changed characters out to the real terminal, including moving the real cursor to the upper left corner as well. Line 87 sleeps for $SLEEPTIME seconds (10 here), before starting the whole process all over again.

The endwin in line 90 is never reached. However, you could invoke Curses routines to see if a key was pressed during a sleep period, and use that to exit the program or change some configuration parameters, just like top. For me, I was done when I was able to hit CTRL-C at the right time to get me out, which this program does.

Lines 92 to 101 convert a starttime value into a human-readable string. For starters, the start time is measured as jiffies past boot time, so we figure out a Unix timestamp value in $when in line 94. Then, if the timestamp is more than 12 hours ago, we'll use the date, otherwise the hour, minute, and second. Both are derived by looking at the scalar return value of localtime, which generates a nice readable string.
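For experimenting with the formatting away from the rest of the program, here is a sketch of the same idea with the boot time and the current time passed in as arguments. The start_label name and its parameters are invented for this example; the listing's time_convert uses the global $BOOTTIME and calls time itself.

```perl
use strict;
use warnings;

# Turn a start time in jiffies-since-boot into a short label:
# a date if older than 12 hours, otherwise a time of day.
sub start_label {
  my ($jiffies, $boot, $now) = @_;
  my $when = $boot + $jiffies / 100;      # assumes 100 jiffies per second
  my $string = localtime $when;           # e.g. "Thu Dec  2 14:05:09 1999"
  return $when < $now - 12 * 60 * 60
    ? substr($string, 4, 7) . substr($string, -4, 4)   # month, day, year
    : substr($string, 11, 8);                          # hh:mm:ss
}

my $now = time;
print start_label(0, $now - 60, $now), "\n";          # started a minute ago
print start_label(0, $now - 3 * 86400, $now), "\n";   # started three days ago
```

The substr offsets rely on the fixed-width layout of the string that localtime returns in scalar context.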

And there you have it. You can see this ``proof of concept'' program in action if you build and install LWP and Curses from the CPAN, and then put a URL that triggers your webserver's mod_status handler into the $STATUS variable. Of course, the fun really begins when you take the techniques illustrated here and apply them to other things to be watched, such as database servers or mailers.

I hope this has been useful for you. Enjoy the holidays, and I'll see you next year, presuming the computing world hasn't gone back to the stone age in a big Y2K meltdown! Enjoy.

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use LWP::Simple;
        =6=     use Curses;
        =7=     
        =8=     my $STATUS = "http://localhost/server-status";
        =9=     my $SLEEPTIME = 10;
        =10=    
        =11=    my @FIELDS = qw(
        =12=      pid comm state ppid pgrp session tty tpgid flags
        =13=      minflt cminflt majflt cmajflt utime stime cutime cstime
        =14=      counter priority timeout itrealvalue starttime vsize rss rlim
        =15=      startcode endcode startstack kstkesp kstkeip
        =16=      signal blocked sigignore sigcatch wchan nswap cnswap
        =17=    );
        =18=    
        =19=    my $SHOWLABELF = "%11s %5s %12s %1s %6s %6s %6s %6s %6s %6s";
        =20=    my $SHOWLABEL = sprintf $SHOWLABELF,
        =21=      qw(START_TIME PID STATUS S MINFLT MAJFLT CPU PCPU RSS NSWAP);
        =22=    my @SHOWFIELDS =
        =23=      qw(starttime pid STATUS state minflt majflt CPU PCPU rss nswap);
        =24=    
        =25=    my $BOOTTIME = do {
        =26=      local *FOO;
        =27=      open FOO, "/proc/stat" or die "/proc/stat: $!";
        =28=      local $/;
        =29=      (<FOO> =~ /btime (\d+)/)[0];
        =30=    };
        =31=    
        =32=    my %cpu_history;
        =33=    
        =34=    initscr;
        =35=    {
        =36=      $_ = get "$STATUS?notable" or die "no status!";
        =37=    
        =38=      s/[\d\D]+Server Details.*\n//;
        =39=      s/^<hr>.*//osmosis;           # :-)
        =40=    
        =41=      my %info;
        =42=      my %cpu;
        =43=    
        =44=      $cpu{TIME} = time;
        =45=      my $seconds = exists $cpu_history{TIME} ? $cpu{TIME} - $cpu_history{TIME} : 0;
        =46=    
        =47=      my %http_status = /Server \d+-.*?\((\d+)\).*\[(?:<.*?>)?(.*?)(?:<.*?>)?\]/g;
        =48=    
        =49=      for my $file (map "/proc/$_/stat", keys %http_status) {
        =50=        local *FILE;
        =51=        open FILE, $file or next;
        =52=        $_ = <FILE>;
        =53=        close FILE;
        =54=    
        =55=        my %fields;
        =56=        (@fields{@FIELDS}, my @rest) = split;
        =57=        my $pid = $fields{pid};
        =58=        $info{$pid} = \%fields;
        =59=        $info{$pid}{STATUS} = $http_status{$pid};
        =60=        my $cpu = $fields{utime} + $fields{stime}
        =61=          + $fields{cutime} + $fields{cstime};
        =62=        $cpu{$pid} = $info{$pid}{CPU} = $cpu;
        =63=        if ($seconds and exists $cpu_history{$pid}) {
        =64=          ## delta jiffies over seconds is already percentage!
        =65=          $info{$pid}{PCPU} = sprintf "%5.1f%%", ($cpu-$cpu_history{$pid}) / $seconds;
        =66=        } else {
        =67=          $info{$pid}{PCPU} = "??????";
        =68=        }
        =69=      }
        =70=      %cpu_history = %cpu;
        =71=    
        =72=      erase;
        =73=      addstr(0,0,$SHOWLABEL);
        =74=      my $row = 1;
        =75=      for my $pid (sort {
        =76=        $info{$a}->{starttime} <=> $info{$b}->{starttime}
        =77=          or $info{$a}->{pid} <=> $info{$b}->{pid}
        =78=        } keys %info) {
        =79=        addstr($row,0,
        =80=               sprintf($SHOWLABELF,
        =81=                       time_convert($info{$pid}{starttime}),
        =82=                       @{$info{$pid}}{@SHOWFIELDS[1..$#SHOWFIELDS]}));
        =83=        $row++;
        =84=      }
        =85=      move(0,0);
        =86=      refresh;
        =87=      sleep $SLEEPTIME;
        =88=      redo;
        =89=    }
        =90=    endwin;
        =91=    
        =92=    sub time_convert {
        =93=      my $jiffies = shift;
        =94=      my $when = $BOOTTIME + $jiffies/100;
        =95=      my $string = localtime $when;
        =96=      if ($when < time - 12*60*60) {
        =97=        substr($string, 4, 7) . substr($string, -4, 4);
        =98=      } else {
        =99=        substr($string, 11, 8);
        =100=     }
        =101=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.