Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 54 (Oct 2000)

[suggested title: There can be only one ... more way to do it!]

I find it nice that, with my familiarity with Perl, I can solve those little ``emergency'' tasks without having to spend time flipping through a bunch of manuals. For example, I had a problem the other day that was causing horrible response time with my web server for www.stonehenge.com, and yet within a few minutes and a couple dozen lines of Perl code, I was able to get things back in order.

My webserver is on a nicely-configured Linux box co-located at an ISP with 24 by 7 reboot service (although the box is rarely rebooted, and you'll see why). The box is actually shared with a dozen other e-commerce sites, and this is by design, because then when the box is down, it's not just me yelling at the admin, but a dozen others that are calling as well. Thus, we all have to play nice, because we're sharing the CPU and sharing the resources.

Well, one of the customers of this ISP is the regional sales office of a Very Large Company that has one of the largest market capitalizations in the world right now. (Why they don't run these applications on their corporate web site, I'm not sure, but I never asked. The usual answer is ``politics and local control'', so there's no point.) They apparently have some sort of free email newsletter that has subscribers counted in the mid-5-digits or so. A recent email newsletter (that went out over the weekend) basically said in effect ``we are terminating all mail subscriptions before the next issue unless you visit URL such-and-so and enter your renewal information''. In other words, if the subscriber didn't respond, they'd be dropped from the list.

Well, you can imagine the panic that this would generate on a Monday morning as thousands of people returned to work to discover that they might be removed from the mailing list. The given URL mapped to a CGI script, which was being invoked dozens of times simultaneously, so there were dozens of web-server processes (actually, both web-server and Perl process pairs). To make things worse, the first invocation of the Perl program gathered information about the subscriber, and then made a trip through DBI to a MySQL database, to present a confirmation form and opportunity to correct the subscription information. This form was then processed by a second invocation of the same CGI script, again reconnecting with MySQL, to update the information and finish the process.

I immediately began chatting with Doug, the author of the script and the manager of the box, to try to determine why the load average on a box that is typically under 0.5 had now gone to something like 15 or 20, making my site nearly useless. After determining as many of the facts as our IRC session would let us share, I quickly suggested that Doug move the script into an Apache::Registry area of his mod_perl-enabled server. At least this would prevent multiple compilations and forks, and probably could reuse the DBI handle as well. He was pretty adamant about not doing that, because he had written the code long before he knew about mod_perl, and thus had likely done things that were not very clean from Apache::Registry's perspective. Additionally, he felt investing the time to make it Apache::Registry-compliant would be wasted, since this script would eventually be moved to the client's machine, which did not support mod_perl.

So, still watching the extremely high load average, I then suggested to him that he invoke ``the Highlander solution''. In the movie ``Highlander'', the catch-phrase is ``There can be only one!'', referring to the continual showdown amongst these immortal beings that would eventually kill all of each other off, leaving just one victor who would inherit ``the Prize''. Similarly, whenever exclusive access is needed to a resource, the word ``highlander'' is bandied about to mean an implementation solution or structure to control that resource.

Here, I was asking him to ensure through some locking mechanism that only one CGI invocation was being processed at a time. I had in mind a simple flock at the beginning of the script, opening up a sentinel file (no, not another television reference) and then requesting an exclusive file lock on that handle. The first script in would create the file, grab the exclusive lock, and then proceed on its merry way, releasing the lock at the end of the program when the handle was automatically (or explicitly) closed.

If a second script should be started while the first is active, it would open the same file, and then attempt to lock the filehandle. At this point, the operating system would block the second process, leaving it sitting around in a suspended state until the first process had completed. Third and subsequent invocations would likewise be blocked, but the operating system releases only one process at a time for the exclusive lock.
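The blocking approach described above can be sketched in a few lines. This is an illustration only, not the code that was deployed: the helper name wait_for_highlander and the sentinel path are made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(LOCK_EX);
use File::Spec;

# Hypothetical helper: open (creating if needed) a sentinel file and block
# until the operating system grants an exclusive lock on it.
sub wait_for_highlander {
    my ($path) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    flock $fh, LOCK_EX or die "Cannot lock $path: $!";
    return $fh;    # the lock is held until this handle is closed
}

# Illustrative path only, made unique per process for this demo.
my $sentinel = File::Spec->catfile(File::Spec->tmpdir, "example.$$.highlander");

my $lock = wait_for_highlander($sentinel);
# ... the expensive part of the CGI script would run here ...
close $lock;         # releases the lock (this also happens at program exit)
unlink $sentinel;    # cleanup for this demo; a shared sentinel would stay put
```

With a plain LOCK_EX, a second process calling the same helper simply sits inside flock until the first process closes its handle.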

So, as I'm trying to describe this over IRC, it becomes clear that Doug is not up to the task, so I spend a few minutes whipping up the solution. It looked like 5 lines of Perl, until I thought about what to do when the system was very busy, such as right while I was trying to get this fixed.

Let's say there were 15 script invocations. The 15th invocation would be sitting in a queue behind 1 active and 13 pending other processes. If the delay were substantial, the web server would abort the CGI processing, causing some 5xx-series error indicating a server malfunction, with no clue about why this is happening. I didn't consider that very friendly, so I kept moving forward with the next idea.

I changed strategy to perform a ``non-blocking'' exclusive file lock, in a retrying loop. A normal lock is ``blocking'', in that the operating system does not return from the operation until the type of lock requested is available. However, sometimes blocking isn't wanted, such as when having an alternate resource is satisfactory. Or in this case, when we want to simply see if we can get an exclusive lock, and if not, try to get it later.

So the loop I constructed tries to get an exclusive lock 10 times, sleeping one second after each failed attempt. Each try is a ``this moment only'' deal. If the lock is available, we nab it and move on, knowing that we're now king of the hill. If not, we wait a while, or give up. Again, recall I was typing this in a hurry, trying to get something working. And this hastily written code is presented in [listing one, below].

Recall that this is just a snippet added to a larger script, so the normal #! line won't appear. Line 1 brings in the CGI.pm module, without any of the HTML-generating shortcuts. I left my normal :all parameter off the import list because I didn't want my change to collide with any of Doug's existing code. In hindsight, I could have switched to a temporary package like so:

  {
    package My::Highlander;
    use CGI qw/:all/;
    # rest of this snippet goes here
  }

And that way I could have avoided the use of the CGI::... construct later. Yeah, there's more than one way to do it alright.

Line 2 brings in two constants needed for the flock operator later. The Fcntl module (which I have not yet found an easy pronunciation for) defines many constants relating to file operations, and this is certainly appropriate here. I've been thoroughly chastised on public discussion areas for my use of literal numbers like 2 and 4 on flock in the past, so I want to make amends by doing it right.

Line 4 opens the sentinel file on the HIGHLANDER handle in append mode. The mode is mostly unimportant, except that we want to make sure the file is created if it doesn't exist. The filename needs to be in an area that is writable by the webserver userid, and /tmp is a safe bet. The CGI program was named renew.cgi, hence the name of the file relates to the name of the script. Death here will trigger a 500 error, but like I said, I was typing fast and furious to get this to work so I could get back to work.

Lines 8 through 21 form a loop, to be executed 10 times. Repetitions are controlled by the variable $count defined and initialized to 0 in line 7. Because $count is defined in a block started in line 6 (and ending in line 22), it cannot conflict with any other use of $count earlier or later in the program.

Line 9 attempts to obtain an exclusive lock on the file opened on the HIGHLANDER filehandle. The or'ing of the two values LOCK_EX and LOCK_NB (to get the number 6, but I'm cheating to know that) requests an exclusive lock, but in non-blocking mode. If the flock is successful, we get a true return value, and the last operator takes us out of the block started in line 8. If the flock fails, we drop through to line 10, which pauses the process for one second.

Line 11 increments $count, and ensures that it is still below 10. If so, the redo operator pops back up to line 8, retrying the flock. If not, we've tried 10 times to flock (and, thanks to the sleep sitting ahead of the test, waited one extra second after the final failure), and it's time to report the error.
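The same retry logic can also be written as a counted loop, where the number of attempts is visible in the range and nothing has to be inferred from where the increment sits. This is a sketch of an alternative, not the listing's code; try_highlander, its parameters, and the demo path are hypothetical names.

```perl
use strict;
use warnings;
use Fcntl qw(LOCK_EX LOCK_NB);
use File::Spec;

# Hypothetical helper: try a non-blocking exclusive lock up to $attempts
# times, sleeping $pause seconds between tries (but not after the last one).
# Returns the number of the attempt that succeeded, or 0 if none did.
sub try_highlander {
    my ($fh, $attempts, $pause) = @_;
    for my $try (1 .. $attempts) {
        return $try if flock $fh, LOCK_EX | LOCK_NB;
        sleep $pause if $try < $attempts;
    }
    return 0;
}

# Illustrative path only, made unique per process for this demo.
my $path = File::Spec->catfile(File::Spec->tmpdir, "example.$$.retry");
open my $fh, '>>', $path or die "Cannot open $path: $!";

my $won = try_highlander($fh, 10, 1);
print $won ? "locked on attempt $won\n" : "gave up\n";

close $fh;
unlink $path;    # cleanup for this demo only
```

A false return is where the caller would log the abort and send the ``service unavailable'' response.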

Line 13 grabs the REMOTE_HOST environment variable, which will help us determine who indeed we are not serving this time. Since they have reverse DNS turned on under this server, we should be getting a nice domain name here of the host attempting to access this CGI (or at least the intermediate proxy).

However, under some circumstances, the reverse DNS fails or is not available. I couldn't remember if REMOTE_HOST contains a numeric dotted quad at that point (like 10.1.2.3) or whether it was undefined. So to be defensive in my programming (remember, I'm under the gun here), I simply used REMOTE_ADDR in line 14 if REMOTE_HOST was undefined. Probably in five more minutes of poking around, I could have determined that line 14 is probably unnecessary. But hey, it worked, and again, that was the important part.
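The fallback can be pulled into a small function that takes the environment as a hash reference, which makes it easy to try out away from a web server. This is a sketch under assumptions: client_name is a made-up name, and treating an empty string as missing and returning 'unknown' as a last resort are additions that the listing does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the reverse-DNS name, fall back to the
# numeric address, and finally to a placeholder string.
sub client_name {
    my ($env) = @_;
    for my $key (qw(REMOTE_HOST REMOTE_ADDR)) {
        return $env->{$key}
            if defined $env->{$key} and length $env->{$key};
    }
    return 'unknown';
}

# In a CGI script the real environment would be passed in:
print client_name(\%ENV), "\n";
```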

Line 15 dumps an error message to the web server error log, presenting the program name (in $0), the current time of day (from localtime), and an indication that we failed due to a ``highlander abort''. I wanted the string to be distinct enough that we could easily detect how successful this highlander code was in deterring overloadings.
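Because the message carries a distinctive phrase, counting the aborts in a log takes only a few lines. The sketch below is an assumption-laden illustration: count_aborts is a made-up name, and the sample lines are invented data that merely follow the shape of the warn message in the listing, not real log output.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines on a filehandle that contain the
# distinctive phrase written by the highlander snippet.
sub count_aborts {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /highlander abort/;
    }
    return $n;
}

# Invented sample data, for demonstration only.
my $sample = join '',
    "/path/to/renew.cgi @ Mon Oct  2 09:15:01 2000: highlander abort for host1.example after 10 seconds\n",
    "some unrelated error line\n",
    "/path/to/renew.cgi @ Mon Oct  2 09:15:07 2000: highlander abort for 10.1.2.3 after 10 seconds\n";

# Read the sample through an in-memory filehandle; a real run would open
# the web server's error log instead.
open my $log, '<', \$sample or die "Cannot open sample: $!";
print "aborts: ", count_aborts($log), "\n";
close $log;
```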

Lines 16 through 19 dump back the response for an abort. We print a CGI header with a status of 503, appropriately earmarked as ``service unavailable''. According to the specification, we can additionally send a ``retry after'' header along with this status response, from which compliant clients can determine a later time (measured in seconds) after which the service is likely to be restored. Honestly, I don't know what the browsers on the market do with 503 errors, but I'm at least following the standard.

Note that line 18 sends out a text/plain MIME type. Again, being lazy, I didn't want to write a full HTML page, so I took the quickest way out, letting me just type a line of text in line 19 without adding a lot of angley-brackety thingies.
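For a script that does not load CGI.pm, roughly the same response can be assembled by hand. This is an approximation of what a CGI script hands back to the web server, not the exact bytes CGI.pm emits; overloaded_response is a made-up name, and the reason phrase on the Status line is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response for the overload case.
# Header lines are separated by CRLF, and an empty line ends the headers.
sub overloaded_response {
    my ($retry_seconds) = @_;
    return join "\r\n",
        "Status: 503 Service Unavailable",
        "Retry-After: $retry_seconds",
        "Content-Type: text/plain",
        "",
        "Our server is overloaded.  Please try again in a few minutes.\n";
}

print overloaded_response(30);
```

The body is the same single line of plain text used in the listing, so no HTML is needed here either.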

Line 20 aborts the program, but with a nice exit status. Since we've ``handled'' the error, we don't want the web server to also go through its error trigger steps by exiting with a non-zero exit status.

And there it is. Whipped out in about 15 minutes, and installed immediately by Doug. But did it help?

It sure did. The load average shot down from the mid-20's to just around 2 or so, very tolerable. We both watched the error log, with tail -f to see how many people were getting turned away in relation to the customers being served, and found that 70% of them were getting through just fine, and because they weren't all trying to compete in parallel, they were actually getting done with minimal fuss. Perl saved the day!

So, the next time you have an expensive script burning up too much CPU, maybe you too need to utter in your best Sean Connery accent: ``There can be only one!'' Until next time, enjoy!

Listings

        =1=     use CGI;
        =2=     use Fcntl qw(LOCK_EX LOCK_NB);
        =3=     
        =4=     open HIGHLANDER, ">>/tmp/renew.cgi.highlander" or die "Cannot open highlander: $!";
        =5=     
        =6=     {
        =7=       my $count = 0;
        =8=       {
        =9=         flock HIGHLANDER, LOCK_EX | LOCK_NB and last;
        =10=        sleep 1;
        =11=        redo if ++$count < 10;
        =12=        ## couldn't get it after 10 seconds...
        =13=        my $host = $ENV{REMOTE_HOST};
        =14=        $host = $ENV{REMOTE_ADDR} unless defined $host;
        =15=        warn "$0 @ ".(localtime).": highlander abort for $host after 10 seconds\n";
        =16=        print CGI::header(-status => 503,
        =17=                          -retry_after => 30,
        =18=                          -type => 'text/plain'),
        =19=                            "Our server is overloaded.  Please try again in a few minutes.\n";
        =20=        exit 0;
        =21=      }
        =22=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.