Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Web Techniques Column 61 (May 2001)

[suggested title: Basic Cookie Management]

Ahh, cookies. One of my pet peeves is the amount of bad cookie code I see out there, including the reaction that a website gives me when I choose not to permit cookies (usually because I'm feeling rebellious).

Cookies are one of many ways to turn the stateless HTTP into a stateful session-based series of transactions. (Some of the others include using some sort of authentication, or mangling the URLs, or including hidden data in forms.) But cookies draw my ire because many web programmers presume that ``one user is one browser'', because that's the basic model of the cookie itself as well. That's demonstrably untrue. I myself have three different browsers open at the moment, and I have been known to go into an ``internet cafe'' from time to time to use the browsers supplied there. While I personally move from one browser to another, my cookies don't follow me!

The wrong way to use cookies, therefore, is to have a login form, and on successful login, send out a cookie that lasts until the year 2003 to that browser. That's bad. I can't log in on another browser, and if I forget to log out of a browser at an ``internet cafe'', the next user who stumbles across the same website is (gasp!) already logged in as me!

Another wrong way to use cookies is to send out a bunch of data in a cookie, like the entire contents of the shopping cart. I say wrong because most people who do this seem to trust the data as it's being returned on the next hit, and nothing stops me from changing the price of that $300 item I just bought to $1 instead, if it's all coming from the cookie.
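
To make the point concrete, here is a minimal sketch of the safer direction: whatever the browser sends back, the price comes from the server. The %PRICE_OF table, the cart_total function, and the shape of the cart records are all hypothetical names made up for this illustration, not part of the listing below.

```perl
use strict;
use warnings;

# Hypothetical server-side catalog: prices in cents, never taken from the client.
my %PRICE_OF = (widget => 30000, gadget => 1500);

# Total a cart of { item => ..., qty => ..., price => ... } records,
# ignoring any client-supplied price and rejecting unknown items.
sub cart_total {
  my @lines = @_;
  my $total = 0;
  for my $line (@lines) {
    my $item = $line->{item};
    die "unknown item\n" unless defined $item and exists $PRICE_OF{$item};
    my $qty = $line->{qty};
    die "bad quantity\n" unless defined $qty and $qty =~ /\A[1-9][0-9]{0,3}\z/;
    $total += $PRICE_OF{$item} * $qty;   # server's price, not the cookie's
  }
  return $total;
}

# The browser claims the widget costs 100 cents; the claim is simply ignored.
print cart_total({item => 'widget', qty => 1, price => 100}), "\n";
```

Better still is to keep the cart itself on the server, so the browser has nothing to edit in the first place.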

Still another wrong way to use cookies is to send dozens of cookies, like one for each graphic. Goodness knows, I've been to some sites and had to accept a baker's dozen of cookies before I even see the entire page.

And yet another wrong way is to let the cookie's expiration time serve as the security policy for timing out an active user. A browser does not have to respect any expiration times. Do not count on that.
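
The timeout belongs on the server, where the browser can't ignore it. A minimal sketch, assuming a one-hour limit and a plain in-memory hash as a stand-in for whatever server-side store you really use (the names %last_seen, note_login, and still_active are invented for this example):

```perl
use strict;
use warnings;

my $TIMEOUT = 3600;   # assumed inactivity limit, in seconds

# Maps a browser ID to the epoch time of its most recent hit.
# (A hypothetical in-memory stand-in for a real server-side store.)
my %last_seen;

# Record a successful login at time $now.
sub note_login {
  my ($browser, $now) = @_;
  $last_seen{$browser} = $now;
}

# Return 1 and refresh the timestamp if the browser was active recently;
# otherwise forget the browser and return 0.
sub still_active {
  my ($browser, $now) = @_;
  my $seen = $last_seen{$browser};
  if (defined $seen and $now - $seen <= $TIMEOUT) {
    $last_seen{$browser} = $now;
    return 1;
  }
  delete $last_seen{$browser};   # stale or unknown
  return 0;
}
```

The decision never consults anything the browser says about expiration; only the server's own clock and records count.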

And even worse, sometimes I've seen servers go into infinite loops checking for cookies to be set, and redirecting if the cookie is not set, never telling the user why things are awry.

Can you tell I've seen a lot of bad cookie code? Do you now understand why the hairs generally stand on the back of my neck when someone mentions ``I need cookies for this application''? Well, then read on.

There is a reasonably safe way to use cookies. Use cookies only to brand a particular browser, and only for the duration of a browser session. The cookie should be a single small cookie with a short but unguessable value (such as the MD5 hash of some cryptographically strong material). Then, this particular ``branded'' browser will be sending back this cookie only while it is currently open.

Next, take the brand-mark, and use it to key into a database to look up a particular user for that branded browser. The database should have a timestamp of recent activity, and an entry should be distrusted once it is older than the timeout period.

Finally, use the verified user value to key into another database for session information, like a shopping cart or personal preferences. Don't use the browser-brand value for anything other than a one-step mapping to a user, because otherwise the user cannot migrate her session over to a new browser without restarting some of the transaction, and that's annoying. (In fact, you should probably permit the same user to log in on multiple browsers simultaneously.)
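
The two-step mapping can be sketched with a pair of hashes. This is only an illustration under assumed names (%user_for_browser, %session_for_user, log_in, session_for), with in-memory hashes standing in for real databases:

```perl
use strict;
use warnings;

my %user_for_browser;   # step one: browser brand => username
my %session_for_user;   # step two: username => session data

# Associate a branded browser with a user, creating the user's
# session data only if it does not exist yet.
sub log_in {
  my ($browser, $user) = @_;
  $user_for_browser{$browser} = $user;
  $session_for_user{$user} ||= { cart => [] };
  return;
}

# Follow brand => user => session; undef if the browser has no user.
sub session_for {
  my ($browser) = @_;
  my $user = $user_for_browser{$browser};
  return undef unless defined $user;
  return $session_for_user{$user};
}

log_in('brand-A', 'fred');
push @{ session_for('brand-A')->{cart} }, 'widget';
log_in('brand-B', 'fred');    # same user, second browser
print "@{ session_for('brand-B')->{cart} }\n";
```

Because the cart hangs off the user rather than the brand, the second browser sees the same cart as the first.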

Sounds hard? Naah. It's just a few dozen lines of Perl code. How do I know? I hacked it out just recently. And I present this sample reference implementation of this strategy in [Listing one, below]. Please keep in mind that this is not a complete application: just the part that handles the ``what user is logged in to this browser?'' question.

Lines 1 through 3 start nearly every program I write, turning on taint mode (good for CGI programs), warnings (good for catching stupid mistakes), compiler restrictions (good for catching more stupid mistakes) and disabling buffering on STDOUT (good for CGI programs).

Line 5 pulls in the venerable CGI.pm module, including all the function shortcuts.

Lines 7 to 29 handle the ``branding'' of a particular browser with a unique cookie. Keep in mind that this has to be done before we've sent anything to standard output, because we may need to issue a new Set-Cookie header, or perhaps a redirect to ourselves as a cookie test.

Line 8 fetches the browser cookie, if any. If present, $browser is now a unique string (actually, an MD5 signature of some unique data). However, if it's absent, we've got some work to do to make this browser our own.

Lines 9 and 10 recognize the common case, after this program has been invoked once: namely, that we've got a good browser ID. The _cookiecheck parameter is described later, but we must make sure it's out of the mix for later code.

If we had no cookie in line 9, then we have two possibilities: either the cookie had never been sent, or the browser refused to send it back. In either case, we first prepare a potential new cookie using lines 12 through 15. The MD5 module (found in the CPAN) allows us to create a 32-character hex string from a given arbitrary data item. In this case, we're using the time of day, a random number, the process ID, and the stringified hashref of a newly created throwaway hash, simply as icky glue.

This is not as secure as using cryptographically strong items: there are modules in the CPAN to make it harder to guess. However, this code was lifted directly from Apache::Session, a well-known chunk of code to handle session management, so I feel confident knowing I can at least blame someone else.
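
For comparison, here is one way a harder-to-guess value might be produced without any CPAN module. It is a sketch under a stated assumption: the system provides /dev/urandom (true of most Unix-like systems, but not something the listing relies on). The function name new_browser_id is made up for this example.

```perl
use strict;
use warnings;

# Return a 32-character hex string built from 16 bytes of /dev/urandom.
# Assumes a Unix-like system where that device exists.
sub new_browser_id {
  open my $fh, '<:raw', '/dev/urandom'
    or die "Cannot open /dev/urandom: $!";
  my $got = read $fh, my $bytes, 16;
  close $fh;
  die "Short read from /dev/urandom" unless defined $got and $got == 16;
  return unpack 'H*', $bytes;   # hex-encode the raw bytes
}

print new_browser_id(), "\n";
```

The result has the same shape as the MD5 hex string, so it could stand in as the cookie value without touching the rest of the program.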

Line 17 distinguishes a first invocation from one where we've already had at least one chance to set a cookie (and were refused). If _cookiecheck is defined, we've had at least one try to get it right, so we dump out an HTML page (lines 18 to 23) stating our demands. We also try setting a cookie one more time; maybe the user will get tired of saying ``reject this cookie'', or maybe they just didn't like that particular hex string (who knows?).

The form submission in line 22 will cause us to come back to the same page, but with _cookiecheck possibly still set. (If not, then we'll get two hits to get back to here again, just as when we started.)

If this is the first visit, then _cookiecheck will not be set, so we set it in line 25, and do an external redirect to ourselves to verify the cookie is indeed present.
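
Stripped of the CGI plumbing, the cookie test is a three-way decision. The sketch below restates it as a pure function; the name cookie_action and the three action strings are invented for this illustration.

```perl
use strict;
use warnings;

# Decide what to do given the browser cookie value and the _cookiecheck
# parameter (either may be undef). The action names are hypothetical.
sub cookie_action {
  my ($browser_cookie, $cookiecheck) = @_;
  return 'proceed' if defined $browser_cookie;   # already branded
  return 'explain' if defined $cookiecheck;      # we tried once; cookie refused
  return 'redirect';                             # first visit: set cookie, test it
}

print cookie_action(undef, undef), "\n";   # first visit
print cookie_action(undef, 1), "\n";       # came back without the cookie
print cookie_action('0123abcd', 1), "\n";  # cookie stuck
```

Since the redirect happens only when _cookiecheck is absent, and the redirect itself adds _cookiecheck, a browser that refuses cookies gets the explanation page instead of an endless loop.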

By the time we hit line 30, we've now branded the browser with a unique cookie identification, and that's in $browser.

The next step is to determine if this browser is ``logged in'' or not. We'll keep track of that with a lightweight database, made possible with the File::Cache module from the CPAN. (Late-breaking news: the author of this module has started to generalize the caching structure into a separate Cache::Cache module, so by the time you read this, things might work differently, so beware.)

Line 34 ``opens'' the cache by creating a cache object in $cache. We'll set the cache items to expire within an hour, meaning that no user can be logged in for longer than one hour of inactivity. You might permit this to be longer or shorter (longer for low-risk items, shorter for high-risk items), but one hour is a good starting point.

Lines 41 to 44 handle a small housekeeping chore for the cache. If a user doesn't come back but hasn't logged out, her cached user ID still exists as a file in the database directory (until the next time it is fetched). But it most likely won't be fetched, since that cookie will also expire when the browser is closed, so we've got a dead file sitting around. Every 4 hours, the _purge_ entry will expire, so we'll let the first lucky user who happens to invoke this program right after that go through the cleanup process. This should be very lightweight; if you're concerned about doing this at CGI time, you could instead pull this out to a separate cron job (but be sure the job runs as the web user, not as you).
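
The ``first lucky user'' trick is just a throttle: do the expensive thing only if enough time has passed since the last time. A minimal sketch, with a plain variable standing in for the expiring _purge_ cache entry and a coderef standing in for the real purge (maybe_purge and $next_purge are names made up here):

```perl
use strict;
use warnings;

my $PURGE_EVERY = 4 * 3600;   # four hours, in seconds
my $next_purge  = 0;          # hypothetical stand-in for the _purge_ entry

# Run $purge (a coderef) at most once per $PURGE_EVERY seconds.
# Returns 1 if the purge ran on this call, 0 otherwise.
sub maybe_purge {
  my ($now, $purge) = @_;
  return 0 if $now < $next_purge;   # someone purged recently
  $purge->();
  $next_purge = $now + $PURGE_EVERY;
  return 1;
}

my $runs = 0;
maybe_purge($_, sub { $runs++ }) for 1000, 2000, 1000 + 4 * 3600;
print "purged $runs times\n";
```

Every other hit in the four-hour window pays only for one comparison.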

Line 46 pulls out the user associated with this browser, if any. If there's an entry in the cache, but it's older than an hour, the entry is deleted, and we get back undef, the same as if the entry doesn't exist. So if there is a defined value here, it's current, and the user is logged in as $user. Otherwise, there's no user associated with the browser uniquely identified with $browser.

Lines 50 to 66 handle the transitions between logged in and logged out. If the user is logged in and has requested a logout, lines 52 to 55 handle that. The parameter requesting logout is deleted (for sticky forms), and the user is removed from the cache database. $user is also undefined to reflect this for the rest of the program.
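
The login and logout transitions can also be viewed as a small function from (current user, request) to new user. The sketch below is a simplification for illustration: a hashref stands in for the CGI parameters and a coderef for the password check, and the name apply_transition is invented.

```perl
use strict;
use warnings;

# Return the user after applying any requested transition, or undef
# if nobody is logged in afterward. $req is a hashref of request
# parameters; $verify is a coderef taking (username, password).
sub apply_transition {
  my ($user, $req, $verify) = @_;
  if (defined $user) {
    # Logged in: the only transition is an explicit logout.
    return defined($req->{_logout}) ? undef() : $user;
  }
  # Logged out: a login attempt needs a well-formed name and a good password.
  my $try = $req->{_user};
  return undef unless defined $try and $try =~ /\A\w+\z/;
  return $verify->($try, $req->{_password}) ? $try : undef();
}

# Demo check in the spirit of the listing: password must contain the username.
my $demo_verify = sub { defined $_[1] and index($_[1], $_[0]) > -1 };

my $user = apply_transition(undef, {_user => 'fred', _password => 'xfredx'}, $demo_verify);
print defined $user ? "logged in as $user\n" : "not logged in\n";
```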

Lines 57 to 65 handle logging in. First, the requested username and password are read. Next the username is checked for well-formedness (which I've arbitrarily defined here as ``looks like a Perl identifier''), and then we verify the correct password for this user by calling verify. I've defined a simple version for this down at the bottom of the program in lines 98 to 101 that simply returns true if the username is a substring of the password. Please don't use this in real life: this is just a demo. If the password's good, $user gets set; otherwise, we reject the attempt.
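
As a rough idea of what a less silly verify might look like, here is a sketch that compares a salted digest instead of looking for a substring. Everything in it is an assumption for illustration: the %ACCOUNT table, the salt, and the choice of Digest::SHA (a core module). A real application would more likely use a deliberately slow, purpose-built password hash and a proper user database.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical user table: username => [salt, hex digest of salt . password].
my %ACCOUNT = (
  fred => ['x9Qp', sha256_hex('x9Qp' . 'flintstone')],
);

# Return 1 if $password matches the stored digest for $user, else 0.
sub verify {
  my ($user, $password) = @_;
  return 0 unless defined $user and defined $password;
  my $entry = $ACCOUNT{$user} or return 0;   # no such user
  my ($salt, $digest) = @$entry;
  return sha256_hex($salt . $password) eq $digest ? 1 : 0;
}

print verify('fred', 'flintstone') ? "ok\n" : "rejected\n";
```

The calling convention matches the demo verify, so the rest of the program would not need to change.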

Lines 68 to 83 handle the actions useful within the current state.

For logged in users, we'll do a couple of things. Each time a logged-in user returns to the page, we update the cache time in line 70, to permit her to stay logged in for another hour from now. Line 72 displays a simple ``log out'' form button, which reinvokes this same program including a _logout parameter. Recall that this parameter was being tested up in line 51.

For logged out users, lines 74 to 82 display that status and present a simple login form with a submit button, using a table for layout. Please don't fault my lack of HTML design skills: I'm illustrating structure here, not my graphics aptitude, which I admit is sorely lacking.

The code from line 85 downward would be where your real application goes, using the code above as a framework. The rest of the application could count on $user to be the name of an authenticated user logged into the browser of choice, and active within the past hour. As a sample do-nothing application, I thought I'd leave in my testing code that I used while developing this program to see what the current cookies and parameters contained.

Lines 87 to 94 execute a loop twice: once with $title set to Cookies and $f set to the coderef for the cookie function (provided by CGI.pm), and a second time with Params and the param function instead. I had originally written this as two separate displays, but then writhed a bit at the similarity of the code, which I then factored out and parameterized. Thank goodness for coderefs.

Line 90 prints the second-level header for the title, then follows it with a table containing the cookie or parameter keys in the first column, followed by their value in the second column. Because both cookies and parameters can be multivalued, I've added code to join multiple values by commas (line 92). Also, since both the keys and values can contain HTML-significant markup (less-thans, greater-thans, ampersands, and so on), I pass the data through escapeHTML (provided by CGI.pm) before display.

Note that line 93 invokes the function (either cookie or param) with no arguments to get a list of all things of that type, while the end of line 92 invokes that same function passing it one item of that type to get its value. It's very nice that they have that same interface.
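
The same pattern works for any pair of functions that share that calling convention. The sketch below imitates it without CGI.pm: accessor_for builds a coderef that returns the names when called with no arguments and one name's values when called with a name, and escape_html is a minimal stand-in for escapeHTML. Both names, and the sample data, are made up for this illustration.

```perl
use strict;
use warnings;

my %colors = (sky => ['blue'], flag => ['red', 'white']);
my %sizes  = (shirt => ['M & L']);

# Build an accessor with the same calling convention as cookie() and
# param(): no arguments gives the names, one argument gives the values.
sub accessor_for {
  my ($data) = @_;
  return sub {
    return sort keys %$data unless @_;
    return @{ $data->{ $_[0] } || [] };
  };
}

# Minimal stand-in for escapeHTML, covering only &, <, > and ".
sub escape_html {
  my ($text) = @_;
  $text =~ s/&/&amp;/g;
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/"/&quot;/g;
  return $text;
}

# One loop body serves both data sets, just as in lines 87 to 94.
for ([Colors => accessor_for(\%colors)], [Sizes => accessor_for(\%sizes)]) {
  my ($title, $f) = @$_;
  print "$title\n";
  print "  ", escape_html($_), ": ", escape_html(join ", ", $f->($_)), "\n"
    for $f->();
}
```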

Lines 98 to 101 were described earlier, but this is also a part of the program you'd definitely want to rewrite for a real application.

So, in summary, cookies can be reasonable for session management, as long as the logged-in state is clear, a logout button is clearly visible, the cookie expires when the browser is closed, and the session expires after an inactivity timeout value (typically an hour) is reached. Have fun handing out cookies, and don't forget the milk. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use CGI qw(:all);
        =6=     
        =7=     ## cookie check
        =8=     my $browser = cookie("browser");
        =9=     if (defined $browser) {         # got a good browser
        =10=      Delete("_cookiecheck");       # don't let this leak further
        =11=    } else {                        # no cookie? set one
        =12=      require MD5;
        =13=      my $cookie = cookie
        =14=        (-name => 'browser',
        =15=         -value => MD5->hexhash(MD5->hexhash(time.{}.rand().$$)));
        =16=    
        =17=      if (defined param("_cookiecheck")) { # already tried!
        =18=        print +(header(-cookie => $cookie),
        =19=                start_html("Missing cookies"),
        =20=                h1("Missing cookies"),
        =21=                p("This site requires a cookie to be set. Please permit this."),
        =22=                startform, submit("OK"), endform,
        =23=                end_html);
        =24=      } else {
        =25=        param("_cookiecheck", 1);   # prevent infinite loop
        =26=        print redirect (-cookie => $cookie, -uri => self_url());
        =27=      }
        =28=      exit 0;
        =29=    }
        =30=    
        =31=    ## At this point, $browser is now the unique ID of the browser
        =32=    
        =33=    require File::Cache;
        =34=    my $cache = File::Cache->new({namespace => 'cookiemaker',
        =35=                                  username => 'nobody',
        =36=                                  filemode => 0666,
        =37=                                  expires_in => 3600, # one hour
        =38=                                 });
        =39=    
        =40=    ## first, some housekeeping
        =41=    unless ($cache->get(" _purge_ ")) {
        =42=      $cache->purge;                # remove expired objects
        =43=      $cache->set(" _purge_ ", 1, 3600 * 4); # purge every four hours
        =44=    }
        =45=    
        =46=    my $user = $cache->get($browser); ## either the logged-in user, or undef
        =47=    
        =48=    print header,start_html('session demonstration'),h1('session demonstration');
        =49=    
        =50=    ## handle requested transitions (login or logout)
        =51=    if (defined $user and defined param("_logout")) {
        =52=      Delete("_logout");
        =53=      $cache->remove($browser);
        =54=      print p("You are no longer logged in as $user.");
        =55=      undef $user;
        =56=    } elsif (not defined $user and defined (my $try_user = param("_user"))) {
        =57=      Delete("_user");
        =58=      my $try_password = param("_password");
        =59=      Delete("_password");
        =60=      if ($try_user =~ /\A\w+\z/ and verify($try_user, $try_password)) {
        =61=        $user = $try_user;
        =62=        print p("Welcome back, $user.");
        =63=      } else {
        =64=        print p("I'm sorry, that's not right.");
        =65=      }
        =66=    }
        =67=    
        =68=    ## handle current state (possibly after transition)
        =69=    if (defined $user) {
        =70=      $cache->set($browser,$user);  # update cache on each hit
        =71=      print p("You are logged in as $user.");
        =72=      print startform, hidden("_logout", 1), submit("Log out"), endform;
        =73=    } else {
        =74=      print p("You are not logged in.");
        =75=      print
        =76=        startform,
        =77=          table({-border => 1, -cellspacing => 0, -cellpadding => 2},
        =78=                Tr(th("username:"),
        =79=                   td(textfield("_user")),
        =80=                   td({-rowspan => 2}, submit("login"))),
        =81=                Tr(th("password:"), td(password_field("_password")))),
        =82=                  endform;
        =83=    }
        =84=    
        =85=    ## rest of page would go here, paying attention to $user
        =86=    
        =87=    for ([Cookies => \&cookie], [Params => \&param]) {
        =88=      my ($title, $f) = @$_;
        =89=    
        =90=      print h2($title), table 
        =91=        ({-border => 0, -cellspacing => 0, -cellpadding => 2},
        =92=         map (Tr(th(escapeHTML($_)), td(escapeHTML(join ", ", $f->($_)))),
        =93=            $f->()));
        =94=    }
        =95=    
        =96=    ## sample verification
        =97=    
        =98=    sub verify {
        =99=      my($user, $password) = @_;
        =100=     return defined $password && index($password, $user) > -1; # require password to contain user
        =101=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.