Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Linux Magazine Column 23 (Apr 2001)

[suggested title: 'Authenticated remote updates']

Suppose my friend Fred has a nice little web site, and it's grown beyond the amount of work that Fred can do on his own. So he gets his buddy Barney to create some of the HTML and draw up a few of the images. But how can Barney edit the files on Fred's hard drive, especially if Barney is on the wrong side of some corporate firewall?

Well, one thing Fred could try is creating a CGI script to upload the files into the right place. But then the script runs as the web user, and not as Fred, requiring Fred to mess with wide-open permissions (or setuid wrappers), as well as either HTTPS authentication or (worse) repeatedly sending the update password over the wire during Basic Authentication handshaking.

OK, so Fred takes a different route. He's running procmail, so he can set up an action based on a specific mail header to cause the content of the message to get dropped into the right place. But what if the content is an image? And what if someone else finds out about that header, and fakes a message from Barney?

Well, sticking with the mail route, all we really need is a way to verify that Barney is really the sender and that an arbitrary binary file can get through. And it'd be nice if it was also encrypted, so that anyone in the middle can't see the secret stuff.

It's a nice thing that the RSA public-key encryption patent recently expired, because I can now more easily recommend (without fear of criminal prosecution) tools like the GNU Privacy Guard (<http://www.gnupg.org>) as a critical component to solve this problem. And nicely enough, there's a Perl wrapper in the CPAN for GPG. If you're not familiar with GPG, I suggest visiting the website before reading further.

To make this work, Fred and Barney invent GPG keys, and exchange their public keys. They don't have to use their standard keys if they don't want (and in fact, they shouldn't, for reasons I describe later). For testing in fact, I used fred@localhost and barney@localhost, since these two keys will not be used in any public server. To avoid using the standard keys, they both used GPG's --homedir option.

Fred installs a mail handler (described later), and Barney goes to work creating content. He edits his files underneath his ``web image'' directory, then invokes the program given in [listing one, below], listing the files that he's changed. It's ok to list more than those that have been changed, unless Barney is concerned about the number of bytes transmitted.

The configuration in lines 7 to 12 defines what happens next. Line 7 defines the top level directory of his ``web image'' directory. This permits Barney the freedom of invoking the encryptor from any directory using relative or absolute names.

Next comes the GPG ``home directory'' in line 8. If Barney is using his standard GPG key database, then this can be the directory of .gnupg in his home directory. Otherwise, Barney can construct a special GPG home directory specifically for this program. Since the passphrase is included within the program, it's probably good to have a distinct private key and passphrase. This directory will have Barney's private keyring, and the public keyring containing both Barney's and Fred's public keys. Fred will have a similar but complementary directory on his machine.

Speaking of passphrase, that's in line 9. That's needed because Barney's private key is being used to sign the message, to verify that it was Barney that published the message, and the private key is locked up with a passphrase.

Line 10 defines the GPG user that signs the message: in this case, Barney. Similarly, line 11 defines the GPG user for which the message is encrypted: Fred. By keeping the GPG databases small for this program, these names can be kept simple.

Finally, line 12 gives the email address to which the encrypted packet is finally mailed.

Lines 16 and 17 set up the modules using autouse so that they're pulled in when the corresponding subroutine is called.

Lines 19 and 20 remind me that we're setting up a three-process pipeline. The first stage generates the Storable object, and writes it to STDOUT. The second stage reads that object from its STDIN, and encrypts it using GPG, writing that to STDOUT. Finally, the third stage reads its STDIN and emails it to the required address. Normally, I'd have done this in one process, but even with the wrapper around GPG, it still really wants to read from STDIN and write to STDOUT, so it was simpler to fork a few times to fix that.

Lines 23 to 38 define the first stage. Line 23 forks, opening up the parent's STDIN as the child's STDOUT, returning true in the parent process. (The child's STDIN is unchanged, and can be used to read from the original standard input.)
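
The forking open is easier to see in isolation. Here's a minimal sketch, assuming a Unix-like system; the helper name slurp_from_child is mine and not part of the listing, and it reads from a lexical filehandle rather than reopening STDIN as the listing does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run $producer in a forked child and return everything it prints.
# open(..., "-|") forks: the child gets 0 back and has its STDOUT
# connected to a pipe; the parent gets the child's PID and reads
# from the new handle.
sub slurp_from_child {
  my ($producer) = @_;
  my $pid = open(my $from_child, "-|");
  die "Can't fork: $!" unless defined $pid;
  unless ($pid) {                 # child
    $producer->();
    exit 0;
  }
  local $/;                       # parent: read to EOF
  my $output = <$from_child>;
  close $from_child;              # also waits for the child
  return $output;
}

print slurp_from_child(sub { print "hello from stage one\n" });
```

Reopening STDIN instead, as the listing does, is what lets each later stage simply read its standard input without caring who is on the other end.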

Lines 26 to 34 construct the ``payload''. We'll make up a hash whose keys are the filenames relative to the $SRC_ROOT. After all, Barney really doesn't need to tell Fred where he keeps his files. Lines 29 and 30 handle that conversion.
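
As a rough illustration of that conversion, here's the same pair of File::Spec calls wrapped in a helper. The name payload_key and the sample paths are my own assumptions, and the sketch assumes Unix-style paths:

```perl
use strict;
use warnings;
use File::Spec::Functions qw(abs2rel rel2abs);

# Turn a path into a key relative to $root, refusing anything whose
# relative form starts by climbing out of $root (the same test the
# listing applies).
sub payload_key {
  my ($path, $root) = @_;
  my $abs = rel2abs($path);
  my $rel = abs2rel($abs, $root);
  die "$abs is not below $root\n" if $rel =~ m/\A\.\.(\/|\z)/s;
  return $rel;
}

my $root = '/home/barney/webfiles';
print payload_key("$root/images/logo.gif", $root), "\n";   # images/logo.gif
print eval { payload_key('/etc/passwd', $root) } // "rejected: $@";
```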

Lines 32 and 33 grab the content of the file, as well as noting its modification time, and create a hashref value corresponding to that payload key.

Finally, line 36 takes the payload, encodes it with Storable, and sends that to STDOUT. End of task, so that process exits in line 37.
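
To get a feel for what travels down the pipe, here's a small sketch that pushes a payload-shaped hash through store_fd and pulls it back out with retrieve_fd. It uses an anonymous temporary file in place of the pipe, and the round_trip helper and sample data are hypothetical:

```perl
use strict;
use warnings;
use Storable qw(store_fd retrieve_fd);

# Write a payload-shaped hash to a filehandle and read it back.
sub round_trip {
  my ($payload) = @_;
  open(my $fh, "+>", undef) or die "cannot create temp file: $!";
  binmode $fh;
  store_fd($payload, $fh) or die "store_fd failed";
  seek($fh, 0, 0) or die "cannot rewind: $!";
  return retrieve_fd($fh);
}

my %payload = (
  'index.html' => { content => "<h1>hi</h1>\n", mtime => 986000000 },
);
my $copy = round_trip(\%payload);
print $copy->{'index.html'}{content};
```

Because the content is just a scalar inside the hash, a binary image survives the trip as well as HTML does.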

The second stage detaches itself similarly in line 41. At this point, its STDIN is the first stage's standard output. Nice. We'll pull in the GnuPG wrapper in line 43, and trigger the encryption using all the needed parameters specified in lines 44 through 48. The armor parameter here ensures that the output is nice ASCII text suitable for emailing (even if the original source material was a binary, like an image).

This stage is thus a filter: it reads the payload from standard input, writes the encrypted text message to standard output, and exits in line 49.

OK, I admit it. When I was first sketching out the program, I had envisioned the use of Net::SMTP or one of the other dozen mailing modules. Then I just said, ``whatever'', and decided the programs were getting long enough, so I punted and just invoked /bin/mail. Leverage. Remember, ``leverage''.

So, Barney invokes this program, and an encrypted chunk of data is now scurrying along toward Fred's domain. Let's say Fred routes all of his domain mail through procmail. To extract the mail, he adds a few lines to his .procmailrc to route all mail for web-update into the standard input of the decryptor module, as follows:

    :0
    * To: web-update@webhaus.comm
    {
            :0 W
            | /home/fred/lib/web-update

            :0 e
            /home/fred/lib/web-update.badmail
    }

Here, we're telling procmail to take all appropriately addressed email and feed it into the program presented in [listing two, below]. If for some reason the web-update program fails, procmail will kindly drop the original mail into the ``bad mail'' drop, for later analysis or reprocessing.

Lines 7 to 20 control the actions of this web updating program. Remembering that this program acts with Fred's privileges based on incoming email, we want to be very conservative about what can happen. In particular, the only files that can possibly be changed will need to be below $DEST_ROOT in line 7, so at least the damage will be limited to there.

Line 8 defines a logfile for all actions, probably something to be watched over time (perhaps summarized by another Perl program). Line 9 defines Fred's web-publishing GPG ``home directory''. Again, this could be Fred's normal GPG directory, but then the passphrase would have to be potentially compromised by living within the program in line 10.

Lines 11 to 20 define the roles and authorizations for those roles. A given GPG signature (obtained by looking at the output of GPG's --list-signatures option) represents an individual. The mapping in %ROLES gives that person a series of ``roles'' that they play. Here, I'm letting Barney be the ``HTML updater'' and ``GIF updater''. Other users would have overlapping or distinct roles.

Then, those roles are mapped into permissions using %AUTHS. For each role, a series of anonymous subroutines will be executed against a five-item list:

        The basename of the file being uploaded
        The pathname relative to $DEST_ROOT
        The absolute pathname
        The absolute directory of the destination file
        An info hash (described later)

So each user maps into one or more roles, and each role maps into one or more authorizations for that role. For each file uploaded by a given user, if any of the authorization subroutines returns true, then that file is permitted to be uploaded. Yes, I probably could have made this more complicated had I taken the time, and I'm not sure this is sufficient for the most general case, but it's a good start.
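
The user-to-role-to-authorization lookup can be sketched on its own. In this hedged example the fingerprint is a made-up placeholder, the is_authorized helper is my own name, and the checks look only at the basename (the listing's versions also insist that the destination file already exists):

```perl
use strict;
use warnings;

# Hypothetical fingerprint and roles, for illustration only.
my %ROLES = ( 'FINGERPRINT-OF-BARNEY' => ['html', 'gif'] );
my %AUTHS = (
  html => [ sub { $_[0] !~ /^\./ and $_[0] =~ /\.html$/ } ],
  gif  => [ sub { $_[0] !~ /^\./ and $_[0] =~ /\.gif$/ } ],
);

# True if any check attached to any of the signer's roles accepts the file.
sub is_authorized {
  my ($fingerprint, @file_args) = @_;
  my $roles = $ROLES{$fingerprint} or return 0;
  for my $role (@$roles) {
    for my $check (@{ $AUTHS{$role} || [] }) {
      return 1 if $check->(@file_args);
    }
  }
  return 0;
}

for my $name ('index.html', '.htaccess', 'notes.txt') {
  printf "%-12s %s\n", $name,
    is_authorized('FINGERPRINT-OF-BARNEY', $name) ? 'allowed' : 'refused';
}
```

Since any one passing check is enough, adding a role can only widen what a user may upload, never narrow it.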

Lines 29 to 39 map all the warn and die messages so that they have a nice process ID number and timestamp. Lines 41 and 42 then point both STDOUT and STDERR at the logfile.
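
Here's the stamping idea as a standalone sketch. The sub is named stamp here rather than __stamp, and the message is just an example:

```perl
use strict;
use warnings;

# Prefix every line of a message with the process ID and a
# day@hour:minute:second timestamp.
sub stamp {
  my ($message) = @_;
  my @now = localtime;
  my $stamp = sprintf "[%d] [%02d\@%02d:%02d:%02d] ", $$, @now[3, 2, 1, 0];
  $message =~ s/^/$stamp/gm;
  return $message;
}

# Route every warn and die through the stamper.
$SIG{__WARN__} = sub { warn stamp(shift) };
$SIG{__DIE__}  = sub { die stamp(shift) };

warn "processing an update\n";
```

Calling warn inside the __WARN__ handler does not recurse, because Perl disables the handler while it is running.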

Lines 44 and 45 remind me that there are only two processes in this pipeline, except that it's a little more complicated. The first stage will read the original standard input (the text of the incoming email message), delivering the original Storable-encoded object to the standard input of the second stage.

But the second stage must act on that object only if the decryption is signed, and signed by the right key. However, that's known only to the first stage! So, we create a separate pipe before forking that the first and second stage can use to communicate. This is done in line 48. It's important that the first stage close the read side, and the second stage close the write side, or we'll never get an EOF at the right time.
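
A stripped-down version of that side channel might look like the following. This is only a sketch: it uses a plain fork rather than the forking open of the listing, and the status_from_child helper and the sample hash are hypothetical:

```perl
use strict;
use warnings;
use Storable qw(store_fd retrieve_fd);

# Fork a child that reports a status hash over a dedicated pipe.
sub status_from_child {
  my ($make_status) = @_;
  pipe(my $from, my $to) or die "cannot pipe: $!";
  my $pid = fork;
  die "Can't fork: $!" unless defined $pid;
  unless ($pid) {                   # child: writer only
    close $from;
    store_fd($make_status->(), $to);
    close $to;
    exit 0;
  }
  close $to;                        # parent: reader only, so EOF can arrive
  die "no response from child" if eof($from);
  my $status = retrieve_fd($from);
  close $from;
  waitpid $pid, 0;
  return $status;
}

my $h = status_from_child(sub { +{ user => 'barney', sigid => 'hypothetical-id' } });
print "signed by $h->{user}\n";
```

If the parent forgot to close its copy of the write side, the eof test would block forever whenever the child exited without writing anything.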

Lines 51 to 70 set up the first stage. The decryption takes place in lines 57 and 58. If the original message is successfully decrypted and signed, $h is a hashref with parameters about the signature. If so, we send it down the secondary channel as a Storable-encoded object. If not, we send an empty hash down that channel.

The processor beginning in line 72 has all the tough logic, because this is where we really get the work done, and where we have to be very conservative and distrust anything that is even slightly fishy.

Lines 75 and 76 grab the hashref that came from the decryption process. This is not the payload, but the authentication information about the payload, and will be the same value as $h in the other process if all went well. Line 79 aborts if that hash came back empty; otherwise, we fetch the payload in line 82.

Line 84 is a reminder that one of the parameters unused (so far) is the ``sigid'', which is unique for every signed document. We can record this sigid and reject any other message with the same sigid as having been ``already processed''. This prevents a ``replay attack'', where an intermediate bad guy intercepts the email and resends it over and over again, hoping to retrigger the same action. As you'll see in a moment, for this particular use, a replay attack would be useless except to reinstate deleted files if those weren't being checked in the authorization.
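
One way that TODO might be filled in is sketched below. Everything here is an assumption on my part: the first_sighting helper, the flat file of one sigid per line, and the choice of flock to keep two simultaneous deliveries from racing each other:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Record $sigid in $seen_file; return true only the first time it is seen.
# $seen_file is a hypothetical flat file with one sigid per line.
sub first_sighting {
  my ($sigid, $seen_file) = @_;
  open(my $fh, "+>>", $seen_file) or die "cannot open $seen_file: $!";
  flock($fh, LOCK_EX) or die "cannot lock $seen_file: $!";
  seek($fh, 0, 0);                  # read from the top
  while (my $line = <$fh>) {
    chomp $line;
    return 0 if $line eq $sigid;    # already processed: a replay
  }
  seek($fh, 0, 2);                  # switch from reading to appending
  print {$fh} "$sigid\n";
  close $fh or die "cannot close $seen_file: $!";
  return 1;
}

my (undef, $seen) = tempfile(UNLINK => 1);
print first_sighting('sig-1', $seen) ? "new\n" : "replay\n";
print first_sighting('sig-1', $seen) ? "new\n" : "replay\n";
```

In the processor, something like die "replay" unless first_sighting($h->{sigid}, $file) would go right where the TODO comment sits.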

Line 85 logs the GPG user who signed this message. Note that this has to be a GPG user that Fred has in his GPG public key listing.

Lines 87 to 94 extract all the possible coderef subroutines from which this particular GPG user receives authorization. First, for the given fingerprint, we fetch all the roles. If there are no roles, there's no point in going further. Then we fetch all the subroutines for all those roles, and stash them away in @auths for use with each file.

Line 96 gets an ``attic prefix'', used later when a file is being updated to save the previous version.

Lines 98 to 140 are executed for each file in the payload. First, the full path is computed in line 100, and the relative path is recomputed in line 101 and verified in line 102 to prevent any funny business in the pathname: remember, trust no one!

Lines 104 to 110 determine if the authenticated user is authorized for this file by running all the authorization subroutines. Note the fifth subroutine parameter is a hashref containing the information about the entry, including the modification time and the contents.

If we're authorized, it's time to get cracking. Line 111 makes the directory containing the entry, including any parent directories if needed. Line 113 gives default permissions for the file, unless the file already exists. Hmm, maybe in another revision of the program, we can give Barney some control over this value.

Lines 114 to 128 deal with the previous version of the existing file. If the file exists, and it's not newer (lines 117 to 120), then we need to stash the current version into the ``attic''. Line 122 defines this attic as a subdirectory within the same directory as the file, named .attic. The current file is linked into this attic with a name that depends on the time of day and process ID number.
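
The attic step, pulled out into a helper for illustration. The name stash_in_attic and the temporary directory are my own; the sketch assumes a filesystem that supports hard links:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Path qw(mkpath);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hard-link the current version of $abs into a sibling .attic directory
# and return the name of the saved copy.
sub stash_in_attic {
  my ($abs, $prefix) = @_;
  my ($basename, $dirname) = fileparse($abs);
  my $attic = catfile($dirname, ".attic");
  mkpath([$attic], 0, 0755);
  -d $attic or die "Missing $attic";
  my $atticfile = catfile($attic, "$prefix$basename");
  link $abs, $atticfile or die "Cannot ln $abs $atticfile: $!";
  return $atticfile;
}

my $dir = tempdir(CLEANUP => 1);
my $file = catfile($dir, "index.html");
open(my $out, ">", $file) or die "cannot create $file: $!";
print {$out} "old version\n";
close $out;
print "saved as ", stash_in_attic($file, time . ".$$."), "\n";
```

A hard link costs no extra disk space and keeps the old contents reachable after the new version is renamed over the original name.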

Next, lines 129 to 138 create a new file near the destination file, and give it the right permissions and modification time. If all goes well, the file is renamed into place in line 137. This is all performed with the possibility of concurrent file accesses in mind (such as the live data of a webserver). Line 139 logs the successful update.
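
The write-then-rename trick is worth a sketch of its own. This version is an illustration under my own assumptions: the install_file helper is hypothetical, and it places the temporary file in the destination directory so that the rename never has to cross filesystems:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Write $content to a temporary file beside $abs, then rename it into place.
sub install_file {
  my ($abs, $content, $perms, $mtime) = @_;
  my ($basename, $dirname) = fileparse($abs);
  my $tmp = catfile($dirname, "$basename.$$");
  open(my $fh, ">", $tmp) or die "Cannot create $tmp: $!";
  binmode $fh;
  print {$fh} $content;
  close $fh or die "Cannot write $tmp: $!";
  chmod $perms, $tmp or warn "cannot chmod $tmp: $!\n";
  utime $mtime, $mtime, $tmp or warn "cannot set mtime on $tmp: $!\n";
  rename $tmp, $abs or die "Cannot mv $tmp $abs: $!";
  return $abs;
}

my $dir = tempdir(CLEANUP => 1);
my $target = catfile($dir, "index.html");
install_file($target, "<h1>new</h1>\n", 0644, 986000000);
printf "%s: %d bytes\n", $target, -s $target;
```

A webserver reading the file at the moment of the update sees either the complete old version or the complete new one, never a half-written file.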

And there you have it, a mechanism to deliver encrypted, authenticated Perl objects by email, and a system built on that to provide controlled remote publishing, all in under 200 lines of Perl, hacked out in a few hours on a lazy afternoon (in between playing a few rounds of Ridge Racer V on my new PlayStation 2). Yeah, it's not CVS, but it's a good base for doing some other very cool things. Until next time, enjoy!

Listings

        =0=     #### listing one (encryptor)
        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     ## begin config
        =6=     
        =7=     my $SRC_ROOT = '/home/barney/webfiles';
        =8=     my $GPG_HOME = '/home/barney/.publish-gpg';
        =9=     my $GPG_PASSPHRASE = 'barneyphrase';
        =10=    my $LOCALUSER = 'barney';
        =11=    my $REMOTEUSER = 'fred';
        =12=    my $EMAILTO = 'web-update@webhaus.comm';
        =13=    
        =14=    ## end config
        =15=    
        =16=    use autouse 'Storable' => qw(store_fd retrieve_fd);
        =17=    use autouse 'File::Spec::Functions' => qw(abs2rel rel2abs);
        =18=    
        =19=    ## set up pipeline
        =20=    ## generate object | encrypt | mailer
        =21=    
        =22=    ## first stage: generate object
        =23=    unless (my $pid = open STDIN, "-|") {
        =24=      die "Can't fork: $!" unless defined $pid;
        =25=    
        =26=      my %payload;
        =27=      for (@ARGV) {
        =28=        local *F;
        =29=        my $abs = rel2abs($_);
        =30=        my $rel = abs2rel($abs, $SRC_ROOT);
        =31=        die "$abs is not below $SRC_ROOT" if $rel =~ m/\A\.\.(\/|\z)/s;
        =32=        open F, "<$abs" or die "cannot open $abs: $!";
        =33=        $payload{$rel} = {content => join("", <F>), mtime => (stat F)[9]};
        =34=      }
        =35=    
        =36=      store_fd \%payload, \*STDOUT;
        =37=      exit 0;
        =38=    }
        =39=    
        =40=    ## second stage: encrypt
        =41=    unless (my $pid = open STDIN, "-|") {
        =42=      die "Can't fork: $!" unless defined $pid;
        =43=      require GnuPG;
        =44=      GnuPG->new(homedir => $GPG_HOME)->
        =45=        encrypt(passphrase => $GPG_PASSPHRASE,
        =46=                recipient => $REMOTEUSER,
        =47=                'local-user' => $LOCALUSER,
        =48=                armor => 1, sign => 1);
        =49=      exit 0;
        =50=    }
        =51=    
        =52=    ## third stage: send mail
        =53=    exec "/bin/mail", $EMAILTO;     # punt :)
        =54=    die "cannot exec /bin/mail: $!";
        =0=     #### listing two (decryptor)
        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     ## begin config
        =6=     
        =7=     my $DEST_ROOT = "/home/httpd/htdocs/";
        =8=     my $LOGFILE = "/home/fred/lib/web-update.log";
        =9=     my $GPG_HOME = "/home/fred/.publish-gpg";
        =10=    my $GPG_PASSPHRASE = 'fredphrase';
        =11=    my %ROLES = (
        =12=                 ## barney:
        =13=                 "80989563762BC0677D96542EFAA3AAF8282564B7" => ['html', 'gif'],
        =14=                );
        =15=    my %AUTHS = (
        =16=      'images' => [sub { $_[1] =~ /^images\/.*\.(gif|jpe?g)$/ }],
        =17=      'editor' => [sub { $_[0] !~ /^\./ }],
        =18=      'html' => [sub { -f $_[2] and $_[0] !~ /^\./ and $_[0] =~ /\.html$/ }],
        =19=      'gif' => [sub { -f $_[2] and $_[0] !~ /^\./ and $_[0] =~ /\.gif$/ }],
        =20=    );
        =21=    
        =22=    ## end config
        =23=    
        =24=    use autouse 'Storable' => qw(store_fd retrieve_fd);
        =25=    use autouse 'File::Spec::Functions' => qw(abs2rel rel2abs catfile);
        =26=    use autouse 'File::Basename' => qw(fileparse);
        =27=    use autouse 'File::Path' => qw(mkpath);
        =28=    
        =29=    sub __stamp {
        =30=      my $message = shift;
        =31=      my(@now) = localtime;
        =32=      my $stamp = sprintf "[%d] [%02d@%02d:%02d:%02d] ",
        =33=        $$, @now[3,2,1,0];
        =34=      $message =~ s/^/$stamp/gm;
        =35=      $message;
        =36=    }
        =37=    
        =38=    $SIG{__WARN__} = sub { warn __stamp(shift) };
        =39=    $SIG{__DIE__} = sub { die __stamp(shift) };
        =40=    
        =41=    open STDOUT, ">>$LOGFILE" or die "Cannot append to $LOGFILE: $!";
        =42=    open STDERR, ">&STDOUT";
        =43=    
        =44=    ## set up pipeline
        =45=    ## <message decrypt | processor (with secondary channel)
        =46=    
        =47=    ## establish secondary channel...
        =48=    pipe FROM, TO or die "cannot pipe: $!";
        =49=    
        =50=    ## first stage: decrypt
        =51=    unless (my $pid = open STDIN, "-|") {
        =52=      die "Can't fork: $!" unless defined $pid;
        =53=    
        =54=      close FROM;
        =55=    
        =56=      require GnuPG;
        =57=      my $h = GnuPG->new(homedir => $GPG_HOME)->
        =58=        decrypt(passphrase => $GPG_PASSPHRASE);
        =59=    
        =60=      if ($h and ref $h) {
        =61=        store_fd $h, \*TO;
        =62=      } else {
        =63=        store_fd {}, \*TO;
        =64=        warn $h ? "not signed\n" : "cannot decrypt\n";
        =65=      }
        =66=    
        =67=      close TO;
        =68=    
        =69=      exit 0;
        =70=    }
        =71=    
        =72=    ## second stage: processor
        =73=    close TO;
        =74=    
        =75=    die "BAD PARENT RESPONSE, aborting" if eof(FROM);
        =76=    my $h = retrieve_fd \*FROM;
        =77=    close FROM;
        =78=    
        =79=    die "failed validation" unless keys %$h;
        =80=    
        =81=    ## we've got validation, so fetch the payload
        =82=    my $payload = retrieve_fd \*STDIN;
        =83=    
        =84=    ## TODO: record $h->{sigid} and reject duplicate as a replay attack
        =85=    warn "processing an update from $h->{user}...\n";
        =86=    
        =87=    my @auths = do {
        =88=      my $roles = $ROLES{$h->{fingerprint}}
        =89=        or die "No roles for $h->{fingerprint}";
        =90=      map {
        =91=        my $auths = $AUTHS{$_};
        =92=        $auths ? @$auths : ();
        =93=      } @$roles; # list of coderefs
        =94=    };
        =95=    
        =96=    my $prefix = time . ".$$.";
        =97=    
        =98=    while (my($rel, $info) = each(%$payload)) {
        =99=      local *F;
        =100=     my $abs = rel2abs($rel, $DEST_ROOT);
        =101=     $rel = abs2rel($abs, $DEST_ROOT); # should be same as original $rel
        =102=     die "$abs is not below $DEST_ROOT" if $rel =~ m/\A\.\.(\/|\z)/s;
        =103=     my ($basename, $dirname) = fileparse($abs); # dirname ends in slash
        =104=     do {
        =105=       my $ok = 0;
        =106=       for (@auths) {
        =107=         last if $ok = $_->($basename, $rel, $abs, $dirname, $info);
        =108=       }
        =109=       $ok;
        =110=     } or warn("$rel: not authorized, skipping\n"), next;
        =111=     mkpath([$dirname], 0, 0755);
        =112=     -d $dirname or die "Missing $dirname";
        =113=     my $perms = 0644;             # default unless previous
        =114=     if (-e $abs) {
        =115=       my $mtime = (stat _)[9];
        =116=       $perms = (stat _)[2] & 0777; # previous perms
        =117=       if ((my $age = $mtime - $info->{mtime}) >= 0) {
        =118=         warn "$rel: skipping older file ($age seconds)\n";
        =119=         next;
        =120=       }
        =121=   
        =122=       my $attic = catfile($dirname, ".attic");
        =123=       mkpath([$attic], 0, 0755);
        =124=       -d $attic or die "Missing $attic";
        =125=       my $atticfile = catfile($attic, "$prefix$basename");
        =126=       link $abs, $atticfile or die "Cannot ln $abs $atticfile: $!";
        =127=       warn "$rel: previous saved in .attic/$prefix$basename\n";
        =128=     }
        =129=     {
        =130=       my $tmp = catfile($dirname, "$basename.$$");
        =131=       open F, ">$tmp" or die "Cannot create $tmp: $!";
        =132=       print F $info->{content};
        =133=       close F;
        =134=       chmod $perms, $tmp or warn "cannot chmod($perms,$tmp): $!";
        =135=       utime $info->{mtime}, $info->{mtime}, $tmp or
        =136=         warn "cannot set mtime on $tmp: $!\n";
        =137=       rename $tmp, $abs or die "Cannot mv $tmp $abs: $!";
        =138=     }
        =139=     warn "$rel: new version installed\n";
        =140=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

