Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Web Techniques Column 46 (Feb 2000)

[suggested title: Uploading files and sending MIME mail]

Most of us have a junk drawer in our house: the one with the random bits of discarded stuff that we hope might be useful in the future, like the last part of a roll of duct tape, a couple of nuts and washers that for some reason weren't needed when we reassembled the shelves this time, and so on. When something needs to get done, we rummage through the drawer looking for something of use, usually passing over the same useless items time after time, but every once in a while going ``Yeah, I'm so glad I saved that!'' when finding a match.

Well, as a software toolsmith, I have a virtual junk drawer as well. I collect little snippets of code that I see float by, in hopes of reassembling them into some useful tool someday. Recently, I needed to do some file uploads as well as learn how to send MIME mail, and three snippets I'd happened to save all came together in such a nice way that I thought I'd share the program with you. Of course, you can take the program as is, but I hope you instead throw this program into your ``virtual junk drawer'' so that if anyone asks you to upload a file, send a MIME attachment, or strip a macbinary resource fork, you'll find this gem in your drawer.

So, this month we'll do a simple task in a nice way: upload a file, and send it as email somewhere. And if the file happens to be uploaded as a macbinary encapsulated file, we'll even extract the data fork if requested. And that brings us to the program in [listing one, below].

Line 1 tells my Unix-compatible kernel where to find Perl and to turn on compile-time and run-time warnings. Line 2 enables the common restrictions: variables must be declared with my, soft references are not permitted, and barewords (Perl Poetry Mode) are disabled. Line 3 disables buffering on STDOUT, not particularly used here, but handy during development.

Lines 5 through 12 define the things that you'll most likely want to change to use this program. As in all my columns, this listing is meant to be a model, not something ready-to-run. You're supposed to steal the ideas, not the code. I've also altered these addresses slightly from what would be used in real life so that this program is harmless as-is. I've found that too many ``script kiddies'' just download these programs from my website and then run them without looking at what they contain.

Line 7 defines the From: address on the mail sent from this script. The name is arbitrary, but should be a valid address. If the destination address is unavailable, most likely the mail transfer agent (MTA) will bounce it back to this From: address. I usually use my email address here, or if I'm doing something relating to the website, my email role address of webmaster at the web box.

Line 8 similarly defines a destination address for mail sent by this script. If you have procmail or qmail or some other mail handling tool, and your mail server allows variant addressing, you can select a unique delivery address for all email coming from this script, and set up a specific handler for just these uploads.

Similarly, line 9 defines the subject line of the email text. Even if you can't use a variant delivery address, you might still be able to trigger some action on a unique subject line to your normal email address. So select the subject line wisely.

Line 10 is a boolean flag (non-zero for true, zero for false) to select whether to include the meta-data on the upload. If enabled, a separate text attachment is included in the mail with all the upload parameters (as reported by CGI.pm) and all the environment variables provided to the invocation. This is handy to track down exactly what this blob is that is being mailed to you, but it's not in any machine-readable format for this demonstration program. Most likely, you might be interested in the reported filename, and the identity of the uploader (host and ``referer'').

Line 14 pulls in CGI.pm, importing all of its function shortcuts so that we can avoid the scary, messy, and funky ``object-oriented'' mode.

Line 16 triggers the fetching of the parameters, if any. This is needed to grab the uploaded file early in the program, and abort if the input parameters are wrong. Most likely that will be from an ill-formatted file upload, like perhaps a truncation. The upload data is being sent into a uniquely named file in /tmp (by default), so we aren't keeping the upload's contents in memory, just the name.

If there was an error gathering the upload data, cgi_error will report it, and lines 17 to 20 will generate an error response, rather than continuing with the rest of the program.

And now we get to the real code. Lines 22 to 37 print the upload form, regardless of whether or not we got an uploaded file on this invocation. That way, we can use the script to get the initial form, as well as continuing to encourage further uploads in succession.

Line 23 prints the HTTP header, identifying the content as HTML. Line 24 prints the HTML header, titling this response as simply Upload. Line 25 gives a first-level header of Upload as well. See, this proves this is just a demonstration: in real life, you'd be much more creative with the titles and headers. Or at least, I'd hope.

Lines 26 to 37 define the upload form. As is my convention, I enclose the form in horizontal rules. Line 27 is what distinguishes this form as an upload form rather than a normal form. Instead of declaring the encoding type to be application/x-www-form-urlencoded, we get multipart/form-data, which makes the browser package the submission differently, allowing entire files to be uploaded efficiently as content instead of just form fields.
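To make the difference concrete, here is a minimal sketch that builds a multipart/form-data body by hand. The boundary string, field values, and file contents are invented for illustration; a real browser chooses its own boundary and sends its own headers.

```perl
use strict;
use warnings;

# Hypothetical values for illustration; a real browser picks its own boundary.
my $boundary = '----DemoBoundary12345';
my %fields   = (type => 'text', strip_resource_fork => 'on');
my ($filename, $content) = ('notes.txt', "first line\nsecond line\n");

# Ordinary form fields: one part each, value sent as the part body.
my $body = '';
for my $name (sort keys %fields) {
  $body .= "--$boundary\r\n"
        .  qq{Content-Disposition: form-data; name="$name"\r\n\r\n}
        .  "$fields{$name}\r\n";
}

# The file part carries its own filename and content type,
# and the file bytes travel as-is (no URL-encoding).
$body .= "--$boundary\r\n"
      .  qq{Content-Disposition: form-data; name="uploaded_file"; filename="$filename"\r\n}
      .  "Content-Type: text/plain\r\n\r\n"
      .  "$content\r\n"
      .  "--$boundary--\r\n";

print "Content-Type: multipart/form-data; boundary=$boundary\n\n", $body;
```

Each part is introduced by the boundary line, and the final boundary has two extra trailing dashes to mark the end of the submission.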

Lines 28 to 35 define the form contents, inside a table so we can control layout. (Don't tell any of my buddies that suggest that a table should be used solely for content description and not layout: I'll be excommunicated from their group.)

Line 29 is the upload file field. We give a name to the parameter (here: uploaded_file) but this parameter is not where the contents will be returned. They'll be retrieved specially using extra functions provided by CGI.pm. This field will show up as a filename box, probably associated with a Browse... button. The user will most likely press the button, bounce around for a file from their system, and then select or accept that file as the designated uploaded file.

Lines 30 and 31 define the type to use for email. If text is selected, the file is mailed as a quoted-printable representation: something that can be mostly discerned by the naked eye without resorting to massive computer power, although still binary-safe for odd characters. If binary is selected, the upload content is encoded in base64, much more efficient if the text contains many non-normal bytes (especially bytes in the 128 to 255 region). The resulting attachment will be identical either way once decoded: this is merely a selection of how readable it is in transit.
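The trade-off between the two encodings can be seen with a small sketch using the MIME::QuotedPrint and MIME::Base64 modules directly. The two sample payloads are made up for this demonstration; the listing itself leaves the encoding work to MIME::Lite.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use MIME::QuotedPrint qw(encode_qp decode_qp);

# Two hypothetical payloads: mostly-ASCII text, and data made entirely of high-bit bytes.
my $text   = "Caf\xE9 menu: soup, salad, pie\n" x 20;
my $binary = join '', map { chr(128 + $_ % 128) } 0 .. 599;

for my $sample ([text => $text], [binary => $binary]) {
  my ($label, $data) = @$sample;
  my $qp  = encode_qp($data);
  my $b64 = encode_base64($data);
  printf "%-6s raw=%4d  qp=%4d  base64=%4d\n",
    $label, length $data, length $qp, length $b64;
  # Both encodings are lossless: decoding returns the original bytes.
  die "round trip failed for $label"
    unless decode_qp($qp) eq $data and decode_base64($b64) eq $data;
}
```

For the text sample the quoted-printable form should come out shorter (and still readable), while for the high-bit sample base64 should win, since quoted-printable spends three characters on every such byte.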

Line 33 selects whether to strip the Macintosh Resource Fork on uploads from Internet Explorer for the Mac. Mac IE insists on wrapping a plain data file in macbinary on uploads, even if there's nothing of interest in the resource fork. Of course, this confuses the heck out of the rest of the world and destroys cross-platform utility. So, if this box is selected, and the file came from IE (thankfully marked by an appropriate content type), then the datafork (what most people would call the contents of the upload) is extracted from inside the macbinary container and sent instead. Of course, Mac IE seems to provide no compatibility button to allow me to make it work like anything else on the Mac, so I can't turn this mode off. Grr.

Now comes the real business. If we were invoked with parameters, we're ready to receive the uploaded file and send it along its merry way. Line 40 pulls in the MIME::Lite module (found in the CPAN). I do it with a require here because I want it brought in only if we are doing an actual sending of mail, saving us time when all we're doing is generating the initial upload form.
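The effect of a run-time require can be sketched with any module. Here MIME::Base64 stands in for MIME::Lite, and the helper name encode_if_needed is hypothetical; the point is only that the module is not loaded until the code path that needs it actually runs.

```perl
use strict;
use warnings;

# MIME::Base64 stands in for any module that is only needed on one code path.
sub encode_if_needed {
  my ($data, $want_encoding) = @_;
  return $data unless $want_encoding;
  require MIME::Base64;                 # loaded at run time, on first use only
  return MIME::Base64::encode_base64($data);
}

sub loaded { exists $INC{'MIME/Base64.pm'} ? "yes" : "no" }

print "loaded before any call: ", loaded(), "\n";
print encode_if_needed("plain", 0), "\n";
print "loaded after plain call: ", loaded(), "\n";
print encode_if_needed("hello", 1);
print "loaded after encoded call: ", loaded(), "\n";
```

A use statement would have loaded the module at compile time on every invocation, whether or not the encoding branch was ever reached.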

If we got a file upload parameter, the upload function given the parameter name (defined in the form above) returns the upload object into $file in line 41. This upload object is simultaneously the filename in a string context, or the filehandle when used as a filehandle. We can pass this string to uploadInfo to get the browser-provided information in line 42. We can use this to figure out what MIME type was reported, for example.
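One way a single value can act as both a name and a filehandle is a blessed glob with overloaded stringification. The sketch below uses a hypothetical DemoUpload class to illustrate the idea; it is not CGI.pm's actual code, and it reads from an in-memory string rather than a temporary file.

```perl
use strict;
use warnings;

package DemoUpload;    # hypothetical stand-in, not CGI.pm's real class
use Symbol qw(gensym);
use overload '""' => sub { ${ *{ $_[0] } } }, fallback => 1;

sub new {
  my ($class, $name, $contents) = @_;
  my $self = gensym;                              # fresh anonymous glob to act as the handle
  open $self, '<', \$contents or die "open: $!";  # in-memory file for the demo
  ${*$self} = $name;                              # keep the name in the glob's scalar slot
  return bless $self, $class;
}

package main;

my $upload = DemoUpload->new('notes.txt', "line one\nline two\n");
my $name   = "$upload";        # string context: the file name
my @lines  = <$upload>;        # filehandle context: the contents
print "name=$name lines=", scalar(@lines), "\n";
```

Because the name and the handle share one scalar, the same variable can be handed to code that wants a name (such as a lookup keyed by filename) and to code that wants to read the data.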

Lines 43 through 45 begin to define the outgoing email, using the MIME::Lite constructor of new. We'll create a message of type multipart/mixed as a container, and select an appropriate sender and receiver address, and subject line.

If the message should contain the meta-data, lines 46 to 56 include this as the first attachment (so it'll be at the top of the file for easy processing). The encoding of 7bit ensures that this text is mostly readable. However, it's also a promise that no 8-bit data will be included, and MIME::Lite will strip any such data, so use with caution.

Lines 49 through 54 define the contents of this attachment as computed data, here an arrayref of text elements. Each element of the hash referenced by $info is included, as well as each provided environment variable. This should give the recipient enough information to know how and why this file was uploaded.
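The list-building idiom can be tried on its own without CGI.pm or MIME::Lite. In this sketch the %info hash is invented sample data shaped like an upload-info hash, and the 8-bit check at the end is an extra precaution that the listing does not perform.

```perl
use strict;
use warnings;

# Hypothetical upload details, standing in for the hash that $info references.
my %info = (
  'Content-Disposition' => 'form-data; name="uploaded_file"; filename="notes.txt"',
  'Content-Type'        => 'text/plain',
);

# Same shape as the listing: fixed labels interleaved with sorted key/value lines.
my @lines = (
  "Upload info:\n",
  (map { "$_ => $info{$_}\n" } sort keys %info),
  "ENV:\n",
  (map { "$_ => $ENV{$_}\n" } sort keys %ENV),
);

# A 7bit part promises no bytes above 127, so look for offenders first.
my @eight_bit = grep { /[\x80-\xFF]/ } @lines;
print @lines;
print STDERR scalar(@eight_bit), " line(s) contain 8-bit data\n" if @eight_bit;
```

If the check ever reports offending lines, switching that attachment to quoted-printable would be one way to keep the data intact.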

Lines 57 to 65 handle the core of this script's purpose: attaching the uploaded file. First, we'll select the encoding in lines 58 through 60. Then, we'll either include a reference to the file with FH, or a computed data string if the file is a macbinary file that needs to be stripped. If the strip_resource_fork box is checked on the upload, and the browser reported the file to be macbinary, we pass the filehandle to the strip_fork_from_fh subroutine defined below. This subroutine takes care of ripping out the datafork from the middle of the macbinary encapsulation.
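The argument list for attach is assembled from two conditional expressions, each contributing a few key/value pairs. The following sketch isolates that idiom in a hypothetical attachment_args helper; the Source key is a placeholder for illustration, where the listing passes either Data or FH.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors how the listing assembles its argument list.
sub attachment_args {
  my ($type, $strip, $reported_type) = @_;
  return (
    # First choice: how the part is labeled and encoded.
    ($type eq 'text'
      ? (Type => 'TEXT',   Encoding => 'quoted-printable')
      : (Type => 'BINARY', Encoding => 'base64')),
    # Second choice: where the part's content comes from.
    (($strip && $reported_type eq 'application/x-macbinary')
      ? (Source => 'stripped data fork')
      : (Source => 'uploaded filehandle')),
  );
}

my %args = attachment_args('text', 1, 'application/x-macbinary');
print "$_ => $args{$_}\n" for sort keys %args;
```

Because each branch of a conditional can return a list, the pairs simply flatten into one argument list, with no temporary hash needed.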

Line 66 sends the email by connecting to the SMTP port on the local host. Some web servers don't run email, so you might need to change this to a cooperative distant host. But I didn't make this a configuration parameter because it's rare enough that I didn't want to worry about it. If the send is successful, we'll get a true value, and report upload sent by email to the browser.

If an error occurred during sending, I return the entire mail back to the user in a pre element. This is clearly not something to do in a production program, because it would require re-downloading the entire (possibly large) file just to say something broke. However, it was great while I was developing the program: I merely added a 0 && in front of the sending code, and this branch of the if was always taken, so that I could see the message that would have been sent without continually reinvoking my mail reading tool.
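The 0 && trick relies on short-circuit evaluation: the right-hand side is never evaluated, so the condition is always false. A tiny sketch, with a hypothetical send_message standing in for the real send call:

```perl
use strict;
use warnings;

my $sent = 0;
sub send_message { $sent++; return 1 }   # hypothetical stand-in for the real mail send

# "0 &&" short-circuits: send_message() is never called, so the else branch always runs.
if (0 && send_message()) {
  print "sent\n";
} else {
  print "not sent; showing the message instead\n";
}
print "send attempts: $sent\n";
```

Removing the 0 && restores the normal behavior without touching anything else in the branch.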

Line 75 wraps up the invocation. There's no executable code after this point: just the definition of the subroutine, so we're all done.

Lines 77 through 85 define the subroutine that extracts the data fork (what most people would call the real file contents) from the middle of the macbinary encapsulation (which also includes the resource fork and some metadata like type and creator). I got this code off the net, so if it's not entirely accurate, I'm sure one of you will tell me. Basically, the first 128 bytes appear to be a header, and bytes 84 through 87 (counting from 1) appear to be a network-order integer value of how many bytes immediately following the header constitute the datafork. So, with the right manipulation, I'm done.
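The extraction can be exercised without a Mac by building a fake container. This sketch assumes only the layout described above (a 128-byte header with the length at bytes 84 through 87); the header is otherwise filled with zeros, the resource data is a placeholder, and the extra short-read check on the data fork is an addition that the listing does not have.

```perl
use strict;
use warnings;

# Build a fake macbinary-style container (illustration only):
# 83 filler bytes, a 4-byte network-order data fork length, then padding.
my $data_fork = "Hello from the data fork\n";
my $header    = pack "x83 N x41", length $data_fork;   # 83 + 4 + 41 = 128 bytes
my $resource  = "RSRC" x 8;                             # stand-in for a resource fork
my $container = $header . $data_fork . $resource;

# Same steps as the listing's subroutine, plus a check on the second read.
sub extract_data_fork {
  my $fh = shift;
  binmode $fh;
  my $len = read $fh, (my $buf), 128;                   # read the header
  die "short header read" unless defined $len and $len == 128;
  my $bytes = unpack "x83 N", $buf;                     # skip 83 bytes, read the length
  my $got = read $fh, (my $data), $bytes;
  die "short data read" unless defined $got and $got == $bytes;
  return $data;
}

open my $fh, '<', \$container or die "cannot open in-memory handle: $!";
my $extracted = extract_data_fork($fh);
print $extracted;
```

Everything after the data fork, including the placeholder resource data, is simply left unread.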

And there you have it, a simple demonstration of three things: how to upload a file, how to send mail with an attachment, and how to strip that annoying resource fork out of file uploads. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     ## configuration
        =6=     
        =7=     my $FROM = 'webmaster@www.stonehenge.comXX';
        =8=     my $TO = 'merlyn+upload@stonehenge.comXX';
        =9=     my $SUBJECT = 'File upload';
        =10=    my $INCLUDE_META = 1;
        =11=    
        =12=    ## end configuration
        =13=    
        =14=    use CGI qw(:all);
        =15=    
        =16=    my @params = param();
        =17=    if (my $error = cgi_error()) {
        =18=      print header(-status => $error);
        =19=      exit 0;
        =20=    }
        =21=    
        =22=    print
        =23=      header,
        =24=      start_html("Upload"),
        =25=      h1("Upload"),
        =26=      hr,
        =27=      start_multipart_form,
        =28=      table(Tr(td(p('upload:')),
        =29=               td(filefield('uploaded_file'))),
        =30=            Tr(td(p('email as type:')),
        =31=               td(radio_group('type', [qw(binary text)]))),
        =32=            Tr(td({ -colspan => 2 },
        =33=                  checkbox(-name => 'strip_resource_fork',
        =34=                           -label => 'Strip Macbinary Resource Fork'))),
        =35=            Tr(td({ -colspan => 2 }, submit))),
        =36=      end_multipart_form,
        =37=      hr;
        =38=    
        =39=    if (@params) {
        =40=      require MIME::Lite;
        =41=      if (my $file = upload('uploaded_file')) {
        =42=        my $info = uploadInfo($file) or die "info?";
        =43=        my $msg = MIME::Lite->new
        =44=          (Type => 'multipart/mixed',
        =45=           From => $FROM, To => $TO, Subject => $SUBJECT);
        =46=        if ($INCLUDE_META) {
        =47=          $msg->attach
        =48=            (Type => 'TEXT', Encoding => '7bit',
        =49=             Data => [
        =50=                      "Upload info:\n",
        =51=                      (map { "$_ => $info->{$_}\n" } sort keys %$info),
        =52=                      "ENV:\n",
        =53=                      (map { "$_ => $ENV{$_}\n" } sort keys %ENV),
        =54=                     ],
        =55=            );
        =56=        }
        =57=        $msg->attach
        =58=          ((param('type') eq 'text' ?
        =59=            (Type => 'TEXT', Encoding => 'quoted-printable') :
        =60=            (Type => 'BINARY', Encoding => 'base64')),
        =61=           ((param('strip_resource_fork') &&
        =62=             $info->{"Content-Type"} eq "application/x-macbinary") ?
        =63=            (Data => strip_fork_from_fh($file)) :
        =64=            (FH => $file)),
        =65=          );
        =66=        if ($msg->send_by_smtp('localhost')) {
        =67=          print p("Upload sent by email.");
        =68=        } else {
        =69=          print
        =70=            p("An error occurred... here's what would have been sent:"),
        =71=            pre($msg->as_string);
        =72=        }
        =73=      }
        =74=    }
        =75=    print end_html;
        =76=    
        =77=    sub strip_fork_from_fh {
        =78=      my $fh = shift;
        =79=    
        =80=      my $len = read $fh, (my $buf), 128; # read the header
        =81=      die "short read: $len" unless $len == 128;
        =82=      my $bytes = unpack("x83N", $buf); # get datafork length
        =83=      read $fh, $buf, $bytes;
        =84=      $buf;
        =85=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.