Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.
Download this listing!

Linux Magazine Column 40 (Sep 2002)

[suggested title: Handling partially-filled query forms with placeholders]

CGI applications often are used for searching through some database. For example, a catalog might let you look for a particular item with a certain cost-range or color. A phone book might let you search for people based on their names or addresses. An on-line dating service might let you pick people based on their gender, location, age, and interests.

When the CGI application is in Perl, the database query is frequently performed using the DBI interface. This amazing product of man-years of effort (coordinated by Tim Bunce) allows a Perl program to interact nearly identically to over a few dozen types of databases, including both commercial and open-source databases, and even ``non-database'' databases like a comma-separated-values (CSV) file. The interaction is primarily in the form of a series of industry-standard SQL statements.

But I frequently see the transition from a CGI form element to a database SQL statement in a way that scares me from a security perspective. For example, let's say that the field firstname contained an SQL ``like'' pattern for the first name of a person I'm searching for in my department. The code might go something like this:

  my $department = 123; # determined by some login process
  my $firstname = param('firstname'); # from the field
  my $sql = "SELECT id FROM employees WHERE department = $department" .
    " AND firstname LIKE \"$firstname\"";

And at first glance, this would appear to be fine, restricting the output to people in my department. For example, if I enter FR% for the SQL pattern, I get the SQL:

  SELECT id FROM employees WHERE department = 123
   AND firstname LIKE "FR%"

However, suppose I know, or can guess, the syntax of this generated statement, and I enter

  %" OR "X" = "X

as the first name. Yes, that's odd-looking in the query box, but notice how that expands:

  SELECT id FROM employees WHERE department = 123
   AND firstname LIKE "%" OR "X" = "X"

Yes. What have I done? I've got the ``ANDed'' condition certainly, but I've introduced an ``OR'' condition that is true for every employee, and I can now dump out far more than the application originally intended!

Of course, when this happens, someone usually screams ``how do I make sure that the field doesn't contain anything that'll let the bad guys do this?'' This is the wrong approach. Instead, the DBI interface includes this nice tool called ``placeholders''. The query should have been written like:

  my $department = 123; # determined by some login process
  my $firstname = param('firstname'); # from the field
  my $sql = "SELECT id FROM employees WHERE department = ?" .
    " AND firstname LIKE ?";
  my $sth = $dbh->prepare($sql);
  $sth->execute($department, $firstname);

For every question-mark in the SQL, the DBI interface will take the corresponding value from the execute method, and place it into the SQL in such a way that the value is properly quoted and protected. And that solves the security problem!

Of course, a search form also might have form fields that are left blank, meaning ``any value is good enough here''. What's an easy way to generate the corresponding SQL and get the execute list correct? Well, you just need a little method to your madness, as illustrated in [listing one, below], a simple application showing a phonebook lookup form.

Lines 1 through 3 start nearly every CGI program I write. The -T switch turns on ``taint mode'', so that input values are not accidentally used to alter the external behavior of the program. The -w switch enables warnings. use strict turns on the three common compiler restrictions for larger programs: disabling symbolic references, demanding that variables be properly introduced, and disabling bare words as subroutine invocations. And finally, standard output is unbuffered: not strictly necessary for this application, but handy if we ever fork a child that might need to include things into the output stream.

Lines 6 through 9 establish a list of abbreviations for the states. I'll use this in a pop-up box definition.

Lines 11 to 25 define the query fields of my form, including the SQL snippets that should be used if they are present in the response. Each element of @QUERY_FIELDS is an arrayref. The first element is the form field name. The second element is an SQL snippet, using the question-mark placeholder. And a third element, if present, is a coderef for the routine to validate the data, and alter it if necessary. We'll see precisely how these fields are used later.

Line 28 pulls in the standard CGI module, along with importing all the form access and HTML generation shortcuts.

Line 30 prints the HTTP/CGI header, the beginnings of the HTML header, and a first-order head of Query. As this is not a complete application, I didn't spend much time adorning the output.

Line 32 prints the tag for a horizontal rule, and the beginning of the form. The form action parameter defaults to the same URL, bringing a form submission right back to this same program, so this CGI script both generates the original form, and processes the results. This is a typical strategy for CGI.pm-based programs.

Lines 34 to 42 print the form itself, with the layout aided by a simple table. A background color of a light-yellow defines the area of the form, designated in line 34. The first row picks up the first and last name, with their appropriate labels. The second row similarly holds the street address and the city.

The third row contains a popup field named state. The list of states comes from the array defined earlier. An additional item of n/a is added to the front of the list, and also selected as the initial default. To keep this type of searching form easy to use, all fields should default to their ``insignificant'' state. The third row also contains a zip-code field. Of course, a zip code would also uniquely define a city and state, but that's beyond the scope of this example, so we'll pretend these are all independent fields. The third row also contains the submit button.

For all three rows, the individual table cells are enclosed in a td HTML shortcut, defined by CGI.pm. And each of these cells are further nested into a table row shortcut, defined as Tr. This shortcut has an unusual name because as an entirely lowercase word, it would collide with the built-in tr operator. Finally, the three rows are enclosed inside a table shortcut, with an additional hashref parameter which becomes the attributes of the table tag for the output.

And finally, line 44 closes out the form with a horizontal rule.

When the form is initially displayed in the browser, I have a chance to fill out one or more of the fields of my choice, and press submit. The same program is reinvoked, but with access to the form parameters using the param function. The code beginning in line 46 handles this response.

Lines 47 and 48 hold the pieces of SQL and their corresponding values. Initially, they'll start out empty. If they're still empty after we've processed all the known fields, then we didn't get a valid query, and there's nothing further to do.

Lines 50 to 61 process each of the known form fields. $_ holds the arrayref of each field in turn. Line 51 pulls out the value for that particular parameter into the scalar $val. (Note that this won't work very well for a multi-value parameter, but that's merely minor change to handle that instead.) If the parameter is not present, we skip on to the next item.

Lines 52 to 58 clean up the data from that form element. If there's a value in the third arrayref element, it's a coderef, which we call, passing the form value as its first argument. We expect the called subroutine to return either the cleaned-up data, or undef, again indicating that this field is ``not present''. If there's no special handling subroutine, a simple editing pass is performed instead: line 55 deletes any leading whitespace, while line 56 deletes any trailing whitespace.

Looking back at lines 18 to 23, we can see that the cleanup routine for the popup menu takes that value and returns undef unless it contains a non-whitespace character but is not our default value of n/a.

Now, if we've made it to line 59, we have a good value in $val, and the corresponding SQL as the second element of the arrayref. So, we'll push each one onto the end of the their corresponding arrays to be gathered and processed later.

Once the individual form elements have been examined, it's time to determine if we have a valid query. Line 62 examines the size of the @sqls array. If this is non-empty, then we've got to proceed to building an SQL statement, in line 63.

Each snippet in @sqls is a clause expected to be AND-ed with the other clauses. So we'll take the snippets and join them with an ``and'' keyword in line 63. In front of that, we'll select the id field from the resulting record for further processing.

In a real program, we'd then execute code similar to that presented in lines 69 to 71. The SQL statement would be prepared from the joined snippets, and then the placeholders would be bound to their corresponding values, and finally we'd be getting the matched records by their ids.

However, since this is just a demo harness, we'll dump out the SQL in line 65, and the list of values for the binding in line 66.

So, we've seen how to walk through a form, picking out the fields and constructing the corresponding SQL for them, and we've also seen the importance of using placeholders. And with no more worries about broken or malicious input values messing up your day, until next time, enjoy!

Listing

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     ## constants
        =6=     my @STATES = qw(
        =7=       AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS
        =8=       MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY
        =9=     );
        =10=    ## configuration
        =11=    my @QUERY_FIELDS =
        =12=      (
        =13=       ['firstname', 'person.firstname is ?'],
        =14=       ['lastname', 'person.lastname is ?'],
        =15=       ['street', 'person.street is ?'],
        =16=       ['city', 'person.city is ?'],
        =17=       ['state', 'person.state is ?',
        =18=        sub {
        =19=          my $v = shift;
        =20=          return undef unless $v =~ /\S/;
        =21=          return undef if $v eq "n/a";
        =22=          $v;
        =23=        }],
        =24=       ['zip', 'person.zip is ?'],
        =25=      );
        =26=    ## end configuration
        =27=    
        =28=    use CGI qw(:all);
        =29=    
        =30=    print header, start_html, h1('Query');
        =31=    
        =32=    print hr, start_form;
        =33=    
        =34=    print table({-bgcolor => '#ffff88'},
        =35=                Tr(td("First:"), td(textfield('firstname')),
        =36=                   td("Last:"), td(textfield('lastname'))),
        =37=                Tr(td("Street:"), td(textfield('street')),
        =38=                   td("City:"), td(textfield('city'))),
        =39=                Tr(td("State:"),
        =40=                   td(popup_menu('state', ['n/a', @STATES], 'n/a')),
        =41=                   td("Zip:"), td(textfield('zip', '', 10)),
        =42=                   td(submit)));
        =43=    
        =44=    print end_form, hr;
        =45=    
        =46=    {
        =47=      my @sqls;
        =48=      my @vals;
        =49=    
        =50=      for (@QUERY_FIELDS) {
        =51=        next unless defined(my $val = param($_->[0]));
        =52=        if ($_->[2]) {
        =53=          next unless defined($val = $_->[2]->($val));
        =54=        } else {
        =55=          $val =~ s/^\s+//;
        =56=          $val =~ s/\s+$//;
        =57=          next unless length $val;
        =58=        }
        =59=        push @sqls, $_->[1];
        =60=        push @vals, $val;
        =61=      }
        =62=      if (@sqls) {
        =63=        my $sql = "select id from person where ".join(" and ", @sqls);
        =64=        print h2("Resulting query");
        =65=        print pre(escapeHTML($sql));
        =66=        print pre(escapeHTML(join ", ", map { qq{"$_"} } @vals));
        =67=    
        =68=        ## typical use:
        =69=        ## my $sth = $dbh->prepare($sql);
        =70=        ## $sth->execute(@vals);
        =71=        ## while (my ($id) = $sth->fetchrow_array) { ... }
        =72=      }
        =73=    }
        =74=    
        =75=    print end_html;

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.