Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
![]() |
Download this listing! | ![]() |
![]() |
![]() |
Linux Magazine Column 40 (Sep 2002)
[suggested title: Handling partially-filled query forms with placeholders]
CGI applications often are used for searching through some database. For example, a catalog might let you look for a particular item with a certain cost-range or color. A phone book might let you search for people based on their names or addresses. An on-line dating service might let you pick people based on their gender, location, age, and interests.
When the CGI application is in Perl, the database query is frequently performed using the DBI interface. This amazing product of man-years of effort (coordinated by Tim Bunce) allows a Perl program to interact nearly identically to over a few dozen types of databases, including both commercial and open-source databases, and even ``non-database'' databases like a comma-separated-values (CSV) file. The interaction is primarily in the form of a series of industry-standard SQL statements.
But I frequently see the transition from a CGI form element to a
database SQL statement in a way that scares me from a security
perspective. For example, let's say that the field firstname
contained an SQL ``like'' pattern for the first name of a person I'm
searching for in my department. The code might go something like
this:
my $department = 123; # determined by some login process my $firstname = param('firstname'); # from the field my $sql = "SELECT id FROM employees WHERE department = $department" . " AND firstname LIKE \"$firstname\"";
And at first glance, this would appear to be fine, restricting the
output to people in my department. For example, if I enter FR%
for the SQL pattern, I get the SQL:
SELECT id FROM employees WHERE department = 123 AND firstname LIKE "FR%"
However, suppose I know, or can guess, the syntax of this generated statement, and I enter
%" OR "X" = "X
as the first name. Yes, that's odd-looking in the query box, but notice how that expands:
SELECT id FROM employees WHERE department = 123 AND firstname LIKE "%" OR "X" = "X"
Yes. What have I done? I've got the ``ANDed'' condition certainly, but I've introduced an ``OR'' condition that is true for every employee, and I can now dump out far more than the application originally intended!
Of course, when this happens, someone usually screams ``how do I make sure that the field doesn't contain anything that'll let the bad guys do this?'' This is the wrong approach. Instead, the DBI interface includes this nice tool called ``placeholders''. The query should have been written like:
my $department = 123; # determined by some login process my $firstname = param('firstname'); # from the field my $sql = "SELECT id FROM employees WHERE department = ?" . " AND firstname LIKE ?"; my $sth = $dbh->prepare($sql); $sth->execute($department, $firstname);
For every question-mark in the SQL, the DBI interface will take the
corresponding value from the execute
method, and place it into the
SQL in such a way that the value is properly quoted and protected.
And that solves the security problem!
Of course, a search form also might have form fields that are left
blank, meaning ``any value is good enough here''. What's an easy way to
generate the corresponding SQL and get the execute
list correct?
Well, you just need a little method to your madness, as illustrated in
[listing one, below], a simple application showing a phonebook lookup
form.
Lines 1 through 3 start nearly every CGI program I write. The -T
switch turns on ``taint mode'', so that input values are not
accidentally used to alter the external behavior of the program. The
-w
switch enables warnings. use strict
turns on the three
common compiler restrictions for larger programs: disabling symbolic
references, demanding that variables be properly introduced, and
disabling bare words as subroutine invocations. And finally, standard
output is unbuffered: not strictly necessary for this application, but
handy if we ever fork a child that might need to include things into
the output stream.
Lines 6 through 9 establish a list of abbreviations for the states. I'll use this in a pop-up box definition.
Lines 11 to 25 define the query fields of my form, including the SQL
snippets that should be used if they are present in the response.
Each element of @QUERY_FIELDS
is an arrayref. The first element is
the form field name. The second element is an SQL snippet, using the
question-mark placeholder. And a third element, if present, is a
coderef for the routine to validate the data, and alter it if
necessary. We'll see precisely how these fields are used later.
Line 28 pulls in the standard CGI module, along with importing all the form access and HTML generation shortcuts.
Line 30 prints the HTTP/CGI header, the beginnings of the HTML header,
and a first-order head of Query
. As this is not a complete
application, I didn't spend much time adorning the output.
Line 32 prints the tag for a horizontal rule, and the beginning of the
form. The form action parameter defaults to the same URL, bringing a
form submission right back to this same program, so this CGI script
both generates the original form, and processes the results. This is
a typical strategy for CGI.pm
-based programs.
Lines 34 to 42 print the form itself, with the layout aided by a simple table. A background color of a light-yellow defines the area of the form, designated in line 34. The first row picks up the first and last name, with their appropriate labels. The second row similarly holds the street address and the city.
The third row contains a popup field named state
. The list of
states comes from the array defined earlier. An additional item of
n/a
is added to the front of the list, and also selected as the
initial default. To keep this type of searching form easy to use, all
fields should default to their ``insignificant'' state. The third row
also contains a zip-code field. Of course, a zip code would also
uniquely define a city and state, but that's beyond the scope of this
example, so we'll pretend these are all independent fields. The third
row also contains the submit button.
For all three rows, the individual table cells are enclosed in a td
HTML shortcut, defined by CGI.pm
. And each of these cells are
further nested into a table row shortcut, defined as Tr
. This
shortcut has an unusual name because as an entirely lowercase word, it
would collide with the built-in tr
operator. Finally, the three
rows are enclosed inside a table
shortcut, with an additional
hashref parameter which becomes the attributes of the table
tag for
the output.
And finally, line 44 closes out the form with a horizontal rule.
When the form is initially displayed in the browser, I have a chance
to fill out one or more of the fields of my choice, and press submit.
The same program is reinvoked, but with access to the form
parameters using the param
function. The code beginning in line 46
handles this response.
Lines 47 and 48 hold the pieces of SQL and their corresponding values. Initially, they'll start out empty. If they're still empty after we've processed all the known fields, then we didn't get a valid query, and there's nothing further to do.
Lines 50 to 61 process each of the known form fields. $_
holds the
arrayref of each field in turn. Line 51 pulls out the value for that
particular parameter into the scalar $val
. (Note that this won't
work very well for a multi-value parameter, but that's merely minor
change to handle that instead.) If the parameter is not present,
we skip on to the next item.
Lines 52 to 58 clean up the data from that form element. If there's a
value in the third arrayref element, it's a coderef, which we call,
passing the form value as its first argument. We expect the called
subroutine to return either the cleaned-up data, or undef
, again
indicating that this field is ``not present''. If there's no special
handling subroutine, a simple editing pass is performed instead: line
55 deletes any leading whitespace, while line 56 deletes any trailing
whitespace.
Looking back at lines 18 to 23, we can see that the cleanup routine
for the popup menu takes that value and returns undef
unless it
contains a non-whitespace character but is not our default value of
n/a
.
Now, if we've made it to line 59, we have a good value in $val
, and
the corresponding SQL as the second element of the arrayref. So,
we'll push each one onto the end of the their corresponding arrays to
be gathered and processed later.
Once the individual form elements have been examined, it's time to
determine if we have a valid query. Line 62 examines the size of the
@sqls
array. If this is non-empty, then we've got to proceed to
building an SQL statement, in line 63.
Each snippet in @sqls
is a clause expected to be AND
-ed with the
other clauses. So we'll take the snippets and join
them with an
``and'' keyword in line 63. In front of that, we'll select the id
field from the resulting record for further processing.
In a real program, we'd then execute code similar to that presented in lines 69 to 71. The SQL statement would be prepared from the joined snippets, and then the placeholders would be bound to their corresponding values, and finally we'd be getting the matched records by their ids.
However, since this is just a demo harness, we'll dump out the SQL in line 65, and the list of values for the binding in line 66.
So, we've seen how to walk through a form, picking out the fields and constructing the corresponding SQL for them, and we've also seen the importance of using placeholders. And with no more worries about broken or malicious input values messing up your day, until next time, enjoy!
Listing
=1= #!/usr/bin/perl -Tw =2= use strict; =3= $|++; =4= =5= ## constants =6= my @STATES = qw( =7= AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS =8= MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY =9= ); =10= ## configuration =11= my @QUERY_FIELDS = =12= ( =13= ['firstname', 'person.firstname is ?'], =14= ['lastname', 'person.lastname is ?'], =15= ['street', 'person.street is ?'], =16= ['city', 'person.city is ?'], =17= ['state', 'person.state is ?', =18= sub { =19= my $v = shift; =20= return undef unless $v =~ /\S/; =21= return undef if $v eq "n/a"; =22= $v; =23= }], =24= ['zip', 'person.zip is ?'], =25= ); =26= ## end configuration =27= =28= use CGI qw(:all); =29= =30= print header, start_html, h1('Query'); =31= =32= print hr, start_form; =33= =34= print table({-bgcolor => '#ffff88'}, =35= Tr(td("First:"), td(textfield('firstname')), =36= td("Last:"), td(textfield('lastname'))), =37= Tr(td("Street:"), td(textfield('street')), =38= td("City:"), td(textfield('city'))), =39= Tr(td("State:"), =40= td(popup_menu('state', ['n/a', @STATES], 'n/a')), =41= td("Zip:"), td(textfield('zip', '', 10)), =42= td(submit))); =43= =44= print end_form, hr; =45= =46= { =47= my @sqls; =48= my @vals; =49= =50= for (@QUERY_FIELDS) { =51= next unless defined(my $val = param($_->[0])); =52= if ($_->[2]) { =53= next unless defined($val = $_->[2]->($val)); =54= } else { =55= $val =~ s/^\s+//; =56= $val =~ s/\s+$//; =57= next unless length $val; =58= } =59= push @sqls, $_->[1]; =60= push @vals, $val; =61= } =62= if (@sqls) { =63= my $sql = "select id from person where ".join(" and ", @sqls); =64= print h2("Resulting query"); =65= print pre(escapeHTML($sql)); =66= print pre(escapeHTML(join ", ", map { qq{"$_"} } @vals)); =67= =68= ## typical use: =69= ## my $sth = $dbh->prepare($sql); =70= ## $sth->execute(@vals); =71= ## while (my ($id) = $sth->fetchrow_array) { ... } =72= } =73= } =74= =75= print end_html;