Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 58 (may 2005)

[suggested title: ``Avoiding SQL Injection Attacks'']

In recent months, the entertainment news was abuzz with how socialite Paris Hilton had had her Sidekick phone ``hacked'', revealing phone numbers for many famous people, interesting notes about possible profit participation in the ``stolen'' sex tape, and lurid pictures, all of which were broadcast widely on the Internet.

But the quiet word amongst computer security professionals was that this wasn't a matter of hacking into her phone at all, not that the entertainment news people would know or care. Instead, certain individuals have claimed responsibility through a well-known security flaw on T-Mobile's web site, with which the phone is synchronized. Thus, it wasn't the phone as the source of information: it was the website.

While the precise steps by which the intruder obtained access may never be known, one of the most common intrusion paths is the SQL Injection Attack. An intruder either guesses by trial and error, or by examining the source code of the application, that some field in a form is being used to construct an SQL query. Unless the application carefully verifies the content of this field, or quotes the unusual characters properly, it might be possible for the field's value to escape beyond its intended use, and create additional SQL constructs.

Let's look at a specific example. Suppose a bank web application is trying to show all the bank account balances that belong to me. The SQL might look like:

  SELECT account_id, balance
  FROM account_data
  WHERE account_owner = '[ME]';

Here, the [ME] comes from some other part of the calculation, which isn't relevant. The effect is that I see only my data. Now, let's say that the web programmers at the bank upgrade the application so that I can distinguish my personal and business accounts via a pop-up form element, which returns back a ``1'' for personal and ``2'' for business. So, they change this SQL to be:

  SELECT account_id, balance
  FROM account_data
  WHERE account_owner = '[YOU]'
  AND account_type = $account_type;

Of course, they presume that the values ``1'' and ``2'' will drop in quite nicely there, and the application tests out just fine, so the programmers move on to the next item in their long to-do list.

What many web programmers fail to realize is that just because a pop-up form specifies a list of values or a textfield provides a maximum length (even enforced by javascript), this doesn't mean that a user can't construct an arbitrary value for any returned form parameter. Of course, changing this value to ``3'' won't do me much good. I won't get anything at all, since there aren't any accounts with a type of 3.

But suppose I jimmy up a return value so that $account_type comes back as: 1 OR 99 = 99. Let's fill that in:

  SELECT account_id, balance
  FROM account_data
  WHERE account_owner = '[YOU]'
  AND account_type = 1 OR 99 = 99;

Oops! What just happened? Since the OR has a lower precedence than the AND, we're saying essentially ``true'' for every row of the table. Presuming we have the patience to page through the result, we now have everyone's account numbers and balances. Ooops.

There are many other variants on SQL injection attacks. You might be rather shocked when you begin a web search for further information.

But let's cut right to right way to do this, with DBI placeholders. The DBI interface permits a chunk of SQL to contain question marks wherever data is being used. For example, the SQL above could be written as:

  my $sth = $dbh->prepare(q{
    SELECT account_id, balance
    FROM account_data
    WHERE account_owner = ?
    AND account_type = ?
  });

This takes an existing database handle (here, $dbh), and creates a statement handle ($sth) that is safely protected from SQL injection attacks. We use this handle by calling execute, providing the values to be inserted:

  $sth->execute('ABC 123', 2);

The DBI system goes through the data, figuring out how to quote the values properly for the corresponding database. For example, the statement being executed will likely end up being something like:

    SELECT account_id, balance
    FROM account_data
    WHERE account_owner = 'ABC 123'
    AND account_type = 2

Note that the string was enclosed in single quotes, but the number remained unquoted. (This is important for databases that respect data types.) Now, let's see what happens if we try to pass that complex string as the account type number:

  $sth->execute('ABC 123', '1 OR 99 = 99');

This results in the effective SQL of:

    SELECT account_id, balance
    FROM account_data
    WHERE account_owner = 'ABC 123'
    AND account_type = '1 OR 99 = 99'

Ahh, look! The quotes enclose the now-apparent string value. While the query is likely to fail (no account type will match that), we didn't get to see everyone's account balances either. Success.

The execute method should get one value for every placeholder in the query string. The undef value is properly mapped as NULL, while numbers and strings pass through properly quoted for the active database. Certain types may require hinting as to the actual data type: see the DBI documentation for further information.

One common problem when using placeholders is that we must maintain a correlation between chunks of SQL and the actual values we want to plug in. Good discipline helps here, but there's a nice module in the CPAN called SQL::Abstract that helps us say what we intend with a bit more clarity.

One feature of SQL::Abstract is that a particular WHERE clause can be represented by relatively straightforward complex data structure. For example, AND-ing two conditions is represented by a hashref:

  { account_owner => $owner, account_type => $account_type }

Here, $owner contains our <ABC 123> and $account_type is 1 or 2.

Using SQL::Abstract, we can convert this to the SQL to feed to prepare, and the bind parameters to feed to execute with one simple step:

  use SQL::Abstract;
  my $sqa = SQL::Abstract->new;

  my ($owner, $account_type) = ('ABC 123', 2); # from inputs

  my ($sql, @bind) = $sqa->select(
    'account_data', # table
    [qw(account_id balance)], # fields
    {
     account_owner => $owner,
     account_type => $account_type
    }, # "where"
  );

  my $sth = $dbh->prepare_cached($sql); # reuse SQL if we can
  $sth->execute(@bind); # execute it for this query

The result is a good query, with good bind values in the right order. And we're still safe from SQL injection attacks. If we also wanted to add account_status as not in suspended or deleted, we could change the query to:

  {
    account_owner => $owner,
    account_type => $account_type,
    account_status => { -not_in => ['suspended', 'deleted'] },
  }

In this case, the corresponding SQL looks like:

  SELECT account_id, balance
  FROM account_data WHERE
  ( account_owner = ?
    AND account_status NOT IN ( ?, ? )
    AND account_type = ? )

with bind values of:

  'ABC 123', 'suspended', 'deleted', 2

Note that the ordering of the hash elements affects the output, but the bind values did indeed line up with the corresponding placeholders.

I won't repeat the entire manpage for SQL::Abstract here, but it looks sufficient for most of the queries I've ever wanted to create in SQL.

I can even use SQL::Abstract with my Class::DBI-wrapped databases using Class::DBI::AbstractSearch. Mixing in this class to my Class::DBI-derived base class gives me the search_where clause, accepting that same data structure. For example:

  my @results = My::DB::AccountData->search_where({
    account_owner => $owner,
    account_type => $account_type,
    account_status => { -not_in => ['suspended', 'deleted'] },
  });

And there, no SQL to write at all, again, and I'm safe from SQL Injection Attacks.

I hope you find SQL::Abstract useful, and never have your name appear in a headline regarding a bad security breakin. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

Worldwide training and consulting by Perl experts

Copyright Notice

Unix Review Column 58 (may 2005)