Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 33 (Aug 2000)
[suggested title: Taint so easy, is it?]
If you've been reading my columns for any length of time, you've probably seen me mention ``taint mode'', usually briefly while I'm describing a ``hash-bang'' line of something like:
    #!/usr/bin/perl -Tw
which turns on warnings (the -w) and ``taint mode'' (the -T). But
what is taint mode?
Taint mode is a security feature of Perl, and includes two levels of
operation. First, while taint mode is in effect, some operations are
forbidden. One of these is that $ENV{PATH} cannot contain any
world-writeable directories when firing off a child process (as with
backticks or system). Should your program attempt an unsafe action,
the program aborts (via die) immediately, before the action has a
chance to create a potential security violation. You could have
included code to check this yourself, but having Perl perform the
checks ensures a consistency and a ``best practices'' level of
competence that you may not have the capability or resources to
include explicitly.
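As a rough sketch of what satisfying that first level can look like, the fragment below sets $ENV{PATH} to a fixed list of directories and removes a few other environment variables before starting a child process, along the lines of the cleanup described in the perlsec manpage. The subroutine name clean_environment and the particular directories are assumptions for illustration only; choose the ones that are right for your system.

```perl
use strict;
use warnings;

# Hypothetical helper: give child processes a known, minimal environment.
sub clean_environment {
    # A fixed list of directories (assumed here), none of which should be
    # writable by other users.
    $ENV{PATH} = '/bin:/usr/bin';
    # Variables that can change how a shell behaves.
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    return;
}

clean_environment();

# A constant command with a cleaned-up environment is acceptable under -T.
system('echo', 'environment is clean');
```

Calling the helper once, early in the program, keeps the rest of the code free of environment worries.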
The second level of operation is much more interesting and unique to Perl (amongst all the popular languages I know of), in which Perl keeps track of a ``distrust'' of each scalar value in the program. Every item of data coming from input sources (command line arguments, environment variables, locale information, some system calls, and all file input) is marked ``tainted''.
For example, the following operations all generate tainted data:
    $t1 = <STDIN>;
    $t2 = $ENV{USER};
    $t3 = $ARGV[2];
    @t4 = <*.txt>;
In each of these examples, the data has come ``from the outside world'', and is therefore treated as potentially dangerous. Once data is tainted, the taint propagates to any data derived from the tainted data:
    $t5 = $t4[0];
    $t6 = "/home/$t2";
    chomp($t1);
    @x = ("help", "me", $t3, "please");
Note that tainting is on a per-scalar basis. So $x[2] is tainted,
not the entire array @x.
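If you'd like to see the per-scalar bookkeeping for yourself, here is a small sketch using the tainted function from the standard Scalar::Util module. The helper name taint_report and the sample data are made up for illustration. Note that it reports everything as clean unless the program is actually run with -T, since without taint mode nothing is ever marked.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: label each value as tainted or clean.
sub taint_report {
    return map { tainted($_) ? 'tainted' : 'clean' } @_;
}

my $user = $ENV{USER};              # from the environment: tainted under -T
$user = '' unless defined $user;    # avoid an undef warning if USER is unset
my $home = "/home/$user";           # derived from $user, so taint carries over
my @x    = ('help', 'me', $home, 'please');

# Each element is checked separately; the array as a whole has no taint flag.
my @labels = taint_report(@x);
print "$x[$_]: $labels[$_]\n" for 0 .. $#x;
```

Run with -T and with USER set, only the third element should be reported as tainted; the three literal strings stay clean.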
Once data is marked tainted, nearly any attempt to use the data to
affect the outside world will be blocked, causing an immediate die
with a taint violation. For example, invoking rename where either
the source name or destination name is tainted is considered
dangerous. This permits normal operations:
    rename $x[0], $x[1];
But not operations that involve tainted data (recall that $x[2] is
tainted from earlier):
    rename $x[0], $x[2];
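Because the violation is an ordinary die, it can be trapped with eval if you'd rather log the problem than crash. The sketch below is one way to do that; the helper name try_rename and the file names are invented for the example. Under -T with a tainted argument, the trapped message should mention an insecure dependency in rename. Without -T, the helper just reports whether the rename worked.

```perl
use strict;
use warnings;

# Hypothetical helper: attempt a rename and describe what happened.
sub try_rename {
    my ($from, $to) = @_;
    # rename dies under -T if either name is tainted; eval traps that.
    my $ok = eval { rename $from, $to };
    return "blocked: $@" if $@;
    return $ok ? 'renamed' : "failed: $!";
}

# Made-up file names; with no such file present this reports a failure.
print try_rename('no-such-source.tmp', 'no-such-dest.tmp'), "\n";
```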
What this means is that data that comes in from the outside world cannot trivially affect the outside world as well. Why is this important?
Well, the typical use of taint mode is to enable programs that act on
behalf of other users to operate in a safer manner. For example, a
``setuid'' or ``setgid'' program borrows the privileges of its owner for
the duration of execution, allowing an ordinary user to act as root
(or some other user) for a selected set of operations. Or a CGI
program, executing as the web server ID (typically nobody), is
acting with that user's privileges on behalf of a request from any web
client, generally without direct access to the server except through
the web server.
In both of these cases, it's important that input data be checked, so that the user who invokes the program cannot borrow the privileges of the executing user ID to perform unintended actions.
For example, it'd be pretty dangerous to rename a file based on the input from a CGI form:
    use CGI qw(param);
    ...
    my $source = param('source');
    my $dest = param('destination');
    rename $source, $dest;
Now perhaps the author of this CGI script believed that, since the form
contained only clearly defined radio buttons or pop-up menus, this
would be a safe program. But in reality, a person with intent to
damage or break in could just as easily invoke this script passing
arbitrary data in source and destination, and potentially rename any
file to which the web userid has access!
With taint mode enabled, the CGI parameters (having been derived from
either reading STDIN or an environment variable) are marked tainted,
and therefore the rename operation would fail before it has committed
potential damage. (To enable taint mode on a CGI script, just include
-T in the #! line, as shown earlier.) And that's exactly the safest
thing to do here.
But obviously, there are times when input data must in fact
legitimately affect the outside world. Here's where the next feature
of taint mode comes in. As a sole exception, the results of a regular
expression memory reference (usually accessed as the numeric variables
like $1 and $2 and so on) are never tainted, even though the
match may have been performed on tainted data. This gives us
the ``carefully guarded gate in the fence'', when used properly. For
example:
    my $source = param('source');
    unless ($source =~ /^(gilligan|skipper|professor)$/) {
        die "unexpected source $source\n";
    }
    $safe_source = $1;
Here, $source is expected to be one of gilligan, skipper, or
professor. If not, we'll die before executing the next statement,
which copies the captured memory into $safe_source. (Note the
parens in the regular expression match are performing double duty,
needed for both proper precedence regarding the vertical bar and the
beginning and ending of string anchors, as well as having the
side-effect of setting up the first backreference memory. Sometimes,
you get lucky.)
The value of $safe_source is now legitimate to be used in the
rename operation earlier, as it came from a regular expression
memory, and not directly from input data. In fact, we could even
have assigned it back over $source (a common thing to do):
    $source = $1; # source now untainted
Of course, we'd have to perform a similar operation on $dest
to complete the operation.
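Since both parameters need the same treatment, the check can be wrapped in a small subroutine. What follows is only a sketch: the name untaint_choice is invented, and it uses \A and \z rather than ^ and $ as a slightly stricter way of anchoring the whole string. The returned value comes from $1, so it is untainted under -T.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of $value if it is one of
# @allowed, or nothing (undef in scalar context) otherwise.
sub untaint_choice {
    my ($value, @allowed) = @_;
    return unless defined $value and @allowed;
    # quotemeta keeps any punctuation in the allowed names literal.
    my $alternatives = join '|', map { quotemeta } @allowed;
    return $value =~ /\A($alternatives)\z/ ? $1 : undef;
}

my @castaways = qw(gilligan skipper professor);
my $source = untaint_choice('skipper', @castaways);
die "unexpected source\n" unless defined $source;
```

The same call, with the same list, would then be made for the destination before handing both names to rename.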
So, if someone attempts to give us an incorrect value for the source
parameter, like ginger, the program aborts. Certainly, this
program would have aborted with or without taint mode, but in taint
mode it works only because we added the extra code to perform a
regular expression match, during which we needed to think about what
the possible legal values for the string might have been.
And that brings up the next point: we typically can't perform an explicit match against a known list of values. More often, the data is a user specified value that needs to fit a general description, but again, regular expressions are pretty good at matching many things.
So, let's say the $source there came from a text field box, rather
than a pop-up menu, permitting an arbitrary string. How do we pass
that along to the rename operator? Well, first we have to decide
what a legitimate string might be. For example, let's restrict to
filenames that contain only \w-matching characters, including a dot
(as long as the dot is not the first character). That'd be like this:
    $source = param('source');
    $source =~ /^(\w[\w.]*)$/ or die;
    $source = $1;
Once again, if the string is not as expected, we die. And only if we
haven't died will we continue on to use $1, which has now been
verified to be a name of the form that we expect.
Note that it's very important to test the result of the regular
expression match, because $1 (and the other memory variables) is
set only when you have a successful regular expression match. Otherwise,
you get the value left over from an earlier match, and that's
definitely bad news:
    ## bad code do not use ##
    param('source') =~ /^(\w[\w.]*)$/;
    $source = $1;
    ## bad code do not use ##
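To make the danger concrete, here is a small sketch of the leftover value at work. The subroutine name stale_capture_demo and the sample strings are invented for the illustration; the point is only that a failed match leaves $1 holding whatever the last successful match in the same scope put there.

```perl
use strict;
use warnings;

# Hypothetical demonstration of the bug: the second match is never tested.
sub stale_capture_demo {
    my ($first, $second) = @_;
    $first  =~ /^(\w+)$/;          # succeeds for a plain word, setting $1
    $second =~ /^(\w[\w.]*)$/;     # result ignored: this is the mistake
    my $got = $1;                  # may still hold the capture from $first
    return $got;
}

# The second string fails the pattern, so the first capture comes back.
my $leftover = stale_capture_demo('gilligan', '../../etc/passwd');
print "got: $leftover\n";
```

The caller asked about one string and was handed a value taken from another, with nothing to signal the failure.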
A slightly more compact way of writing this correctly might be:
    my ($source) = param('source') =~ /^(\w[\w.]*)$/ or die "bad source";
Here, I'm using $1 implicitly as the list context result of the
regular expression match, and declaring the variable that will hold
it, and checking for errors, all in one compact statement.
The regular expression pattern should be as restrictive as you can
get. For example, if you use something like /(.*)/s, you've
effectively removed any of the benefits of taint mode for that
particular data, making it potentially possible for someone to hijack
your program in unintended ways.
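A quick side-by-side sketch shows the difference. The helper name capture_or_undef and the hostile sample value are assumptions made up for this example; the two patterns are the ones discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture if $value matches $pattern,
# or undef if it does not.
sub capture_or_undef {
    my ($value, $pattern) = @_;
    my ($captured) = $value =~ $pattern;   # list context: captures, or empty
    return $captured;
}

my $hostile = '../../etc/passwd';
my $strict  = capture_or_undef($hostile, qr/^(\w[\w.]*)$/);  # restrictive
my $loose   = capture_or_undef($hostile, qr/(.*)/s);         # accepts anything

print defined $strict ? "strict accepted: $strict\n" : "strict rejected\n";
print defined $loose  ? "loose accepted: $loose\n"   : "loose rejected\n";
```

The restrictive pattern turns the hostile value away, while the catch-all pattern hands it back unchanged and, under -T, untainted.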
So, I hope this gives you a bit of insight into how to use taint mode,
and why it is useful. If this column 'taint enough for you, I suggest
you check out the perlsec manpage (perhaps using the command
perldoc perlsec at a prompt). Until next time, enjoy your new
security knowledge.