Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Perl Journal Column 08 (Jan 2004)
[Suggested title: ``Evaluating short-circuited boolean expressions'']
In my recently released File::Finder
module, one of the more
interesting problems I found myself solving was how to evaluate the
boolean expression resulting from the find-like predicates. The
find command evaluates the various tests using a combination of
AND
, OR
, NOT
, and parentheses operators with the traditional
precedence rules, including short circuiting. For example:
find /tmp -size +30 -atime +3 -print
In this case, find first tries the size test. Only if that succeeds,
the implied AND
operator then tests the access time. Again, only
if that succeeds, do we finally evaluate the print operation, which
always returns true. Had we included an OR
operator in the chain:
find /tmp -size +30 -atime +3 -o -print
then the rules for short-circuiting an OR
apply, namely that if the
file is both big enough and old enough, the left side of the OR
is
true, so we do not execute the right side, and the printing is
skipped. Similarly, if we move the OR
:
find /tmp -size +30 -o -atime +3 -print
then the implied AND
between the time rule and the printing rule
now binds tighter, so we'll print only small old files now, instead of
big ones.
In File::Finder
, I chose to represent an expression such as this
using an array of individual steps, with coderefs for the actual
tests, and simple strings for the operators (the AND
operator
always being implied). So, the above test would look something like:
my @steps = ( \&code_for_testing_size_greater_than_30, "OR", \&code_for_testing_atime_greater_than_3, \&code_for_printing, );
In practice, these coderefs were generated by closures that held the magic 30 and 3 constants in closure variables, but let's not get distracted here. The code to evaluate a series of coderefs, stopping at the first false coderef, started out looking something like this:
my $state = 1; # 1 = true, 0 = false for (@steps) { if ($state) { # should we execute this? my $result = $_->(); # run the coderef unless ($result) { # did this fail? $state = 0; # result is false } } }
This ensures that we stop running steps when the first step
fails. This is our implied AND
operator. The reason we keep going
rather than exiting the loop is that even if we end up in a false
state, we need to notice if there's an OR
operator. This
complicates things a bit. We'll need to introduce a third state,
skipping. Why? Because we now have three states:
true AND ... # we execute this, thus true state false AND ... # we don't execute this, thus false state false OR ... # we do execute this, thus true state true OR .. # we don't execute this, but... true OR ... OR ... # we don't want the third part to execute!
So, after the OR
is seen in a true state, we need to enter
a sticky false state. Hence, skipping.
my $state = 1; # 1 = true, 0 = false, -1 = skipping for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef unless ($result) { # did this fail? $state = 0; # result is false } } } elsif ($_ eq "OR") { if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } }
So, an OR
causes true to become skipping, false to become true, and
skipping to stay skipping. If you work through this, you'll see this
3-state decision tree now properly handles the implied AND
and the
explicit OR
operator.
But wait, there's more. Let's add a NOT
prefix operator into the
mix, represented as a "NOT"
string for a step. And let's presume
that we can stack them. I found the easiest way to handle that was to
introduce a fourth state, ``true (but invert the next test)'', encoded
as 2
in $state
. Now we get something more complicated again:
my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not" for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state > 1; # flip result if needed $state = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state > 1; if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state == 2) { $state = 1; } elsif ($state == 1) { $state = 2; } } }
Note that if a NOT
was seen before a coderef, we need to
essentially act on the reverse of the outcome of the coderef, so we
negate the result logically. Also note that it'd be wrong to copy
$result
into $state
there, because we don't demand that the
coderefs return a simple 1
or 0
: they can return any true/false
value, so we have to normalize the results with a small question-colon
operator.
It's also tempting to flip between the 2
and 1
in the last
elsif
block by trying something clever like:
$state = 3 - $state;
until you realize that $state
could also be 0 or -1, and we're
trying to leave those alone. It's always better to do some explicit
tests than to try to perform fancy math to get it to work right. The
one notable exception I've seen to that is the AM/PM modulator to
convert a 24-hour hour into a 12-hour hour:
my $hour_12 = ($hour_24 + 11) % 12 + 1;
although the magical ``11'' constant there deserves a well-placed comment in the source.
At this point, we have a full expression evaluator for AND
, OR
,
and NOT
operators with proper relative precedence. Let's add the
"COMMA"
operator now, from GNU find. This has the effect of
restoring a true-state in the expression, ignoring any previous values
(although their side-effect has already been acted upon). Seems
simple enough:
my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not" for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state > 1; # flip result if needed $state = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state > 1; if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state == 2) { $state = 1; } elsif ($state == 1) { $state = 2; } } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state > 1; $state = 1; # restore true } }
Now we're happy. We're dealing with NOT
, AND
, OR
, and
COMMA
. At least, we remain happy, until we remember that it's time
to look at how to handle parentheses!
Shaking loose some old cobwebs in my brain, I remember an expression evaluator I saw some 30 years ago in a textbook that used a stack to handle the nested subexpressions. So, I took the same tactic here. (No doubt Mark Jason-Dominus will write me and tell me that a stack wasn't needed here provided I applied clever trick N, but as I am a bear-of-very-little-brain, I'll take the straightforward approach.)
So, we'll do that by changing $state
into @state
, and
every place we had $state
before, we'll simply use $state[-1]
(the
top of the stack is the highest element. Without changing the functionality
(or adding the paren-handling), that gets us this code:
my @state = (1); # start as true for (@steps) { if (ref $_) { # is it a coderef? if ($state[-1] > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state[-1] > 1; # flip result if needed $state[-1] = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state[-1] > 1; if ($state[-1] > 0) { # true becomes skipping $state[-1] = -1; } elsif ($state[-1] == 0) { # false becomes true $state[-1] = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state[-1] == 2) { $state[-1] = 1; } elsif ($state[-1] == 1) { $state[-1] = 2; } } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state[-1] > 1; $state[-1] = 1; # restore true } }
A little messier, but no change in functionality. Next, we have to figure out how the state changes as we enter and leave a parenthesized area. First, let's see what happens when we enter a parenthesized area:
not ( ... now true true ( ... now true false ( ... now skipping skipping ( ... now skipping
So, on a left paren, if $state[-1]
is greater than 0, we push a
1
, otherwise we push a -1
. That's easy enough. The harder part
is what to do at the end of the subexpression. The false or skipping
value in front of the subexpression won't change at the end, so we can
write that off right away. But we have six other combinations to
consider: all the combinations of not or true before the subexpression
crossed with true, false, and skipping at the end of the
subexpression. Working through the combinations, we find that the
rules are:
not ( ... true ) ... now false not ( ... false ) ... now true not ( ... skipping ) ... now false true ( ... true ) ... now true true ( ... false ) ... now false true ( ... skipping ) ... now true
So, it looks like we can take true and skipping xor'ed with whether there's a not-state in front of the subexpression. The one other thing is that comma no longer restores always to true: it restore to whatever was set up at the start of the subexpression. Adding that code back in to the engine gives us the final result:
my @state = (1); for (@steps) { if (ref $_) { if ($state[-1] > 0) { my $result = $_->(); $result = !$result if $state[-1] > 1; $state[-1] = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state[-1] > 1; if ($state[-1] > 0) { $state[-1] = -1; } elsif ($state[-1] == 0) { $state[-1] = 1; } } elsif ($_ eq "NOT") { if ($state[-1] == 2) { $state[-1] = 1; } elsif ($state[-1] == 1) { $state[-1] = 2; } } elsif ($_ eq "LEFT") { push @state, ($state[-1] > 0 ? 1 : -1); } elsif ($_ eq "RIGHT") { die "RIGHT without LEFT!" unless @state > 1; die "NOT before RIGHT!" if $state[-1] > 1; my $child = pop @state; $child = !$child if $state[-1] > 1; # reverse for NOT $state[-1] = ($child ? 1 : 0) if $state[-1] > 0; # inherit state } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state[-1] > 1; if (@state > 1) { # in subexpression, restore to initial value $state[-1] = ($state[-2] > 0 ? 1 : -1); } else { $state[-1] = 1; # restore true if at outer level } } }
Whew! I hope you enjoyed walking through this logic as much as I did
in creating it. And if not, aren't you happy it's all nicely
encapsulated in File::Finder
, so that you don't have to figure it
out? Until next time, enjoy!