Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Perl Journal magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Perl Journal Column 08 (Jan 2004)

[Suggested title: ``Evaluating short-circuited boolean expressions'']

In my recently released File::Finder module, one of the more interesting problems I found myself solving was how to evaluate the boolean expression resulting from the find-like predicates. The find command evaluates the various tests using a combination of AND, OR, NOT, and parentheses operators with the traditional precedence rules, including short circuiting. For example:

  find /tmp -size +30 -atime +3 -print

In this case, find first tries the size test. Only if that succeeds, the implied AND operator then tests the access time. Again, only if that succeeds, do we finally evaluate the print operation, which always returns true. Had we included an OR operator in the chain:

  find /tmp -size +30 -atime +3 -o -print

then the rules for short-circuiting an OR apply, namely that if the file is both big enough and old enough, the left side of the OR is true, so we do not execute the right side, and the printing is skipped. Similarly, if we move the OR:

  find /tmp -size +30 -o -atime +3 -print

then the implied AND between the time rule and the printing rule now binds tighter, so we'll print only small old files now, instead of big ones.

In File::Finder, I chose to represent an expression such as this using an array of individual steps, with coderefs for the actual tests, and simple strings for the operators (the AND operator always being implied). So, the above test would look something like:

  my @steps = (
    \&code_for_testing_size_greater_than_30,
    "OR",
    \&code_for_testing_atime_greater_than_3,
    \&code_for_printing,
  );

In practice, these coderefs were generated by closures that held the magic 30 and 3 constants in closure variables, but let's not get distracted here. The code to evaluate a series of coderefs, stopping at the first false coderef, started out looking something like this:

  my $state = 1; # 1 = true, 0 = false
  for (@steps) {
    if ($state) { # should we execute this?
      my $result = $_->(); # run the coderef
      unless ($result) { # did this fail?
        $state = 0; # result is false
      }
    }
  }

This ensures that we stop running steps when the first step fails. This is our implied AND operator. The reason we keep going rather than exiting the loop is that even if we end up in a false state, we need to notice if there's an OR operator. This complicates things a bit. We'll need to introduce a third state, skipping. Why? Because we now have three states:

  true AND ... # we execute this, thus true state
  false AND ... # we don't execute this, thus false state
  false OR ... # we do execute this, thus true state
  true OR .. # we don't execute this, but...
  true OR ... OR ... # we don't want the third part to execute!

So, after the OR is seen in a true state, we need to enter a sticky false state. Hence, skipping.

  my $state = 1; # 1 = true, 0 = false, -1 = skipping
  for (@steps) {
    if (ref $_) { # is it a coderef?
      if ($state > 0) { # should we execute this?
        my $result = $_->(); # run the coderef
        unless ($result) { # did this fail?
          $state = 0; # result is false
        }
      }
    } elsif ($_ eq "OR") {
      if ($state > 0) { # true becomes skipping
        $state = -1;
      } elsif ($state == 0) { # false becomes true
        $state = 1;
      }
    }
  }

So, an OR causes true to become skipping, false to become true, and skipping to stay skipping. If you work through this, you'll see this 3-state decision tree now properly handles the implied AND and the explicit OR operator.

But wait, there's more. Let's add a NOT prefix operator into the mix, represented as a "NOT" string for a step. And let's presume that we can stack them. I found the easiest way to handle that was to introduce a fourth state, ``true (but invert the next test)'', encoded as 2 in $state. Now we get something more complicated again:

  my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not"
  for (@steps) {
    if (ref $_) { # is it a coderef?
      if ($state > 0) { # should we execute this?
        my $result = $_->(); # run the coderef
        $result = !$result if $state > 1; # flip result if needed
        $state = ($result ? 1 : 0);
      }
    } elsif ($_ eq "OR") {
      die "NOT before OR!" if $state > 1;
      if ($state > 0) { # true becomes skipping
        $state = -1;
      } elsif ($state == 0) { # false becomes true
        $state = 1;
      }
    } elsif ($_ eq "NOT") {
      # "true" and "not" swap states
      if ($state == 2) {
        $state = 1;
      } elsif ($state == 1) {
        $state = 2;
      }
    }
  }

Note that if a NOT was seen before a coderef, we need to essentially act on the reverse of the outcome of the coderef, so we negate the result logically. Also note that it'd be wrong to copy $result into $state there, because we don't demand that the coderefs return a simple 1 or 0: they can return any true/false value, so we have to normalize the results with a small question-colon operator.

It's also tempting to flip between the 2 and 1 in the last elsif block by trying something clever like:

  $state = 3 - $state;

until you realize that $state could also be 0 or -1, and we're trying to leave those alone. It's always better to do some explicit tests than to try to perform fancy math to get it to work right. The one notable exception I've seen to that is the AM/PM modulator to convert a 24-hour hour into a 12-hour hour:

  my $hour_12 = ($hour_24 + 11) % 12 + 1;

although the magical ``11'' constant there deserves a well-placed comment in the source.

At this point, we have a full expression evaluator for AND, OR, and NOT operators with proper relative precedence. Let's add the "COMMA" operator now, from GNU find. This has the effect of restoring a true-state in the expression, ignoring any previous values (although their side-effect has already been acted upon). Seems simple enough:

  my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not"
  for (@steps) {
    if (ref $_) { # is it a coderef?
      if ($state > 0) { # should we execute this?
        my $result = $_->(); # run the coderef
        $result = !$result if $state > 1; # flip result if needed
        $state = ($result ? 1 : 0);
      }
    } elsif ($_ eq "OR") {
      die "NOT before OR!" if $state > 1;
      if ($state > 0) { # true becomes skipping
        $state = -1;
      } elsif ($state == 0) { # false becomes true
        $state = 1;
      }
    } elsif ($_ eq "NOT") {
      # "true" and "not" swap states
      if ($state == 2) {
        $state = 1;
      } elsif ($state == 1) {
        $state = 2;
      }
    } elsif ($_ eq "COMMA") {
      die "NOT before COMMA!" if $state > 1;
      $state = 1; # restore true
    }
  }

Now we're happy. We're dealing with NOT, AND, OR, and COMMA. At least, we remain happy, until we remember that it's time to look at how to handle parentheses!

Shaking loose some old cobwebs in my brain, I remember an expression evaluator I saw some 30 years ago in a textbook that used a stack to handle the nested subexpressions. So, I took the same tactic here. (No doubt Mark Jason-Dominus will write me and tell me that a stack wasn't needed here provided I applied clever trick N, but as I am a bear-of-very-little-brain, I'll take the straightforward approach.)

So, we'll do that by changing $state into @state, and every place we had $state before, we'll simply use $state[-1] (the top of the stack is the highest element. Without changing the functionality (or adding the paren-handling), that gets us this code:

  my @state = (1); # start as true
  for (@steps) {
    if (ref $_) { # is it a coderef?
      if ($state[-1] > 0) { # should we execute this?
        my $result = $_->(); # run the coderef
        $result = !$result if $state[-1] > 1; # flip result if needed
        $state[-1] = ($result ? 1 : 0);
      }
    } elsif ($_ eq "OR") {
      die "NOT before OR!" if $state[-1] > 1;
      if ($state[-1] > 0) { # true becomes skipping
        $state[-1] = -1;
      } elsif ($state[-1] == 0) { # false becomes true
        $state[-1] = 1;
      }
    } elsif ($_ eq "NOT") {
      # "true" and "not" swap states
      if ($state[-1] == 2) {
        $state[-1] = 1;
      } elsif ($state[-1] == 1) {
        $state[-1] = 2;
      }
    } elsif ($_ eq "COMMA") {
      die "NOT before COMMA!" if $state[-1] > 1;
      $state[-1] = 1; # restore true
    }
  }

A little messier, but no change in functionality. Next, we have to figure out how the state changes as we enter and leave a parenthesized area. First, let's see what happens when we enter a parenthesized area:

  not ( ... now true
  true ( ... now true
  false ( ... now skipping
  skipping ( ... now skipping

So, on a left paren, if $state[-1] is greater than 0, we push a 1, otherwise we push a -1. That's easy enough. The harder part is what to do at the end of the subexpression. The false or skipping value in front of the subexpression won't change at the end, so we can write that off right away. But we have six other combinations to consider: all the combinations of not or true before the subexpression crossed with true, false, and skipping at the end of the subexpression. Working through the combinations, we find that the rules are:

  not ( ... true ) ... now false
  not ( ... false ) ... now true
  not ( ... skipping ) ... now false
  true ( ... true ) ... now true
  true ( ... false ) ... now false
  true ( ... skipping ) ... now true

So, it looks like we can take true and skipping xor'ed with whether there's a not-state in front of the subexpression. The one other thing is that comma no longer restores always to true: it restore to whatever was set up at the start of the subexpression. Adding that code back in to the engine gives us the final result:

  my @state = (1);
  for (@steps) {
    if (ref $_) {
      if ($state[-1] > 0) {
        my $result = $_->();
        $result = !$result if $state[-1] > 1;
        $state[-1] = ($result ? 1 : 0);
      }
    } elsif ($_ eq "OR") {
      die "NOT before OR!" if $state[-1] > 1;
      if ($state[-1] > 0) {
        $state[-1] = -1;
      } elsif ($state[-1] == 0) {
        $state[-1] = 1;
      }
    } elsif ($_ eq "NOT") {
      if ($state[-1] == 2) {
        $state[-1] = 1;
      } elsif ($state[-1] == 1) {
        $state[-1] = 2;
      }
    } elsif ($_ eq "LEFT") {
      push @state, ($state[-1] > 0 ? 1 : -1);
    } elsif ($_ eq "RIGHT") {
      die "RIGHT without LEFT!" unless @state > 1;
      die "NOT before RIGHT!" if $state[-1] > 1;
      my $child = pop @state;
      $child = !$child if $state[-1] > 1; # reverse for NOT
      $state[-1] = ($child ? 1 : 0) if $state[-1] > 0; # inherit state
    } elsif ($_ eq "COMMA") {
      die "NOT before COMMA!" if $state[-1] > 1;
      if (@state > 1) { # in subexpression, restore to initial value
        $state[-1] = ($state[-2] > 0 ? 1 : -1);
      } else {
        $state[-1] = 1; # restore true if at outer level
      }
    }
  }

Whew! I hope you enjoyed walking through this logic as much as I did in creating it. And if not, aren't you happy it's all nicely encapsulated in File::Finder, so that you don't have to figure it out? Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.