Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.


Unix Review Column 38 (Jun 2001)

[suggested title: ``It's all about context'']

Just recently, an article on Slashdot (www.slashdot.org) discussed the surprising story of a high-school student who had made a small Perl programming mistake that got him into a great deal of trouble. On his dynamically generated webpage, he had used the code:

  my($f) = `fortune`;

when what he should have done was:

  my $f = `fortune`;

Now, both of these invoke the fortune program, capturing its random quip of text. In this particular case, when the school administrators visited the boy's page, fortune had selected a quote from a William Gibson novel:

 I put the shotgun in an Adidas bag and padded it out with four pairs of tennis
 socks, not my style at all, but that was what I was aiming for:  If they think
 you're crude, go technical; if they think you're technical, go crude.  I'm a
 very technical boy.  So I decided to get as crude as possible.  These days,
 though, you have to be pretty technical before you can even aspire to 
 crudeness.
 - Johnny Mnemonic, by William Gibson

Now, if you can't tell what would be in $f for both of the code fragments above, read on, and you'll see how an unwitting mistake can leave someone with an unexpected police file.

The problem is a matter of context (in more ways than one). Perl's operators are ``context sensitive'', in that an operator can detect whether it is being used in a place looking for a scalar or in a place looking for a list, and return an appropriate result. In this case, the backtick operator returns a different result depending on whether it was invoked in a scalar context or a list context.

To understand this, let's first look at how to detect context. Starting with the basics, the right-hand side of an assignment to a scalar variable must be a scalar value:

  $a = ...

Whatever's over there on the right, it's got to be a scalar value, because that's the only thing that'll fit into a scalar variable.

Similarly, the right side of an assignment to an array can be any list value:

  @b = ...

Let's put some things in both places and see how it differs. One that you are almost certainly familiar with is the readline operator, spelled ``less-than filehandle greater-than'':

  $a = <STDIN>

In this ``scalar context'', the readline operator returns the next line to be read, or undef at end-of-file or if an I/O error occurs.

However, the very same operator and punctuation in a ``list context'' yields all the remaining lines until end-of-file is reached, or an empty list if already at end of file:

  @b = <STDIN>

Now, Larry Wall could have come up with two different operators for these two similar operations, but by making the operator ``context sensitive'', we get the savings of brainspace and keyboardspace. Apparently, we humans are fairly good at grokking context, so why not leverage off that a bit in the language?
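To see the two readline behaviors side by side, here's a small sketch. It assumes an in-memory filehandle standing in for STDIN (so the result is repeatable); the variable names are mine, purely for illustration:

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for STDIN so the example is repeatable.
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my $line = <$fh>;    # scalar context: just the next line
my @rest = <$fh>;    # list context: every remaining line

print "scalar got: $line";
print "list got ", scalar(@rest), " more lines\n";
```

The same three characters of punctuation read one line the first time and two lines the second time, purely because of what was on the left of the assignment.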

Similarly, the matching operator in a scalar context returns a success value:

  $a = /(\w+) (\d+)/

which is true if the regular expression matches $_ and false otherwise. If the result was true, we'd look in $1 for the word, and $2 for the digit string. A shorter way to do the same thing though is to use the same regular expression in a list context:

  @b = /(\w+) (\d+)/

And now, the regular expression match operator is not returning true/false, but rather a list of two items (the two memories) or an empty list if the match was not successful. So $b[0] ends up with the word, and $b[1] gets the digit-string.
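Here's a minimal sketch of the match operator in both contexts. The sample strings and variable names are made up for the illustration:

```perl
use strict;
use warnings;

$_ = "widgets 42";

my $matched = /(\w+) (\d+)/;   # scalar context: true or false
my @fields  = /(\w+) (\d+)/;   # list context: the captured pieces

print "matched\n" if $matched;
print "word=$fields[0] digits=$fields[1]\n";

# A failed match in list context yields an empty list.
my @none = "no digits here" =~ /(\w+) (\d+)/;
print "failed match returned ", scalar(@none), " items\n";
```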

In both of these operators, the scalar interpretation and the list interpretation are related, but not by any predictable formula. That's the way it is in general. You can't apply a general rule, except that there are no rules. It's whatever Larry thought would be the most practical and useful, and least surprising (well, least surprising to Larry).

A few more examples to keep getting our feet progressively more wet, and then we'll look some more at detecting context.

In a scalar context, gmtime returns a human-readable string of the GMT time (defaulting to the current time, but optionally converting any Unix-epoch integer timestamp). But in a list context, it returns a nine-element list containing the various second, minute, hour (and so on) pieces of the time for easy manipulation.
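As a quick sketch of gmtime in both contexts, here it is applied to timestamp 0 (the start of the Unix epoch), chosen only so the result doesn't depend on when you run it:

```perl
use strict;
use warnings;

my $stamp = gmtime(0);   # scalar context: a readable string
my @parts = gmtime(0);   # list context: nine numeric fields

# The first six fields; year counts from 1900 and month from 0.
my ($sec, $min, $hour, $mday, $mon, $year) = @parts;

print "$stamp\n";
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
```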

The readdir operator acts similarly to the readline operator, returning the ``next'' name from a directory in a scalar context, but all the remaining names in a list context.

And finally, a very common operation is to use the name of an array in both contexts. The ``operator'' of @x in a list context yields the current elements of the @x array. However, in a scalar context, the same ``operator'' yields the number of elements in that same array (sometimes called the ``length of the array'', but that can be confusing, so I'd rather not use that here).

Please note on that last example that at no time does Perl first extract all the elements in the scalar context, only to then somehow ``convert'' it to a count. From the very beginning, Perl knows that the @x operation is in a scalar context and performs the ``scalar'' version of that operation.

Put another way, there is no way to ``coerce'' or ``convert'' a list to a scalar, because it can never happen, in spite of what some of the so-called commercial Perl documentation incorrectly implies.
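A short sketch of the array name in both contexts follows. The array contents are invented for the example; scalar() is the built-in that supplies scalar context in a spot that would otherwise be a list:

```perl
use strict;
use warnings;

my @x = qw(fred barney dino);

my @copy  = @x;   # list context: the elements
my $count = @x;   # scalar context: the number of elements

print "copy holds @copy\n";
print "count is $count\n";

# print's arguments are a LIST, so scalar() is needed to get the count here.
print "there are ", scalar(@x), " elements\n";
```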

So, where does context occur? Everywhere! Let's introduce a convention for a moment, to make it easy to talk. If a portion of the expression is evaluated in scalar context, let's use SCALAR to represent that:

  $a = SCALAR;

And similarly, we'll show list context with LIST:

  @x = LIST;

So, let's look at some other common ones. Assigning to the element of an array looks like this:

  $w[SCALAR] = SCALAR;

Note that the subscripting expression is evaluated in a scalar context. That means if we have an array name in the subscript on the left, and a readline operation on the right, we use the scalar meanings for both:

  $w[@x] = <STDIN>;

and assign a single line (or undef) to the element of @w indexed by the number of elements currently in @x. As an aside, the subscript is always evaluated before the assignment starts to happen, so:

  $w[@w] = <STDIN>;

adds the next line to the end of @w, although you'll probably scare people doing that.
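If you'd like to try that without typing at a terminal, here's a sketch that assumes an in-memory filehandle in place of STDIN. It also shows the less scary spelling with push, which needs an explicit scalar() because push's arguments are a LIST:

```perl
use strict;
use warnings;

my $input = "gamma\ndelta\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my @w = ("alpha\n", "beta\n");

# @w in the subscript is in scalar context, so it is the count (2),
# which is also the index one past the current last element.
$w[@w] = <$in>;
print "now ", scalar(@w), " elements, last is $w[-1]";

# The more familiar spelling of the same append. Without scalar(),
# readline would be in list context and push would add every remaining line.
push @w, scalar(<$in>);
print "now ", scalar(@w), " elements, last is $w[-1]";
```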

Slices are in list context, even with only a single value for an index:

  @w[LIST] = LIST;
  @w[3] = LIST;

Even hash slices work that way:

  @h{LIST} = LIST;
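The difference between an element and a one-element slice is easy to miss, so here's a sketch that makes it visible. It assumes in-memory filehandles in place of real input, and the names are mine. Perl normally warns about one-element slices, so the warning is switched off around that one line:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Element assignment: scalar context, so readline hands over one line.
open my $fh1, '<', \$text or die "open failed: $!";
my @elem;
$elem[3] = <$fh1>;
my $after_elem = <$fh1>;      # "two\n" is still waiting

# One-element slice: list context, so readline hands over every line,
# and all but the first are discarded.
open my $fh2, '<', \$text or die "open failed: $!";
my @slice;
{
    no warnings;              # perl warns about one-element slices
    @slice[3] = <$fh2>;
}
my $after_slice = <$fh2>;     # undef: nothing left to read

print "after element assignment, next line: $after_elem";
print "after slice assignment, next line: ",
    defined $after_slice ? $after_slice : "(none)\n";
```

Both arrays end up with the same value in element 3, but the slice version has quietly eaten the rest of the input.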

Lists of scalars are always lists, even with only a single value (or no values) on the left:

  ($a, $b, $c) = LIST;
  ($a) = LIST;
  () = LIST;

And then we have the context provided by some common operations:

  foreach (LIST) { ... }
  if (SCALAR) { ... }
  while (SCALAR) { ... }
  @w = map { LIST } LIST;
  @x = grep { SCALAR } LIST;

One useful rule is that anything being evaluated for a true/false value is always in scalar context, as shown in the if, while, and grep items above.
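Here's a small sketch of a few of those operations at work. The data and variable names are invented for the illustration:

```perl
use strict;
use warnings;

my @lines = ("a b", "", "c d e");

# map evaluates its block in list context, so each split contributes
# all of its words to the result.
my @words = map { split(' ', $_) } @lines;

# grep evaluates its block as a true/false (scalar) test.
my @kept = grep { length($_) } @lines;

# while tests its condition as a scalar: @queue here is a count,
# and the loop stops when that count reaches zero.
my @queue  = @words;
my $passes = 0;
while (@queue) {
    shift @queue;
    $passes++;
}

print scalar(@words), " words, ", scalar(@kept),
    " non-empty lines, $passes passes\n";
```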

Subroutines act ``at a distance''. The return value of a subroutine is always evaluated in the context of the invocation of the subroutine. Here's the basic form:

  $a = &fred(LIST); sub fred { ....; return SCALAR; }
  @b = &barney(LIST); sub barney { ....; return LIST; }

But what if I had used fred for both of those? Yes, the context would pass through, and be different for different invocations! If that makes your head spin, try not to do that for a while until you fully understand it.
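To watch the context pass through, here's a sketch with one subroutine called both ways. The second subroutine uses the built-in wantarray, which reports the caller's context; the subroutine bodies are my own illustration:

```perl
use strict;
use warnings;

sub fred {
    my @names = qw(wilma pebbles);
    return @names;            # evaluated in the caller's context
}

my $count = fred();           # scalar context: @names gives its count
my @list  = fred();           # list context: @names gives its elements

# wantarray is true when the caller wants a list.
sub context_name {
    return wantarray ? "list" : "scalar";
}

my $seen_by_scalar = context_name();
my ($seen_by_list) = context_name();

print "scalar call got $count, list call got @list\n";
print "contexts seen: $seen_by_scalar, $seen_by_list\n";
```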

Speaking of subroutines: a common thing to do is to create a lexical variable (often called a my-variable) to hold incoming subroutine arguments or temporary values, as in:

  sub marine {
    my ($a) = @_;
    ...
  }

In this case, if the parentheses are included, we get list context (imagine the my is not there). All the elements of @_ get returned, but only the first is stored into $a (the remainder are ignored).

However, the same expression without parentheses provides scalar context to the right side:

  my $a = @_;

which gets the number of elements in @_ (the argument list). There's not one that's ``more right''; you need to learn the difference, and use the appropriate one.
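A sketch of the two forms side by side, using two hypothetical subroutines that differ only in the parentheses:

```perl
use strict;
use warnings;

sub first_arg {
    my ($first) = @_;    # list context: first argument, rest ignored
    return $first;
}

sub arg_count {
    my $count = @_;      # scalar context: number of arguments
    return $count;
}

my $got_first = first_arg('x', 'y', 'z');
my $got_count = arg_count('x', 'y', 'z');

print "with parentheses: $got_first\n";
print "without parentheses: $got_count\n";
```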

And that brings us full circle to the question I posed at the beginning. What is the difference? Backquotes in a scalar context generate the entire value as one string:

  my $f = `fortune`;

but the same expression in a list context generates a list of items (one line per item, just like reading from a file), only the first of which can fit into the scalar on the left:

  my ($f) = `fortune`;

So $f gets just the first line of the fortune, harmless for those one-liners, but pretty devastating when a school official sees that a student has apparently written:

 I put the shotgun in an Adidas bag and padded it out with four pairs of tennis

on this schoolboy's page, in light of the tragic schoolyard shootings we seem to be hearing more about these days. Never mind that a simple reload of the page would have shown something different each time, or that this is really just a random quote.
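You can reproduce the difference without having fortune installed. The sketch below assumes that the running perl (found in $^X) can be launched from your shell and that its path contains no spaces; the three-line command is just a stand-in for fortune:

```perl
use strict;
use warnings;

# Stand-in for `fortune`: ask this same perl to print three lines.
my $cmd = qq{$^X -le "print for 1..3"};

my $whole   = `$cmd`;    # scalar context: all output as one string
my ($first) = `$cmd`;    # list context: one item per line, only the first kept
my @lines   = `$cmd`;    # list context: every line

print "scalar form has ", length($whole), " characters\n";
print "parenthesized form kept only: $first";
print "list form has ", scalar(@lines), " lines\n";
```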

The police were called, the boy was questioned, and he now has a police file simply because he added some erroneous parentheses. No charges resulted, but the embarrassment here is certainly unwelcome. (I say this from personal experience: my own ongoing saga about misplaced understandings and resulting criminal charges can be found at the archive located at http://www.lightlink.com/fors/.)

And the embarrassment was also avoidable with a little more care in programming and quality-assurance testing. So when you hack Perl, and you wonder about context, get the text right or you may end up a con. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
