Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 38 (Jun 2001)
[suggested title: ``It's all about context'']
Just recently, an article on Slashdot (www.slashdot.org) discussed
the surprising story of a high-school student who had made a
small Perl programming mistake that got him into a lot of
trouble. On his dynamically generated webpage, he had used the code:
my($f) = `fortune`;
when what he should have done was:
my $f = `fortune`;
Now, both of these invoke the fortune program, capturing its random quip of text. In this particular case, when the school administrators had visited the boy's page, fortune had selected a quote from a William Gibson novel:
I put the shotgun in an Adidas bag and padded it out with four pairs of tennis socks, not my style at all, but that was what I was aiming for: If they think you're crude, go technical; if they think you're technical, go crude. I'm a very technical boy. So I decided to get as crude as possible. These days, though, you have to be pretty technical before you can even aspire to crudeness. - Johnny Mnemonic, by William Gibson
Now, if you can't tell what would be in $f for both of the code
fragments above, read on, and you'll see how an unwitting mistake can
leave someone with an unexpected police file.
The problem is a matter of context (in more ways than one). Perl's operators are ``context sensitive'', in that the operator can detect whether it is being used in a place looking for a scalar rather than looking for a list, and return an appropriate result. In this case, the backtick operator returns a differing result, depending on whether it was invoked in a scalar context or a list context.
To understand this, let's first look at how to detect context. Starting with the basics, the right-hand side of an assignment to a scalar variable must be a scalar value:
$a = ...
Whatever's over there on the right, it's got to be a scalar value, because that's the only thing that'll fit into a scalar variable.
Similarly, the right side of an assignment to an array can be any list value:
@b = ...
Let's put some things in both places and see how they differ. One that you are almost certainly familiar with is the readline operator, spelled ``less-than filehandle greater-than'':
$a = <STDIN>
In this ``scalar context'', the readline operator returns the next line to be read, or undef if an I/O error occurs (such as at end-of-file).
However, the very same operator and punctuation in a ``list context'' yields all the remaining lines until end-of-file is reached, or an empty list if already at end of file:
@b = <STDIN>
Now, Larry Wall could have come up with two different operators for these two similar operations, but by making the operator ``context sensitive'', we get the savings of brainspace and keyboardspace. Apparently, we humans are fairly good at grokking context, so why not leverage off that a bit in the language?
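Here's a minimal sketch of the readline operator in both contexts. To keep it self-contained, it reads from an in-memory filehandle rather than STDIN; the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for STDIN.
my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my $line = <$fh>;   # scalar context: just the next line
my @rest = <$fh>;   # list context: all the remaining lines
my $eof  = <$fh>;   # scalar context at end-of-file: undef

close($fh);

print "line: $line";
print "remaining lines: ", scalar(@rest), "\n";
print "at end-of-file we got undef\n" unless defined $eof;
```

Same operator, same punctuation; only the thing on the left of the assignment changed.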
Similarly, the matching operator in a scalar context returns a success value:
$a = /(\w+) (\d+)/
which is true if the regular expression matches $_ and false
otherwise. If the result was true, we'd look in $1 for the word,
and $2 for the digit string. A shorter way to do the same thing, though,
is to use the same regular expression in a list context:
@b = /(\w+) (\d+)/
And now, the regular expression match operator is not returning
true/false, but rather a list of two items (the two memories) or an
empty list if the match was not successful. So $b[0] ends up with
the word, and $b[1] gets the digit string.
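As a rough illustration of the two behaviors of the match operator, here is a small sketch. The sample strings and variable names are my own invention, not anything from the original story.

```perl
use strict;
use warnings;

$_ = "widget 42";   # hypothetical sample input

my $ok    = /(\w+) (\d+)/;   # scalar context: true or false
my @parts = /(\w+) (\d+)/;   # list context: the two memories

# A failed match in list context gives an empty list.
my @none = ("no digits here" =~ /(\w+) (\d+)/);

print "matched\n" if $ok;
print "word=$parts[0] digits=$parts[1]\n";
print "failed match returned ", scalar(@none), " items\n";
```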
In both of these operators, the scalar interpretation and the list interpretation are related, but not by any predictable formula. That's the way it is in general. You can't apply a general rule, except that there are no rules. It's whatever Larry thought would be the most practical and useful, and least surprising (well, least surprising to Larry).
A few more examples to keep getting our feet progressively more wet, and then we'll look some more at detecting context.
In a scalar context, gmtime returns a human-readable string of the
GMT time (defaulting to the current time, but optionally converting
any Unix-epoch integer timestamp). But in a list context, it returns a
nine-element list containing the various second, minute, hour (and so
on) pieces of the time for easy manipulation.
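A quick sketch of gmtime in both contexts follows. I've picked timestamp 0 (the start of the Unix epoch) purely so the result is the same every time you run it; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $epoch = 0;   # assumed sample timestamp: the start of the Unix epoch

my $stamp = gmtime($epoch);   # scalar context: one human-readable string
my @parts = gmtime($epoch);   # list context: nine separate pieces

# The list is (sec, min, hour, mday, mon, year, wday, yday, isdst),
# where year counts from 1900 and mon counts from 0.
my ($sec, $min, $hour, $mday, $mon, $year) = @parts;

print "string: $stamp\n";
print "pieces: ", scalar(@parts), "\n";
printf "date: %04d-%02d-%02d\n", $year + 1900, $mon + 1, $mday;
```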
The readdir operator acts similarly to the readline operator,
returning the ``next'' name from a directory in a scalar context, but
all the remaining names in a list context.
And finally, a very common operation is to use the name of an array in
both contexts. The ``operator'' of @x in a list context yields the
current elements of the @x array. However, in a scalar context,
the same ``operator'' yields the number of elements in that same
array (sometimes called the ``length of the array'', but that can be
confusing, so I'd rather not use that here).
Please note on that last example that at no time does Perl first
extract all the elements in the scalar context, only to then somehow
``convert'' it to a count. From the very beginning, Perl knows that the
@x operation is in a scalar context and performs the ``scalar''
version of that operation.
Put another way, there is no way to ``coerce'' or ``convert'' a list to a scalar, because it can never happen, in spite of what some of the so-called commercial Perl documentation incorrectly implies.
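To make the array-name case concrete, here's a small sketch with made-up data. The scalar function shown at the end is Perl's built-in way to ask for scalar context explicitly.

```perl
use strict;
use warnings;

my @x = qw(fred barney betty);   # hypothetical sample data

my @copy  = @x;          # list context: the elements themselves
my $count = @x;          # scalar context: the number of elements
my $same  = scalar(@x);  # scalar() supplies scalar context explicitly

print "copy: @copy\n";
print "count: $count\n";
```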
So, where does context occur? Everywhere! Let's introduce a
convention for a moment, to make it easy to talk. If a portion of the
expression is evaluated in scalar context, let's use SCALAR to
represent that:
$a = SCALAR;
And similarly, we'll show list context with LIST:
@x = LIST;
So, let's look at some other common ones. Assigning to the element of an array looks like this:
$w[SCALAR] = SCALAR;
Note that the subscripting expression is evaluated in a scalar context. That means if we have an array name in the subscript on the left, and a readline operation on the right, we'll use scalar meanings for both:
$w[@x] = <STDIN>;
This assigns a single line (or undef) to the element of @w indexed by
the number of elements currently in @x. As an aside, the subscript is
always evaluated before the assignment starts to happen, so:
$w[@w] = <STDIN>;
adds the next line to the end of @w, although you'll probably
scare people doing that.
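Here's a sketch of that subscript behavior, with plain strings standing in for lines read from STDIN. The arrays and their contents are just assumptions for the demonstration.

```perl
use strict;
use warnings;

my @w = ('a', 'b');      # hypothetical starting contents
my @x = (10, 20, 30);

$w[@x] = 'c';   # subscript is scalar(@x), which is 3, so this sets $w[3]
$w[@w] = 'd';   # subscript is scalar(@w), now 4, so this appends

print "elements in \@w: ", scalar(@w), "\n";
print "last element: $w[-1]\n";
```

Note that $w[2] was never assigned, so it remains undef. The more conventional spelling of that last assignment is push @w, 'd'.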
Slices are in list context, even with only a single value for an index:
@w[LIST] = LIST;
@w[3] = LIST;
Even hash slices work that way:
@h{LIST} = LIST;
Lists of scalars are always lists, even with only a single value (or no values) on the left:
($a, $b, $c) = LIST;
($a) = LIST;
() = LIST;
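The slice and list-assignment forms can be sketched like this. All of the names and data are invented for the example.

```perl
use strict;
use warnings;

my @source = qw(one two three);   # hypothetical sample data

my ($first) = @source;            # list assignment: takes the first element
my ($p, $q, $r, $s) = @source;    # extra variables on the left get undef

my @w;
@w[0, 1] = @source;               # array slice: list context on the right

my %h;
@h{qw(a b)} = @source;            # hash slice: same idea

print "first: $first\n";
print "slice: @w\n";
print "hash: a=$h{a} b=$h{b}\n";
```

In every one of these, the right side is evaluated in list context, and any leftover values are simply dropped.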
And then we have the context provided by some common operations:
foreach (LIST) { ... }
if (SCALAR) { ... }
while (SCALAR) { ... }
@w = map { LIST } LIST;
@x = grep { SCALAR } LIST;
One useful rule is that anything being evaluated for a true/false
value is always a scalar, as shown in the if, while, and grep
items above.
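A short sketch of those operations at work, using made-up numbers:

```perl
use strict;
use warnings;

my @queue = (3, 1, 2);   # hypothetical sample data
my @seen;

# while (SCALAR): @queue here means "number of elements", false when zero.
while (@queue) {
    push @seen, shift @queue;
}

# map { LIST } LIST: each pass through the block may return several items.
my @pairs = map { ($_, $_ * 2) } 1 .. 3;

# grep { SCALAR } LIST: the block is only tested for true or false.
my @odd = grep { $_ % 2 } 1 .. 5;

print "seen: @seen\n";
print "pairs: @pairs\n";
print "odd: @odd\n";
```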
Subroutines act ``at a distance''. The return value of a subroutine is always evaluated in the context of the invocation of the subroutine. Here's the basic form:
$a = &fred(LIST);
sub fred { ....; return SCALAR; }
@b = &barney(LIST);
sub barney { ....; return LIST; }
But what if I had used fred for both of those? Yes, the context
would pass through, and be different for different invocations! If
that makes your head spin, try not to do that for a while until you
fully understand it.
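If you'd like to watch the context pass through, here's a rough sketch. The body of fred and the helper context_of are my own illustrations; wantarray is Perl's built-in for asking which context a subroutine was called in.

```perl
use strict;
use warnings;

# Hypothetical subroutine: the return expression is evaluated
# in whatever context the caller supplied.
sub fred {
    my @names = qw(wilma pebbles dino);
    return @names;
}

my $count   = fred();   # scalar context: @names yields its element count
my @all     = fred();   # list context: @names yields its elements
my ($first) = fred();   # still list context, thanks to the parentheses

# Illustrative helper that reports the caller's context.
sub context_of {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}

my $s   = context_of();
my ($l) = context_of();

print "count=$count first=$first\n";
print "contexts seen: $s, $l\n";
```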
Speaking of subroutines: a common thing to do is to create a lexical
variable (often called a my-variable) to hold incoming subroutine
arguments or temporary values, as in:
sub marine { my ($a) = @_; ... }
In this case, if the parentheses are included, we get list context
(imagine the my is not there). All the elements of @_ are
returned, but only the first is stored into $a (the
remainder are ignored).
However, the same expression without parentheses provides scalar context to the right side:
my $a = @_;
which gets the number of elements in @_ (the argument list).
There's not one that's ``more right''; you need to learn the difference,
and use the appropriate one.
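Side by side, the two forms look like this. The subroutine names first_arg and arg_count are hypothetical, chosen only to say what each one ends up returning.

```perl
use strict;
use warnings;

# With parentheses: list context, so we get the first argument.
sub first_arg {
    my ($arg) = @_;
    return $arg;
}

# Without parentheses: scalar context, so we get the argument count.
sub arg_count {
    my $n = @_;
    return $n;
}

my $first = first_arg('x', 'y', 'z');
my $count = arg_count('x', 'y', 'z');

print "first argument: $first\n";
print "number of arguments: $count\n";
```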
And that brings us full circle to the question I posed at the beginning. What is the difference? Backquotes in a scalar context generate the entire value as one string:
my $f = `fortune`;
but the same expression in a list context generates a list of items (one line per item, just like reading from a file), only the first of which can fit into the scalar on the left:
my ($f) = `fortune`;
So $f gets just the first line of the fortune, harmless for those
one-liners, but pretty devastating when a school official sees that a
student has apparently written:
I put the shotgun in an Adidas bag and padded it out with four pairs of tennis
on this schoolboy's page, in light of the tragic schoolyard shootings we seem to be hearing more about these days. Never mind that a simple reload of the page would have shown something different each time, or that this is really just a random quote.
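You can reproduce the difference without having fortune installed. In this sketch the command is Perl itself (via $^X, the path of the running interpreter) printing three lines; that stand-in command is my assumption, and it presumes the path contains no spaces or shell metacharacters.

```perl
use strict;
use warnings;

# Stand-in for fortune: a command that prints three lines.
# Assumes $^X (the running perl's path) is safe to hand to the shell.
my $cmd = qq{$^X -le "print for 1..3"};

my $all     = `$cmd`;   # scalar context: the whole output as one string
my ($first) = `$cmd`;   # list context: one item per line, only the first kept
my @lines   = `$cmd`;   # list context: every line

print "scalar context got ", length($all), " characters\n";
print "list context, first item only: $first";
print "list context, all items: ", scalar(@lines), "\n";
```

The two extra parentheses are the whole difference between the full quote and a single, out-of-context line.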
The police were called, the boy was questioned, and he now has a police
file simply because he added some erroneous parentheses. No charges
resulted, but the embarrassment here is certainly unwelcome. (I say
this from personal experience: my own ongoing saga about misplaced
understandings and resulting criminal charges can be found at the
archive located at http://www.lightlink.com/fors/.)
And the embarrassment was also avoidable with a little more care in programming and quality-assurance testing. So when you hack Perl, and you wonder about context, get the text right or you may end up a con. Until next time, enjoy!