Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 20 (June 1998)
With many ways to manipulate strings, Perl is a good ``text wrangling'' language. Perl makes it easy to read in strings of arbitrary length, select and extract interesting data, and write the results to files, sockets, or other processes.
One interesting problem is the ability to have text contain arbitrary expressions. This is really handy when you have a template file (say, a report or an HTML page) that stays mostly constant, but should have some variable or freshly computed parts. Perl normally doesn't recognize such expressions within a string as anything other than just some additional characters to print, but there are circumstances where the text is changed.
For example,
$a = 3 + 4;
print "I have $a eggs\n";
allows me to compute the expression of 3 + 4, and then insert the result into the string. However, putting the same expression into the string doesn't work:
print "I have 3 + 4 eggs\n";
because Perl cannot tell whether this is the text of 3 + 4, or an expression to be calculated. A while back, I came up with a trick to get an expression evaluated inside a double-quoted string, and it became the easiest way to handle the problem of getting expressions within strings. It looks a little ugly, but so does the rest of Perl, so by comparison, it's not half bad.
The trick is to simply precede the expression with @{[ and follow it with ]}, like so:
print "I have @{[ 3 + 4 ]} eggs\n";
If you execute this code, you'll see that it correctly prints 7 eggs!
How is this working? Well, the outer @{ ... } triggers an array interpolation, requiring either an array name or a list reference inside the braces. The inner square brackets create an anonymous list, and return the reference to that list. This anonymous list has to be computed from the list of expressions within the brackets -- in this case, there's only one, so it's a single element list. Thus, the expression is computed, turned into an anonymous list, then interpolated by the @ trigger, and we're done!
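As a minimal sketch of the mechanism just described (the variable names are only for illustration), the same construct can hold one expression or a whole list. When the anonymous list has several elements, array interpolation joins them with a single space by default:

```perl
use strict;
use warnings;

# A single expression: computed, wrapped in an anonymous list, interpolated.
my $eggs = "I have @{[ 3 + 4 ]} eggs";

# Several elements: array interpolation joins them with a space by default.
my $squares = "Squares: @{[ map { $_ * $_ } 1 .. 4 ]}";

print "$eggs\n$squares\n";
```

Because the brackets impose list context, anything that returns a list (a map, a sort, a function call) can go inside them.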
We can even make use of this construct in larger documents (using here-strings):
open SM, "|/usr/lib/sendmail -t";
print SM <<END;
To: $destination
From: @{[$source || "root"]}
Subject: update at @{[scalar localtime]}

The following people are logged on to @{[`hostname` =~ /(.*)/]}:

@{
  my %foo = map /^(\S+)()/, `who`;
  [sort keys %foo];
}
END
close SM;
There's a lot of meat here... let's go through it a step at a time. First, I'm opening a pipe to sendmail, to send a mail message. Next, I'm printing a double-quoted here-string to that pipe. The $destination variable is an ordinary scalar variable that I've set somewhere before this code.
The from line of the message uses the construct described above. If the $source variable is set, it's used -- otherwise, the constant root is returned.
The subject line of the message also uses the construct described above. The localtime operator in a scalar context returns a nice timestamp. Because the square-bracket anonymous list constructor wants to evaluate the elements in a list context, I have to force scalar context with the scalar operator. The resulting expression is squished into the subject line with relative ease.
Similarly, the current hostname is computed and inserted. Note that I'm taking the output of the backquoted hostname command, and matching it against a regular expression that extracts all the characters before the newline. That way, the newline is not extracted, and I can use it as text in the middle of the line.
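The two context effects described above can be tried in isolation. This is only a sketch: $raw is a hypothetical stand-in for the output of the backquoted hostname command, and localtime is given a fixed argument of 0 so the run doesn't depend on the current time:

```perl
use strict;
use warnings;

# Stand-in for the output of `hostname`, trailing newline included
# (hypothetical value, so the sketch runs without shelling out).
my $raw = "stonehenge\n";

# In list context a match returns its captures; (.*) stops at the newline.
my $where = "logged on to @{[ $raw =~ /(.*)/ ]}:";

# Without scalar, localtime returns a list of fields; with it, one timestamp.
my @fields = localtime(0);
my $stamp  = "update at @{[ scalar localtime(0) ]}";

print "$where\n$stamp\n";
```

Dropping the scalar operator would interpolate all of the numeric fields from @fields, separated by spaces, instead of the readable timestamp.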
The final chunk of code within this string uses an extra trick. The @{...} construct is really any block of code, as long as the last expression evaluated in that block is a listref of some kind. So, to get a unique list of users on the system, I can use the keys of a temporary hash as a set. The output of the who command is broken into lines, and matched line by line with the regular expression, generating two elements of a total list for each original line. This is the right shape of a result to create the hash. Finally, the keys of the hash are sorted, and turned into an anonymous list.
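To see the shape of the data at each step, here is a small sketch that swaps the who command for a few canned lines (the sample lines and the names @who, @pairs, and %seen are assumptions for illustration only):

```perl
use strict;
use warnings;

# Canned lines standing in for the output of `who` (hypothetical sample data).
my @who = (
    "fred     tty1    Jun  1 09:00\n",
    "barney   pts/0   Jun  1 09:05\n",
    "fred     pts/1   Jun  1 09:10\n",
);

# Each line yields two elements: the login name and an empty string,
# so the flat list has the key/value shape a hash assignment expects.
my @pairs = map /^(\S+)()/, @who;

# The same idea as a block inside @{ ... }: the last expression
# evaluated must be a reference to the list being interpolated.
my $report = "Logged on: @{ my %seen = @pairs; [sort keys %seen]; }\n";

print $report;
```

The duplicate fred entry collapses because a hash can hold each key only once, which is what makes the temporary hash behave as a set.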
Another way of having a ``mostly constant, but sometimes changing'' text string is to perform a global substitute on the string. While we can't get arbitrary expressions, it works well when the data comes from a data structure, like a hash:
%data = (
  TO => 'fred@random.place',
  PRIZE => 'a pearl necklace',
  VALUE => '1000',
);
$_ = <<'EOF';
To: %TO%
From: merlyn@stonehenge.com
Subject: Your lucky day

You are the winner of %PRIZE%, worth over $%VALUE%!
Congratulations.
EOF
s/%(\w+)%/$data{$1}/g;
print;
For each of the words found between percent signs, the corresponding hash element is looked up by key, and replaced with its value. This is good for those form-letter type problems. If the data cannot be stored in a hash like this, we could go a step further and make the replacement text a full expression, instead of a simple double-quoted string, using the /e modifier on the substitution.
$_ = <<'EOF';
To: %TO%
From: merlyn@stonehenge.com
Subject: Your lucky day

You are the winner of %PRIZE%, worth over $%VALUE%!
Congratulations.
EOF
s/%(\w+)%/&getvaluefor($1)/eg;
print;
sub getvaluefor {
  my $key = shift;
  ...
}
Here, the subroutine &getvaluefor will be called repeatedly, once for each keyword found in the text. Whatever string is returned by the subroutine will be the value inserted into the final text. The subroutine can thus be arbitrarily complex, including having default values or cached computations.
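One possible body for such a subroutine is sketched below. This is purely an illustration, not the body elided from the column: the %data contents, the %cache hash, the $calls counter, and the "[unknown ...]" default are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data source and bookkeeping for the sketch.
my %data  = (TO => 'fred@random.place', PRIZE => 'a pearl necklace');
my %cache;
my $calls = 0;    # counts lookups that were not served from the cache

sub getvaluefor {
    my $key = shift;
    return $cache{$key} if exists $cache{$key};    # cached computation
    $calls++;
    # Default value for keywords the data source doesn't know about.
    my $value = exists $data{$key} ? $data{$key} : "[unknown $key]";
    return $cache{$key} = $value;
}

my $text = "To: %TO%\nYou won %PRIZE%, %PRIZE%! Value: %VALUE%\n";
(my $filled = $text) =~ s/%(\w+)%/getvaluefor($1)/eg;
print $filled;
```

The template mentions %PRIZE% twice, but the lookup work happens only once for it. The missing VALUE key falls back to the default text instead of leaving a hole in the output.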
But we're still a long ways away from what I did earlier -- having the code to execute within the template. It's really not that far away however, if we use the ``double evaluation'' mode of the substitution operator. Let's look at this example:
$_ = 'I have [ 3 + 4 ] eggs';
s/\[(.*?)\]/$1/eegs;
print;
This prints I have 7 eggs, but how? Well, eliminating what we know so far... the /s means that . can match a newline. And /g means that we are doing more than one substitution. And a single /e means that the right side is a Perl expression, not a double-quoted string. And in fact, we have $1 there, so that's good so far.
But the presence of the second /e means that the value of the expression on the right side should again be considered to be Perl code, and then evaluated for its string value! (This was initially considered to be a bug, but when it was noticed to be useful, retained as a feature.)
So it goes from $1 to " 3 + 4 " to 7, and the 7 gets inserted in place of the bracketed expression. We can have anything we want between the brackets, and it'll be evaluated as Perl code.
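To make the difference between one /e and two concrete, here is a small sketch that runs the same template through both forms (the template text and the names $once and $twice are just for illustration):

```perl
use strict;
use warnings;

my $template = 'I have [ 3 + 4 ] eggs and [ 2 * 6 ] rolls';

# One /e: the right side is the expression $1, whose value is the
# captured text itself, so only the brackets disappear.
(my $once = $template) =~ s/\[(.*?)\]/$1/egs;

# Two /e's: the value of $1 is evaluated again as Perl code,
# and the result of that second evaluation is inserted.
(my $twice = $template) =~ s/\[(.*?)\]/$1/eegs;

print "$once\n$twice\n";
```

Since the second evaluation runs whatever appears between the brackets, this form is presumably best kept for templates you wrote yourself, not text that arrives from elsewhere.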
So, there you have it... many ways of having ``mostly constant, some variable'' text in your program. Let me conclude with a piece of history here. For many years, I used to end my postings in comp.lang.perl.misc with some clever (often obscure) chunk of code that would print out ``Just another Perl hacker,''. When I discovered the ``double eval'' trick for substitution, I just had to use it in one of these ``JAPH'' postings. And here's the result:
$Old_MacDonald = q#print #; $had_a_farm = (q-q:Just another Perl hacker,:-);
s/^/q[Sing it, boys and girls...],$Old_MacDonald.$had_a_farm/eieio;
See if you can figure out how it works!
Special thanks to fellow Perl lead developer and trainer, Chip Salzenberg, for the idea for this month's column. Thanks Chip!