Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Unix Review Column 12 (January 1997)

One of the neat features of Perl is the ability to provide data within the program, in various forms, rather than having to keep it all in temp files or configuration files. One of the most common ways to do this is with a ``here document'' (also called a ``here-doc''). Let's take a look at that.

First, a ``here document'' is nothing more than a long (usually multi-line) string value, but quoted in a funny way. Here's an easy example:

        $a = <<END;
        Some stuff
        goes here.
        END

This is nothing more than:

        $a = "Some stuff\ngoes here.\n";

But note that we got to write the string value across two separate lines. The key to this is the <<END token. The << says ``this is a string value, beginning at the start of the next line and continuing until we find the end marker''. In this case, the end marker is literally ``END'', which must appear on a line by itself later in the program. All the characters from the start of the following line up to the end marker (including the last newline before the end marker) are part of the value.

Note that the <<END is followed by a semicolon. The here-doc is nothing more than a quoted string, and the rest of the statement stays on the same line as the <<END token, so I still had to put the semicolon there to make the syntax correct.
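
If you'd like to check that equivalence for yourself, here's a small sketch (the variable names are mine, and it assumes any Perl 5) that builds the string both ways and compares them. It should report the two as identical, with a length of 22 characters, since both newlines count.

```perl
use strict;
use warnings;

# The body starts on the line after <<END and stops just before the line
# that is exactly "END"; the last newline before the marker is kept.
# Note the semicolon: the rest of the statement stays on the <<END line.
my $heredoc = <<END;
Some stuff
goes here.
END

my $quoted = "Some stuff\ngoes here.\n";

print(($heredoc eq $quoted) ? "identical\n" : "different\n");
print length($heredoc), "\n";
```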

Here's another example, showing how multiple ``here docs'' can happen on the same source line:

        print <<HELLO, <<WORLD;
        hi
        HELLO
        there
        WORLD

Note that this is really:

        print "hi\n","there\n";

although it is quoted funny.
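
Here's a little sketch (again, the names are just for illustration) that captures the two here-docs in an array instead of printing them. That makes it easy to see that each one ends at its own marker, and that Perl reads their bodies in the order the << tokens appear on the line.

```perl
use strict;
use warnings;

# Both here-docs begin on the same source line. Perl reads their bodies
# in order: the first runs up to HELLO, the second picks up right after
# it and runs up to WORLD.
my @pieces = (<<HELLO, <<WORLD);
hi
HELLO
there
WORLD

print scalar(@pieces), " strings\n";
print @pieces;
```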

What if you didn't want the newline? Well, it's pretty easy. One simple way for a single line is to mangle the result with substr:

        $data = substr(<<THING,0,-1);
        This is a long quoted line.
        THING

and now $data will have the line, without the newline, thanks to the substr selecting all characters but the last. Another way is with a regular expression, but I'll go a little further here:

        $stuff = join(":", <<END =~ /(.+)/g);
        hi
        there
        this\tis\ta\ttest
        of the best!
        END

There are a lot of things happening at once here, so let me clear them up. The value of $stuff comes from joining a list with ``:''. The list comes from the regular expression m//g operator used in a list context, which returns all the matches for ``.+'' within a string. That string comes from a ``here doc'', delimited by END. The result of the regular expression match is a list that looks like:

        "hi", "there",
        "this\tis\ta\ttest",
        "of the best!"

which when joined will end up:

        "hi:there:this\tis\ta\ttest:of the best!"

Now, here's a puzzle (no fair if you know!)... are those \t's there real tab characters, or just a backslash followed by a t? Well, the ``here doc'' by default is double-quote interpolated, meaning that variables are expanded to their current value (we'll see this in a minute), and things like \t become their actual character values.
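
If you'd rather settle the puzzle by experiment, here's a sketch (variable names mine) that runs the same join, counts the tabs with tr///, and prints a copy with each tab shown as <TAB>.

```perl
use strict;
use warnings;

# (.+) never matches a newline, so m//g in list context returns one
# capture per non-empty line of the here-doc.
my $stuff = join(":", <<END =~ /(.+)/g);
hi
there
this\tis\ta\ttest
of the best!
END

# tr/\t// counts tab characters without changing the string.
my $tabs = ($stuff =~ tr/\t//);
print "tab characters: $tabs\n";

# Make the tabs visible, working on a copy of the string.
(my $visible = $stuff) =~ s/\t/<TAB>/g;
print "$visible\n";
```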

So, in this case, the resulting string has three tab characters in it. What if we didn't want that... we just wanted ``what you see is what you get'' interpretation, like a single-quoted string does? Well, I can tell the ``here doc'' to be that way by enclosing the end tag in single quotes:

        $stuff = <<'END';
        This\thas\tno\ttabs!
        END

which will end up as if I had written:

        $stuff = 'This\thas\tno\ttabs!' . "\n";

Notice that we still get the newline on the end, however. I can also specify the double-quote behavior explicitly:

        $stuff = <<"END";
        This\thas\ttabs\tnow!
        END

But that just looks like extra typing to me. It does have a use when the delimiter contains whitespace, however:

        $stuff = <<"END OF DATA";
        This is my home: $ENV{HOME}
        And this is my shell: $ENV{SHELL}
        END OF DATA

Notice that I used variables in here as well (the values of $HOME and $SHELL from the environment). It's very important to get the whitespace right. An extra leading or trailing space on any of these end markers will cause Perl to skip past that line, looking further for the real end marker.
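
Here's a sketch pulling those quoting choices together. I've used a made-up $shell variable in place of $ENV{SHELL} so the output is the same on every machine; everything else follows the rules just described.

```perl
use strict;
use warnings;

# Single-quoted end marker: no interpolation, so \t stays a backslash and a t.
my $literal = <<'END';
This\thas\tno\ttabs!
END

# Bare and double-quoted end markers both interpolate, so \t is a real tab.
my $bare = <<END;
a\tb
END
my $double = <<"END";
a\tb
END

# A quoted end marker may contain spaces. $shell is a made-up stand-in
# for $ENV{SHELL}, so the output doesn't depend on your environment.
my $shell  = '/bin/sh';
my $report = <<"END OF DATA";
And this is my shell: $shell
END OF DATA

print $literal;
print(($bare eq $double) ? "bare and double-quoted agree\n" : "they differ\n");
print $report;
```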

That technique of scanning the ``here doc'' with a regular expression can be further extended. For example, suppose we need a hash loaded with keys and values:

        %data = <<END =~ /(\w+): (.*)/g;
        fred: Fred Flintstone
        barney: Barney Rubble
        betty: Betty Rubble
        wilma: Wilma Flintstone
        END

        for (sort keys %data) {
                print "$_ => $data{$_}\n";
        }

        print <<QUOTE;
        This is to inform you that $data{"fred"} and
        $data{"wilma"} are married.
        QUOTE

Here, %data is being created from pairs of elements grabbed from each regular expression match. This regular expression matches the key (like fred or barney) along with its corresponding value (like ``Fred Flintstone''). The foreach loop that follows prints out the resulting hash, and the second here-doc shows that we can access this hash from within another here-doc.
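
The step that makes this work is easy to miss: each match hands back two captures, and m//g in list context strings them all together into one flat list. This sketch (the @pairs name is mine) stops at that intermediate list before loading the hash.

```perl
use strict;
use warnings;

# Each successful match contributes two captures (key, value), and m//g in
# list context returns all of them as one flat list.
my @pairs = <<END =~ /(\w+): (.*)/g;
fred: Fred Flintstone
barney: Barney Rubble
betty: Betty Rubble
wilma: Wilma Flintstone
END

print scalar(@pairs), " elements\n";    # four keys plus four values

# A flat key, value, key, value, ... list is exactly what a hash needs.
my %data = @pairs;
for my $key (sort keys %data) {
    print "$key => $data{$key}\n";
}
```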

Which leads to the next idea: using a here-doc as a ``form-letter generator''. Here's how that might look:

        for (<<'EOF' =~ /(.+)/g) {
        fredf:Fred Flintstone:$25
        barneyb:Barney Rubble:$100
        bettyb:Betty Rubble:$0.05
        EOF
                ($email, $person, $owe) = split /:/;
                print <<EOM;
        ## mail for $person
        mail -s "$person, you deadbeat!" $email <<INPUT
        Hey, $person, you owe us $owe!
        Pay up, or else!
        INPUT
        ## end of mail for $person
        EOM
        }

Sorry for the strange indentation, but here-docs have significant leading whitespace. Grr. There are a few things going on here, so let me walk through them. First, the foreach loop walks through the list of lines that the /(.+)/g match finds in the here-doc. Within the loop, the line (in $_) is split on colons, loading the three variables $email, $person, and $owe with the three fields.

The next step is to print out another here-doc, referencing the three variables (sometimes multiple times). This doc is a shell script that you are expected to execute. Note that the shell script also contains a shell here-doc (the shell is where Perl got the here-doc idea in the first place), delimited by ``INPUT''. Yeah, lots of levels to deal with. Run the program, and you get output that looks like:

        ...
        ## mail for Barney Rubble
        mail -s "Barney Rubble, you deadbeat!" barneyb <<INPUT
        Hey, Barney Rubble, you owe us $100!
        Pay up, or else!
        INPUT
        ## end of mail for Barney Rubble
        ...

Notice that this last program had some very ugly indentation. Well, with a little work, we can avoid that using a trick regular expression:

        for (1..10) {
                @data = <<END =~ /\t\t(.*)\n/g;
                        Data one $_
                        Data two $_
        END
                print "$_: @data\n";
        }

Notice that the regular expression stripped off the two tabs that I used to indent that part of the program. Cool. Unfortunately, the end marker still has to be left-justified.
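
If you indent with spaces instead of tabs, the same trick works with a slightly different pattern. This is my own variation, not something from the program above: with the /m modifier, ^ matches at the start of every line in the here-doc, and the leading run of spaces is skipped before the capture. The end marker still has to be left-justified.

```perl
use strict;
use warnings;

# Variation (not from the column): strip any run of leading spaces
# instead of exactly two tabs. With /m, ^ matches at each line start.
my @lines;
for my $n (1 .. 3) {
    my @data = <<END =~ /^ *(.*)\n/mg;
        Data one $n
        Data two $n
END
    push @lines, "$n: @data";
}

print "$_\n" for @lines;
```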

And now for a final couple of tricks. The here-doc can be backquoted instead of single or double quoted. This is accomplished by putting the end marker in backquotes:

        $shell_out = <<`SHELL`;
        for i in *
        do
                echo -n \$i:
                sum \$i
        done
        SHELL
        print "shell said: $shell_out\n";

Here, the end marker ``SHELL'' is in backquotes, causing all lines from there to the end marker to be gathered up as a command for the shell. The command here is a shell loop, wandering through all of the names in the current directory, setting the shell variable $i to each of those in turn. For each file, the name is printed, followed immediately by its checksum.

Notice that we had to backslash-quote the dollar signs in this here-doc. That's because backquoted here-docs are double-quote interpolated, just like backquoted strings. However, with a little work, we can get a document that doesn't have that trouble:

        $shell_in = <<'IN';
        for i in *
        do
                echo -n $i:
                sum $i
        done
        IN
        $shell_out = `$shell_in`;
        print "shell said: $shell_out\n";

Here, I first create a string from a single-quoted here-doc (no variable interpolation, so $i is just $i). Then I interpolate that string into an ordinary backquoted string, causing the proper shell command to be launched.
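
Here's a sketch that runs both forms side by side and checks that they agree. It assumes a Unix-like system where backquotes run /bin/sh, and I've swapped the column's echo -n and sum for a plain echo over two fixed words, so the output doesn't depend on what files happen to be in the current directory.

```perl
use strict;
use warnings;

# Form 1: backquoted here-doc. Perl interpolates the body first,
# so the shell's $word has to be written as \$word.
my $out_backquoted = <<`SHELL`;
for word in alpha beta
do
    echo "word: \$word"
done
SHELL
die "shell exited with status $?" if $?;

# Form 2: single-quoted here-doc, then ordinary backquotes.
# $word is left alone here, and the shell expands it.
my $shell_in = <<'IN';
for word in alpha beta
do
    echo "word: $word"
done
IN
my $out_two_step = `$shell_in`;
die "shell exited with status $?" if $?;

print "shell said:\n$out_two_step";
print(($out_backquoted eq $out_two_step) ? "both forms agree\n" : "forms differ\n");
```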

As you can see, here documents make it easy to include large constant text into your program. Another way is with the use of the DATA filehandle, but we'll save that for another time. Enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
