Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 57 (mar 2005)

[suggested title: ``Understanding the Command line'']

In past columns, I've talked a lot about the Perl language, but haven't ever said much about perl at the Unix shell command line. So, let's fix that, by looking at some commonly used command-line constructs for Perl.

Let's take the simplest invocation:

  perl my-script

This invokes my-script, using the relative or absolute path to the script as given, thus not using the PATH in any way. We can include arguments to the script:

  perl my-script arg1 arg2 arg3

which sets up @ARGV to be the three individual values of arg1, arg2, and arg3, as if we had said:

  @ARGV = qw(arg1 arg2 arg3);

If we want a space within one of the values, we need to use shell quoting rules:

  perl my-script 'arg1a arg1b' arg2

This passes two arguments now, not three. We get the same result with:

  perl my-script arg1a\ arg1b arg2

using a backslash to quote the space inside the first argument. If there are any shell wildcard (``glob'') characters, the shell expands them before calling our program:

  perl my-script *.html

which might turn into (given three matching files):

  @ARGV = qw(index.html problem.html results.html);

Note that Perl has no clue that a shell wildcard was involved here: it's as if we had typed the three names individually.

Perl doesn't interpret the @ARGV values in any particular way. They could be keywords, filenames, or some combination of the two. Traditionally, leading @ARGV elements that begin with a minus are considered ``options'', which we can process with modules such as Getopt::Std or Getopt::Long.

We can also have options to Perl itself by placing leading-minus values to the left of the script name. For example, we can invoke the debugger by adding -d:

  perl -d my-script arg1 arg2 arg3

Now, the program is run under the normal Perl debugger. We can pick an alternate debugger (or module using the debugging interface for other analysis) with a colon argument following the -d:

  perl -d:DProf my-script

This command selects the Devel::DProf module as the alternate ``debugger'', which profiles the Perl code rather than debugging it.

Another common option (``switch'') is -c, which compiles a Perl script without executing it:

  perl -c my-script

You would do this to verify that the syntax of your script is good before actually moving it into place for production, including ensuring that all use'ed modules were also available. Any modules loaded at runtime (via require) or code constructed at runtime (like eval) wouldn't be checked, however. Also, all BEGIN and CHECK blocks are executed, so ``compile only'' is merely a casual definition.

You can enable warnings on the command line with -w:

  perl -w my-script

although this is more frequently handled within the program as:

  use warnings;

Sometimes, your program is small enough that it makes sense to include it entirely on the command line. Simply throw an -e switch there instead of the filename, and you're set:

  perl -e 'print "Hello world!\n"'

Note that the quoting can get a bit weird. I typically use single quotes to keep the single argument to -e together, and Perl's double quotes within the argument for Perl quoting. Sometimes, alternate quoting (via qq//) can come in handy:

  perl -e 'print qq/Hello world!\n/'

Multiple -e arguments are joined into a single program, with each piece followed by a newline:

  perl -e print -e 'qq/Hello!\n/'

By now, the number of options is a bit hard to remember. Luckily, Perl has a built-in help message, available with -h:

  perl -h

And for a few more switches that aren't about running programs, let's look at the version information with -v in short form:

  perl -v

and in long form with -V:

  perl -V

The -V switch also gives us access to the various configuration options that Perl was built with, and uses to compile binary extensions and install local programs. For example, to get the C compiler used to compile Perl:

  perl -V:cc

and to get all the options related to where binaries are found or installed:

  perl -V:'.*bin'

The regular expression pattern here is in quotes so that the shell doesn't try to expand it as a filename pattern. The output is in a form that a Bourne-style shell can evaluate easily:

  eval `perl -V:'.*bin'`
  echo $sitebin

No attempt is made to accommodate C-shell-style shells, of course.

Modules can be included from the command line with -M:

  perl -MFile::Find -e 'find sub { print $File::Find::name, $/ }, "."'

The -MFile::Find is equivalent to including:

  use File::Find;

in the resulting script. If you don't want the imports (such as find in this case), use lowercase -m, or be specific with a trailing = syntax:

  perl -MFile::Find=find,finddepth -e '...'

which turns into:

  use File::Find qw(find finddepth);

Note the automatic comma splitting. Nice.

For text processing from a series of one or more files, we can add -n, which puts a wrapper around our program that looks like:

  LINE:
    while (<>) {
      ... # rest of your program here
    }

In other words, the @ARGV list is interpreted as a series of files to be opened, and each line is placed in $_ until all the lines are processed. To print each line with a sequential line number in front, we can use the $. variable for the numbers:

  perl -n -e 'print "$.: $_"' file1 file2 file3

We can bundle the switches that don't take arguments together with the following switch, as in:

  perl -ne 'print ...'

Another way to approach this problem is the -p, which adds a print at the end of the loop:

  LINE:
    while (<>) {
      ... # your program here
      print;
    }

So, we could just substitute the line number into the beginning of each line:

  perl -pe 's/^/$.: /' file1 file2 file3

Going one step further, we could rewrite these modified lines back into the original files with the ``inplace edit'' switch: -i:

  perl -i.bak -pe 's/^/$.: /' file1 file2 file3

Now, file1 will be renamed file1.bak, and the new updated contents written to a new file1. Similarly, file2 becomes file2.bak, and file3 becomes file3.bak.

If you leave off the option to -i, the ``inplace edit without backup file'' mode is enabled, which can save space, but gives you no way to go back if you've toasted your files. Be very careful.

The line-looping modes (-n and -p) respect the current value of $/ to read a ``line'', which defaults to \n. However, you can specify alternate values with the -0 (that's a zero) switch. By default, -0 sets the delimiter to the NUL byte, which can be handy with GNU find's -print0 switch (which delimits the filenames with NUL bytes):

  find . -name '*.html' -print0 | perl -n -0 -e unlink

Any octal value can also follow -0, indicating the corresponding ASCII character. For example, to delimit only on spaces, use -040.

If the value is -0777, then $/ is set to undef, slurping the entire file as one ``line''. Thus, we can wrap the entire file with a BEGIN/END marker as:

  perl -0777 -pi.bak -e '$_ = "BEGIN\n$_\nEND\n"' file1 file2 file3

Here, the statement is executed three times, with $_ holding the entire contents of file1, then file2, then file3.

Note that the following command mangles the lines, because the concatenate is happening after the terminating newline:

  perl -pe '$_ .= "END"' file1 file2 file3

But we can fix that with -l, which chomps each line as read, and then restores the delimiter on a print:

  perl -l -pe '$_ .= "END"' file1 file2 file3

Now the $_ contains only the line without a newline, and the concatenate happens in the right place, before the newline that gets automatically added by the implicit print at the end of the implicit loop.

Well, I hope you enjoyed this brief tour through the most common Perl command-line options. You can read more at the perlrun manpage, available either as man perlrun or perldoc perlrun at your prompt. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
