Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 69 (Mar 2007)

[Suggested title: ``The Replacements'']

If you've used Perl longer than 15 minutes, you've no doubt seen (and probably typed) the extremely useful substitute operation, typically appearing as s/old/new/. Let's look at some of the things you may already know, and hopefully a few things that you don't know yet about this very common operation.

The most important thing to notice about the substitute operation is that it acts by default on our friend, the $_ variable:

  $_ = "hello";
  s/ell/ipp/; # $_ is now "hippo"

The ``left side'' of a substitute is a regular expression, so all of the rules about regular expressions apply:

  $_ = "hello";
  s/e.*l/ipp/; # $_ is now "hippo"

Here, the .* portion matches any number of (nearly) any character, taking the longest stretch that still permits the rest of the expression to match. In this case, that was the single l character between the e and the second l. Had we instead opted for the lazy version, .*?, we'd stop at the closest l instead:

  $_ = "hello";
  s/e.*?l/ipp/; # $_ is now "hipplo"

Like a regular expression match, we can steer the substitution away from $_ and toward some other location, using the =~ construct. Unlike the match operation though, we have to specify an lvalue (such as a variable name), not an rvalue (result of an expression):

  my $text = "hello";
  $text =~ s/ell/ipp/; # $text is now "hippo"
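
Because the target has to be an lvalue, a common idiom for keeping the original intact is to copy first and then substitute on the copy. Here's a minimal sketch of that idiom; the $copy variable is just a name I picked for illustration:

```perl
use strict;
use warnings;

my $text = "hello";

# The parenthesized assignment yields $copy as an lvalue,
# so the substitution edits the copy and leaves $text alone.
(my $copy = $text) =~ s/ell/ipp/;

print "$text -> $copy\n";
```

The parentheses matter: without them, the substitution would bind to $text, and $copy would get the return value instead of the new string.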

I occasionally find that some people are confused about the return value of a substitution. After all, if $text now has a new value, isn't that also what I'll see if I put this replacement in a larger context?

  my $text = "hello";
  my $result = ($text =~ s/ell/ipp/);

And the answer is no. Although the substitution is indeed altering $text here, what it returns is a true/false value indicating whether the substitution happened (strictly speaking, the number of substitutions made, or an empty string if there were none). In this case, $result is true. This property of returning the success is handy when we're performing conditional operations:

  if (s/foo/bar/) { # if foo was found, it's now bar, and...
    ... we do the code here ...
  } else {
    ... we didn't find foo, and $_ is unchanged ...
  }

The replacement is performed in the first possible place:

  $_ = "hello";
  s/l/p/; # $_ is now "heplo";
  s/l/p/; # $_ is now "heppo";

To repeat the substitution on all non-overlapping matches, we add a g suffix:

  $_ = "hillo";
  s/l/p/g; # $_ is now "hippo";
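
With the g suffix, the return value discussed earlier becomes a count of the replacements made, which is still true or false in a conditional. A small sketch, with $count and $none as illustrative names of my own:

```perl
use strict;
use warnings;

my $text = "hello";

# With /g, the substitution returns how many replacements it made.
my $count = ($text =~ s/l/p/g);

# When nothing matches, the return value is false and $text is untouched.
my $none = ($text =~ s/z/q/g);

print "made $count replacements, text is now $text\n";
print "no z found\n" unless $none;
```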

The important word there is non-overlapping. Perl looks for each new match after the end of the previous match. So, the result of a substitution like this may at first be surprising:

  $_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
  s/,.*?,/,XXX,/g; # replace all fields with XXX (no!)

When we check the result, we see:

  aaa,XXX,ccc,XXX,eee,XXX,ggg

Oops! Why did it do every other entry? On the first match, we matched ,bbb,, and replaced that with ,XXX,. Good so far. But we can't now look at the comma there as the beginning of ,ccc,, because these have to be non-overlapping!
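
To see the non-overlapping rule in action, we can walk the same pattern across the string with a plain match in a loop and record where each match starts and ends. This is only a sketch to illustrate the point; the @spans array and the output format are my own choices:

```perl
use strict;
use warnings;

my $text = "aaa,bbb,ccc,ddd,eee,fff,ggg";
my @spans;

# In scalar context, each /g match resumes where the previous one ended.
while ($text =~ /,.*?,/g) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    my $matched = substr($text, $-[0], $+[0] - $-[0]);
    push @spans, "$-[0]-$+[0] $matched";
}

print "$_\n" for @spans;
```

Each match begins at or after the offset where the previous one ended, so the comma consumed as the end of one match is never available as the start of the next.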

We can fix that by making the trailing comma merely a lookahead:

  $_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
  s/,.*?(?=,)/,XXX/g; # replace all fields with XXX (almost...)

Now, the trailing comma is not considered part of the match, so it's not ripped out, and it's not skipped past to find the next match. Note that I also had to change the replacement string so it doesn't add a comma back in. Now we're getting closer:

  aaa,XXX,XXX,XXX,XXX,XXX,ggg

Hmm. We're still missing the beginning. That's understandable, because we're requiring a comma before the letters. And we're also missing the end, because we demand a trailing comma, even though we're not considering it part of the match. We can fix both of those problems with a bit more work:

  $_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
  s/(^|(?<=,)).*?((?=,)|$)/XXX/g; # replace all fields with XXX

OK, this is starting to look ugly. Like a regex match, we can pull that apart with a trailing x:

  s/
    (
      ^         # either beginning of string
      |         # or
      (?<=,)    # a single comma to the left
    )
    .*?         # as few characters as possible
    (
      (?=,)     # a single comma to the right
      |         # or
      $         # end of string
    )
  /XXX/gx;

That's much easier to read (relatively speaking).
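
For comparison, if we can assume that every field is non-empty and that no field contains a comma, a negated character class sidesteps the lookarounds entirely. This is an alternative sketch under those assumptions, not a drop-in replacement for the pattern above; the variable names are mine:

```perl
use strict;
use warnings;

my $text = "aaa,bbb,ccc,ddd,eee,fff,ggg";

# The lookaround version from above, applied to a copy.
(my $lookaround = $text) =~ s/(^|(?<=,)).*?((?=,)|$)/XXX/g;

# Alternative: replace each run of non-comma characters.
# Assumes no empty fields; an empty field would be left empty.
(my $negated = $text) =~ s/[^,]+/XXX/g;

print "$lookaround\n$negated\n";
```

Since the commas are never part of any match here, the overlap problem doesn't come up in the first place.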

Like a regular expression match, we can use an alternate delimiter for the left and right sides of the substitution:

  $_ = "hello";
  s%ell%ipp%; # $_ is now "hippo"

The rules are a bit complicated, but it works precisely the way Larry Wall wanted it to work. If the delimiter chosen is not one of the special characters that begins a pair, then we use the character twice more to both separate the pattern from the replacement, and finally to terminate the replacement, as the example above showed.

However, if we use the beginning character of a paired character set (parentheses, curly braces, square brackets, or even less-than and greater-than), we close off the pattern with the corresponding closing character. Then, we get to pick another delimiter all over again, using the same rules. For example, these all do the same thing:

  s/ell/ipp/;
  s%ell%ipp%;
  s;ell;ipp;; # don't do this!
  s#ell#ipp#; # one of my favorites
  s[ell]#ipp#; # [] for pattern, # for replacement
  s[ell][ipp]; # [] for both pattern and replacement
  s<ell><ipp>; # <> for both pattern and replacement
  s{ell}(ipp); # {} for pattern, () for replacement

No matter what the closing delimiter might be for either the pattern or the replacement, we can include the character literally by preceding it with a backslash:

  $_ = "hello";
  s/ell/i\/n/; # $_ is now "hi/no";
  s/\/no/res/; # $_ is now "hires";

To avoid backslashing, pick a distinct delimiter:

  $_ = "hello";
  s%ell%i/n%; # $_ is now "hi/no";
  s%/no%res%; # $_ is now "hires";

Conveniently, if a paired character is used, the pairs may be nested without invoking any backslashes:

  $_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
  s((^|(?<=,)).*?((?=,)|$))(XXX)g; # replace all fields with XXX

Note that even though the pattern contains closing parentheses, they are all paired with opening parentheses, so the pattern ends at the right place.

The right side of the substitution operation is generally treated as if it were a double-quoted string: variable interpolation and backslash interpretation are performed directly:

  $replacement = "ipp";
  $_ = "hello";
  s/ell/$replacement/; # $_ is now "hippo"
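
Since the replacement behaves like a double-quoted string, the usual escapes and the capture variables work there as well. A short sketch with made-up data; $greeting, $name, $row, and $word are illustrative names only:

```perl
use strict;
use warnings;

my $name     = "world";
my $greeting = "hello, NAME!";
$greeting =~ s/NAME/\u$name/;    # \u uppercases the next character

my $row = "a,b,c";
$row =~ s/,/\t/g;                # \t becomes a real tab

my $word = "hello";
$word =~ s/(ell)/[$1]/;          # $1 is whatever the parentheses captured

print "$greeting\n$row\n$word\n";
```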

The left side of a substitution is also treated as if it were a double-quoted string (with a few exceptions), and this interpolation happens before the result is evaluated as a regular expression:

  $pattern = "ell";
  $replacement = "ipp";
  $_ = "hello";
  s/$pattern/$replacement/; # $_ is now "hippo"
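
One consequence of interpolating first and compiling second is that any regex metacharacters in the variable are live. If the variable is meant as literal text, wrapping it in \Q and \E quotes those characters. A minimal sketch, assuming a search string that happens to contain a plus sign (the names are hypothetical):

```perl
use strict;
use warnings;

my $literal = "1+1";      # hypothetical search text with a metacharacter
my $text    = "1+1=2";

# Interpolated as-is, the + is a quantifier, so this looks for "11" and fails.
(my $as_regex = $text) =~ s/$literal/two/;

# With \Q...\E, the + is matched literally.
(my $quoted = $text) =~ s/\Q$literal\E/two/;

print "$as_regex\n$quoted\n";
```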

Using this form of pattern, Perl is forced to compile the regular expression at runtime. If this happens in a loop, Perl may need to recompile the regular expression repeatedly, causing a slowdown. We can give Perl a hint that the pattern is really a regular expression by using a regular expression literal:

  $pattern = qr/ell/;
  $replacement = "ipp";
  $_ = "hello";
  s/$pattern/$replacement/; # $_ is now "hippo"

The qr operation creates a Regexp object, which interpolates into the pattern with minimal fuss and maximal speed.
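
Here's a sketch of how that might look in a loop, with the pattern built once up front and reused for every element. The word list and the $changed counter are invented for illustration:

```perl
use strict;
use warnings;

# Compile the pattern once, outside the loop.
my $pattern = qr/ell/;

my @words   = ("hello", "yellow", "world");
my $changed = 0;

for my $word (@words) {
    # $word aliases the array element, so the substitution edits @words in place.
    $changed++ if $word =~ s/$pattern/ipp/;
}

print "changed $changed of ", scalar(@words), ": @words\n";
```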

I hope you've enjoyed this brief overview of the replacement operation, although it's no replacement (ugh) for the manpages, such as perlre, perlretut, perlrequick, and perlreref. Check those out for more details, and until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.