Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 49 (Nov 2003)

[suggested title: ``Speeding up your Perl programs'']

How fast does your Perl run? Hopefully, most of the time, the answer is ``fast enough''. But sometimes it isn't. How do we speed up a Perl program? How do we know what's making it unacceptably slow?

The first rule of speed optimization is ``don't''. Certainly, don't write things that waste time, but first aim for clarity and maintainability. With today's processor speeds far exceeding the ``supercomputers'' of yesterday, sometimes that's ``fast enough''.

But how do we know if something is ``fast enough''? The first easy cut is to just check total run time. That's usually as simple as putting ``time'' in front of your program invocation, and looking at the result (which may vary depending on your operating system and choice of shell). For example, here's a run of one of my frequently-invoked programs, preceding the command (get.mini) with time:

    [localhost:~] merlyn% time get.mini
    authors/01mailrc.txt.gz ... up to date
    modules/02packages.details.txt.gz ... up to date
    modules/03modlist.data.gz ... up to date
    67.540u 3.290s 1:40.62 70.3%        0+0k 1585+566io 0pf+0w

So, this tells me that this program took about 70 CPU seconds over a 100-second timespan, essentially doing nothing useful for me except verifying that my local mirror of the CPAN is up to date. (Generally, to get sensible numbers, I have to do this on an otherwise idle system.)

Hmm. I wonder where it's actually spending all of its time? We can get the next level of information by invoking the built-in Perl profiler on the script. Simply include -d:DProf as a command-line switch, and Perl will automatically record where the program is spending its time into a file called tmon.out in the current directory:

    [localhost:~] merlyn% perl -d:DProf `which get.mini`
    authors/01mailrc.txt.gz ... up to date
    modules/02packages.details.txt.gz ... up to date
    modules/03modlist.data.gz ... up to date
    [localhost:~] merlyn% ls -l tmon.out
    -rw-r--r--    1 merlyn   staff    44991232 Sep  8 08:46 tmon.out
    [localhost:~] merlyn%

(I have to use which here because perl won't look in my shell's search path for the script.) The execution under the profiler slows the program down a bit, but otherwise doesn't affect any functionality. We now need to summarize the raw data in tmon.out with the dprofpp command (also included with Perl):

    [localhost:~] merlyn% dprofpp
    Total Elapsed Time = 99.50810 Seconds
      User+System Time = 70.79810 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     14.9   10.60 12.739 545807   0.0000 0.0000  URI::_generic::authority
     14.8   10.54 12.002 461833   0.0000 0.0000  URI::_generic::path
     13.1   9.296 19.853 293900   0.0000 0.0001  URI::new
     12.8   9.108 39.082  83971   0.0001 0.0005  URI::_generic::abs
     9.82   6.953 12.677 461842   0.0000 0.0000  URI::_scheme
     8.16   5.778  6.024 293900   0.0000 0.0000  URI::_init
     7.58   5.368 91.054  41987   0.0001 0.0022  main::my_mirror
     6.39   4.522  4.521 293900   0.0000 0.0000  URI::implementor
     5.78   4.093 32.267  41984   0.0001 0.0008  URI::_generic::rel
     5.07   3.588  3.588 251910   0.0000 0.0000  URI::_generic::_check_path
     4.26   3.016  3.016  83976   0.0000 0.0000  File::Spec::Unix::canonpath
     3.92   2.775 15.390  83968   0.0000 0.0002  URI::http::canonical
     3.48   2.465 14.186 377874   0.0000 0.0000  URI::scheme
     3.09   2.185  9.973  83968   0.0000 0.0001  URI::_server::canonical
     2.25   1.596  3.419  41988   0.0000 0.0001  File::Spec::Unix::catdir
    [localhost:~] merlyn%

Now, this is some interesting information that we can use. We took about 70 CPU seconds, just as in the previous run, so the profiling didn't significantly alter our execution. The first routine listed, URI::_generic::authority, was called over half a million times, contributing about 10 CPU seconds to the total. (There are other switches to dprofpp, but the default options seem to be the most useful most of the time.)

If we could make URI::_generic::authority twice as fast, we'd save about 5 seconds on the run. Making URI::new run twice as fast would also save us about 5 seconds, even though it's called roughly half as frequently. Generally, the important thing to note here is the total CPU for a routine, not the number of times it is invoked or the CPU spent on an individual invocation. I do find it interesting, though, that I had to create 293,900 URI objects: perhaps some redesign of my algorithm could avoid that. Good algorithm design is important, and is both a science and an art.

The other thing to note is that most of the routines that are burning CPU are in libraries, not in my code. So, without changing the code of a library, I need to call library routines more efficiently or less frequently if I want to speed this up.
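For example (and this is only a rough sketch of one possible approach, not code from the actual program), if the same URL strings tend to come up repeatedly during the mirroring run, caching the constructed objects would let URI->new do its work once per distinct string rather than once per appearance:

  use URI;

  # Hypothetical helper: return a cached URI object if we've already
  # built one for this exact string; otherwise build and remember it.
  # (Callers share the cached object, so they shouldn't modify it.)
  my %uri_cache;
  sub cached_uri {
    my $str = shift;
    return $uri_cache{$str} ||= URI->new($str);
  }

Whether that actually helps depends on how often the same strings recur, and the profiler can tell us that after the change.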

Then we have the bigger picture to consider. I invoke this program maybe a dozen times a day. If I worked hard to reduce it from 70 CPU seconds to 35, I'd save roughly seven CPU minutes a day. How long would I have to work to optimize it to that level? And how many times would I have to invoke the program before the cost of the (usually otherwise idle) CPU time saved added up to my billing rate for the time I spent optimizing?

This is why we usually don't worry about speed. CPUs are cheap. Programmer time is expensive.

But let's push it the other way for a minute. Suppose this program had to be invoked once a minute, continuously. At 70 CPU seconds per run, we've just run out of processor (not to mention that the elapsed real time was even longer). So we're forced to optimize the program, or buy a bigger machine, or run the program on multiple machines (if possible).

Armed with the data from the profiling run, we might want to tackle the URI::new routine. But rather than tweaking the code a bit and rerunning the whole program, it's generally more effective to benchmark the various candidate versions of the code against given inputs, in isolation from the rest of the program.

For example, suppose we wanted to speed up the part of the routine that determines the URI's scheme separate from the rest of the URL. Three possible strategies come to mind for me initially: a regular expression match, a split, or an index/substr, implemented roughly as follows:

  sub re_match {
    my $str = "http://www.stonehenge.com/perltraining/";;
    my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
  }
  sub split_it {
    my $str = "http://www.stonehenge.com/perltraining/";;
    my ($scheme, $rest) = split /:/, $str, 2;
  }
  sub index_substr {
    my $str = "http://www.stonehenge.com/perltraining/";;
    my $pos = index($str, ":");
    my $scheme = substr($str, 0, $pos-1);
    my $rest = substr($str, $pos+1);
  }

We now have three candidates. Which one is faster? For that, we can use the built-in Benchmark module, as follows:

  use Benchmark qw(cmpthese);
  my $URI = "http://www.stonehenge.com/perltraining/";;
  sub re_match {
    my $str = $URI;
    my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
  }
  sub split_it {
    my $str = $URI;
    my ($scheme, $rest) = split /:/, $str, 2;
  }
  sub index_substr {
    my $str = $URI;
    my $pos = index($str, ":");
    my $scheme = substr($str, 0, $pos);
    my $rest = substr($str, $pos+1);
  }
  cmpthese(-1, {
    re_match => \&re_match,
    split_it => \&split_it,
    index_substr => \&index_substr,
  });

which when run results in:

                     Rate     re_match     split_it index_substr
    re_match     131022/s           --         -37%         -46%
    split_it     208777/s          59%           --         -14%
    index_substr 242811/s          85%          16%           --

The ``-1'' on the cmpthese call says ``run each of these for at least 1 CPU second''. Note that index_substr seems to be the clear winner here, even though there are more things for me to type and get correct.
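As a side note, cmpthese also accepts a positive count if you'd rather run each candidate a fixed number of times instead of letting Benchmark pick the iteration count from CPU time; the 500_000 below is just an arbitrary figure for illustration:

  # Run each candidate exactly 500,000 times, rather than for a
  # minimum amount of CPU time:
  cmpthese(500_000, {
    re_match => \&re_match,
    split_it => \&split_it,
    index_substr => \&index_substr,
  });

Either way, it's the relative percentages that matter; the absolute rates depend on whatever machine happens to run the benchmark.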

One thing to watch out for in benchmarking is optimizing for atypical data. For example, the URI I picked is probably typical, but what happens if the URL is longer instead?

  my $URI = "http://www.stonehenge.com/merlyn/"; .
    "Pictures/Trips/2003/03-06-PerlWhirlMacMania/" .
    "Day-0-Pearl-Harbor/?show=14";

Now our results show that index_substr is even better!

                     Rate     re_match     split_it index_substr
    re_match     117108/s           --         -41%         -49%
    split_it     199110/s          70%           --         -13%
    index_substr 229682/s          96%          15%           --

Given those two extremes, I'd say that index_substr is probably the right tool for this particular task, all other things being equal.

Armed with Perl's built-in profiler and the Benchmark module, I can usually fine-tune my routines, and get the speed I want out of my Perl programs. If that's not enough, I could take those frequently-called expensive routines and code them in a lower-level language instead. But usually that won't be needed. Until next time, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.