Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 49 (Nov 2003)
[suggested title: ``Speeding up your Perl programs'']
How fast does your Perl run? Hopefully, most of the time, the answer is ``fast enough''. But sometimes it isn't. How do we speed up a Perl program? How do we know what's making it unacceptably slow?
The first rule of speed optimization is ``don't''. Certainly, don't write things that waste time, but first aim for clarity and maintainability. With today's processor speeds far exceeding the ``supercomputers'' of yesterday, sometimes that's ``fast enough''.
But how do we know if something is ``fast enough''? The first easy cut
is to just check total run time. That's usually as simple as putting
``time'' in front of your program invocation, and looking at the result
(which may vary depending on your operating system and choice of
shell). For example, here's a run of one of my frequently-invoked
programs, preceding the command (get.mini) with time:
[localhost:~] merlyn% time get.mini
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
67.540u 3.290s 1:40.62 70.3% 0+0k 1585+566io 0pf+0w
So, this tells me that this program took about 70 CPU seconds over a 100-second timespan, essentially doing nothing useful for me except verifying that my local mirror of the CPAN is up to date. (Generally, to get sensible numbers, I have to do this on an otherwise idle system.)
Hmm. I wonder where it's actually spending all of its time? We can
get the next level of information by invoking the built-in Perl profiler
on the script. Simply include -d:DProf as a Perl command switch,
and Perl will automatically record where the program is spending its time
into a file called tmon.out in the current directory:
[localhost:~] merlyn% perl -d:DProf `which get.mini`
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
[localhost:~] merlyn% ls -l tmon.out
-rw-r--r-- 1 merlyn staff 44991232 Sep 8 08:46 tmon.out
[localhost:~] merlyn%
(I have to use which here because perl won't look in my shell's
search path for the script.) The execution under the profiler slows
the program down a bit, but otherwise doesn't affect any
functionality. We now need to summarize the raw data in tmon.out
with the dprofpp command (also included with Perl):
[localhost:~] merlyn% dprofpp
Total Elapsed Time = 99.50810 Seconds
User+System Time = 70.79810 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
14.9 10.60 12.739 545807 0.0000 0.0000 URI::_generic::authority
14.8 10.54 12.002 461833 0.0000 0.0000 URI::_generic::path
13.1 9.296 19.853 293900 0.0000 0.0001 URI::new
12.8 9.108 39.082 83971 0.0001 0.0005 URI::_generic::abs
9.82 6.953 12.677 461842 0.0000 0.0000 URI::_scheme
8.16 5.778 6.024 293900 0.0000 0.0000 URI::_init
7.58 5.368 91.054 41987 0.0001 0.0022 main::my_mirror
6.39 4.522 4.521 293900 0.0000 0.0000 URI::implementor
5.78 4.093 32.267 41984 0.0001 0.0008 URI::_generic::rel
5.07 3.588 3.588 251910 0.0000 0.0000 URI::_generic::_check_path
4.26 3.016 3.016 83976 0.0000 0.0000 File::Spec::Unix::canonpath
3.92 2.775 15.390 83968 0.0000 0.0002 URI::http::canonical
3.48 2.465 14.186 377874 0.0000 0.0000 URI::scheme
3.09 2.185 9.973 83968 0.0000 0.0001 URI::_server::canonical
2.25 1.596 3.419 41988 0.0000 0.0001 File::Spec::Unix::catdir
[localhost:~] merlyn%
Now, this is some interesting information that we can use. We took 70
CPU seconds. This is just like the previous run, so the profiling
didn't significantly alter our execution. The first routine listed,
URI::_generic::authority, was called half a million times,
contributing about 10 CPU seconds to the total. (There are other
switches to dprofpp, but the default options seem to be the most
useful most of the time.)
If we could make URI::_generic::authority twice as fast, we'd save
about 5 seconds on the run. Making URI::new run twice as fast would
also save us about 5 seconds, even though it's called only about half
as often. Generally, the important thing to note here is the total
CPU spent in a routine, not the number of times it is invoked or the
CPU spent on any individual invocation. Still, I do find it
interesting that I had to create nearly 294,000 URI objects: perhaps
some redesign of my algorithm could avoid that. Good algorithm
design is important, and is both a science and an art.
The other thing to note is that most of the routines that are burning CPU are in libraries, not in my code. So, without changing the code of a library, I need to call library routines more efficiently or less frequently if I want to speed this up.
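For example, one low-effort change would be to cache URI objects keyed by their string form, so that seeing the same URL repeatedly doesn't mean calling URI->new repeatedly. This is only a sketch of the idea; the cached_uri wrapper below is my illustration, not something get.mini actually contains:
use URI;
my %uri_cache;          # string form => URI object
sub cached_uri {
  my $str = shift;
  # build the object only the first time we see this exact string
  return $uri_cache{$str} ||= URI->new($str);
}
my $u1 = cached_uri("http://www.stonehenge.com/perltraining/");
my $u2 = cached_uri("http://www.stonehenge.com/perltraining/");
# the second call returns the cached object: URI->new ran only once
Of course, that works only if the callers treat the shared object as read-only; if anything modifies a cached URI in place, every other user of that string sees the change.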
Then we have the bigger picture to consider. I invoke this program maybe a dozen times a day. If I worked hard to reduce this program from 70 CPU seconds to 35, I'd save roughly 7 CPU minutes a day. How long would I have to work to optimize it to that level? And how many times would I have to invoke the program before the value of the (usually otherwise idle) CPU time saved adds up to my billing rate for the hours I spent optimizing?
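To get a feel for the scale, here's a quick back-of-the-envelope script; every number in it is made up, so plug in your own:
# all figures below are hypothetical, for illustration only
my $cpu_cost_per_hour = 0.10;   # value of an hour of otherwise-idle CPU
my $billing_rate      = 100;    # my hourly rate
my $seconds_saved     = 35;     # CPU seconds saved per run
my $hours_to_optimize = 4;      # time spent doing the optimizing
my $savings_per_run = ($seconds_saved / 3600) * $cpu_cost_per_hour;
my $cost            = $hours_to_optimize * $billing_rate;
printf "break-even after %.0f runs\n", $cost / $savings_per_run;
With those invented numbers, I'd have to run the program several hundred thousand times before the optimization pays for itself.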
This is why we usually don't worry about speed. CPUs are cheap. Programmer time is expensive.
But let's push it the other way for a minute. Suppose this program had to be invoked once a minute continuously. At 70 CPU seconds per run, we've just run out of processor (not to mention that the realtime speed was even longer). So we're forced to optimize the program, or buy a bigger machine, or run the program on multiple machines (if possible).
Armed with the data from the profiling run, we might want to tackle
the URI::new routine. But rather than tweaking the code a bit, and
rerunning the program overall, it's generally more effective to
benchmark various versions of the code for given inputs, in isolation
from the rest of the program.
For example, suppose we wanted to speed up the part of the routine
that determines the URI's scheme separate from the rest of the URL.
Three possible strategies come to mind for me initially: a regular
expression match, a split, or an index/substr, implemented
roughly as follows:
sub re_match {
my $str = "http://www.stonehenge.com/perltraining/";
my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}
sub split_it {
my $str = "http://www.stonehenge.com/perltraining/";
my ($scheme, $rest) = split /:/, $str, 2;
}
sub index_substr {
my $str = "http://www.stonehenge.com/perltraining/";
my $pos = index($str, ":");
my $scheme = substr($str, 0, $pos);
my $rest = substr($str, $pos+1);
}
We now have three candidates. Which one is faster? For that, we
can use the built-in Benchmark module, as follows:
use Benchmark qw(cmpthese);
my $URI = "http://www.stonehenge.com/perltraining/";
sub re_match {
  my $str = $URI;
  my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}
sub split_it {
  my $str = $URI;
  my ($scheme, $rest) = split /:/, $str, 2;
}
sub index_substr {
  my $str = $URI;
  my $pos = index($str, ":");
  my $scheme = substr($str, 0, $pos);
  my $rest = substr($str, $pos+1);
}
cmpthese(-1, {
  re_match => \&re_match,
  split_it => \&split_it,
  index_substr => \&index_substr,
});
which when run results in:
Rate re_match split_it index_substr
re_match 131022/s -- -37% -46%
split_it 208777/s 59% -- -14%
index_substr 242811/s 85% 16% --
The ``-1'' on the cmpthese call says ``run each of these for roughly 1
CPU second''. Note that index_substr seems to be the clear winner
here, even though there are more things for me to type and get
correct.
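If you'd rather control the iteration count yourself, cmpthese also accepts a positive number, meaning ``run each of these exactly that many times''. For example (a sketch only, reusing the same three subroutines defined above), something like this:
use Benchmark qw(cmpthese);
# run each candidate exactly 500_000 times instead of timing
# each one for roughly a CPU second
cmpthese(500_000, {
  re_match => \&re_match,
  split_it => \&split_it,
  index_substr => \&index_substr,
});
Just make sure the count is large enough that the results are stable from run to run.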
One thing to watch out for in benchmarking is optimizing for atypical data. For example, the URI I picked is probably typical, but what happens if the URL is longer instead?
my $URI = "http://www.stonehenge.com/merlyn/" .
  "Pictures/Trips/2003/03-06-PerlWhirlMacMania/" .
  "Day-0-Pearl-Harbor/?show=14";
Now our results show that index_substr pulls even further ahead of the regular expression match:
Rate re_match split_it index_substr
re_match 117108/s -- -41% -49%
split_it 199110/s 70% -- -13%
index_substr 229682/s 96% 15% --
Given those two extremes, I'd say that index_substr is probably the
right tool for this particular task, all other things being equal.
Armed with Perl's built-in profiler and the Benchmark module, I can
usually fine-tune my routines, and get the speed I want out of my Perl
programs. If that's not enough, I could take those frequently-called
expensive routines and code them in a lower-level language instead.
But usually that won't be needed. Until next time, enjoy!

