Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 49 (Nov 2003)
[suggested title: ``Speeding up your Perl programs'']
How fast does your Perl run? Hopefully, most of the time, the answer is ``fast enough''. But sometimes it isn't. How do we speed up a Perl program? How do we know what's making it unacceptably slow?
The first rule of speed optimization is ``don't''. Certainly, don't write things that waste time, but first aim for clarity and maintainability. With today's processor speeds far exceeding the ``supercomputers'' of yesterday, sometimes that's ``fast enough''.
But how do we know if something is ``fast enough''? The first easy cut is to just check total run time. That's usually as simple as putting ``time'' in front of your program invocation, and looking at the result (which may vary depending on your operating system and choice of shell). For example, here's a run of one of my frequently-invoked programs, preceding the command (get.mini) with time:
[localhost:~] merlyn% time get.mini
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
67.540u 3.290s 1:40.62 70.3%    0+0k 1585+566io 0pf+0w
So, this tells me that this program took about 70 CPU seconds over a 100-second timespan, essentially doing nothing useful for me except verifying that my local mirror of the CPAN is up to date. (Generally, to get sensible numbers, I have to do this on an otherwise idle system.)
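That split between CPU time and wallclock time is worth keeping an eye on: a program can be slow without burning much CPU at all (waiting on the network, for instance). If you want the same kind of breakdown from inside a program rather than from the shell, the built-in times function and the Time::HiRes module will get you there. Here's a small sketch I've added purely for illustration; do_something_slow is a made-up stand-in for whatever work you're measuring:

use strict;
use Time::HiRes qw(time);          # higher-resolution wall clock

my $wall_start = time;
my @cpu_start  = times;            # (user, system, child user, child system)

do_something_slow();               # hypothetical stand-in for the real work

my @cpu_end = times;
printf "%.2f CPU seconds over %.2f wallclock seconds\n",
  ($cpu_end[0] - $cpu_start[0]) + ($cpu_end[1] - $cpu_start[1]),
  time - $wall_start;

sub do_something_slow {
  select(undef, undef, undef, 0.5) for 1 .. 4;   # mostly sleeps, so CPU << wallclock
}

Run that and you'll see a couple of wallclock seconds against almost no CPU: the same kind of gap the time output above shows for my CPAN mirroring run.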
Hmm. I wonder where it's actually spending all of its time? We can get the next level of information by invoking the built-in Perl profiler on the script. Simply include -d:DProf as a Perl command-line switch, and Perl will automatically record where the program is spending its time into a file called tmon.out in the current directory:
[localhost:~] merlyn% perl -d:DProf `which get.mini`
authors/01mailrc.txt.gz ... up to date
modules/02packages.details.txt.gz ... up to date
modules/03modlist.data.gz ... up to date
[localhost:~] merlyn% ls -l tmon.out
-rw-r--r--  1 merlyn  staff  44991232 Sep  8 08:46 tmon.out
[localhost:~] merlyn%
(I have to use which here because perl won't look in my shell's search path for the script.) The execution under the profiler slows the program down a bit, but otherwise doesn't affect any functionality. We now need to summarize the raw data of tmon.out with the dprofpp program (also included with Perl):
[localhost:~] merlyn% dprofpp
Total Elapsed Time = 99.50810 Seconds
  User+System Time = 70.79810 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 14.9   10.60 12.739 545807   0.0000 0.0000  URI::_generic::authority
 14.8   10.54 12.002 461833   0.0000 0.0000  URI::_generic::path
 13.1   9.296 19.853 293900   0.0000 0.0001  URI::new
 12.8   9.108 39.082  83971   0.0001 0.0005  URI::_generic::abs
 9.82   6.953 12.677 461842   0.0000 0.0000  URI::_scheme
 8.16   5.778  6.024 293900   0.0000 0.0000  URI::_init
 7.58   5.368 91.054  41987   0.0001 0.0022  main::my_mirror
 6.39   4.522  4.521 293900   0.0000 0.0000  URI::implementor
 5.78   4.093 32.267  41984   0.0001 0.0008  URI::_generic::rel
 5.07   3.588  3.588 251910   0.0000 0.0000  URI::_generic::_check_path
 4.26   3.016  3.016  83976   0.0000 0.0000  File::Spec::Unix::canonpath
 3.92   2.775 15.390  83968   0.0000 0.0002  URI::http::canonical
 3.48   2.465 14.186 377874   0.0000 0.0000  URI::scheme
 3.09   2.185  9.973  83968   0.0000 0.0001  URI::_server::canonical
 2.25   1.596  3.419  41988   0.0000 0.0001  File::Spec::Unix::catdir
[localhost:~] merlyn%
Now, this is some interesting information that we can use. We took 70 CPU seconds, just like the previous run, so the profiling didn't significantly alter our execution. The first routine listed, URI::_generic::authority, was called half a million times, contributing about 10 CPU seconds to the total. (There are other switches to dprofpp, but the default options seem to be the most useful most of the time.)
If we could make URI::_generic::authority twice as fast, we'd save about 5 seconds on the run. Even making URI::new run twice as fast would save us about 5 seconds as well, even though it's called half as frequently. Generally, the important thing here is to note the total CPU for a routine, not the number of times it is invoked, nor the CPU spent on an individual invocation. Although, I do find it interesting that I had to create 293,000 URI objects: perhaps there's some redesign of my algorithm to avoid that. Good algorithm design is important, and is both a science and an art.
The other thing to note is that most of the routines that are burning CPU are in libraries, not in my code. So, without changing the code of a library, I need to call library routines more efficiently or less frequently if I want to speed this up.
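For example, since the program is apparently constructing many of the same URI objects over and over, one way to call the library less frequently would be to cache constructed objects by their text. This is only a sketch of the idea, not code from get.mini; the uri_for name is mine, and it assumes the cached objects aren't later modified in place:

use URI;

my %uri_cache;    # text => URI object, built on demand

sub uri_for {
  my $text = shift;
  # construct each distinct URI only once; reuse it on later calls
  $uri_cache{$text} ||= URI->new($text);
}

With nearly 294,000 calls to URI::new in the profile but presumably far fewer distinct URLs, a cache like this could cut that line substantially; whether it actually pays off is exactly what a second profiling run would tell us. (The Memoize module can automate this kind of caching for routines whose results depend only on their arguments.)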
Then we have the bigger picture to consider. I invoke this program maybe a dozen times a day. If I worked hard to reduce it from 70 CPU seconds to 35, I'd save roughly 7 CPU minutes a day. How long would I have to work to optimize it to that level? And how many times would I have to invoke the program before the money saved on the (usually otherwise idle) CPU equals my billing rate for the time I spent optimizing?
This is why we usually don't worry about speed. CPUs are cheap. Programmer time is expensive.
But let's push it the other way for a minute. Suppose this program had to be invoked once a minute, continuously. At 70 CPU seconds per run, we've just run out of processor (not to mention that the elapsed wallclock time was even longer than that). So we're forced to optimize the program, or buy a bigger machine, or run the program on multiple machines (if possible).
Armed with the data from the profiling run, we might want to tackle the URI::new routine. But rather than tweaking the code a bit and rerunning the whole program, it's generally more effective to benchmark various versions of the code for given inputs, in isolation from the rest of the program.
For example, suppose we wanted to speed up the part of the routine that determines the URI's scheme, separating it from the rest of the URL. Three possible strategies come to mind for me initially: a regular expression match, a split, or an index/substr combination, implemented roughly as follows:
sub re_match {
  my $str = "http://www.stonehenge.com/perltraining/";
  my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}

sub split_it {
  my $str = "http://www.stonehenge.com/perltraining/";
  my ($scheme, $rest) = split /:/, $str, 2;
}

sub index_substr {
  my $str = "http://www.stonehenge.com/perltraining/";
  my $pos = index($str, ":");
  my $scheme = substr($str, 0, $pos);    # everything before the colon
  my $rest = substr($str, $pos + 1);     # everything after the colon
}
We now have three candidates. Which one is faster? For that, we can use the built-in Benchmark module, as follows:
use Benchmark qw(cmpthese);

my $URI = "http://www.stonehenge.com/perltraining/";

sub re_match {
  my $str = $URI;
  my ($scheme, $rest) = $str =~ /(.*?):(.*)/;
}

sub split_it {
  my $str = $URI;
  my ($scheme, $rest) = split /:/, $str, 2;
}

sub index_substr {
  my $str = $URI;
  my $pos = index($str, ":");
  my $scheme = substr($str, 0, $pos);
  my $rest = substr($str, $pos + 1);
}

cmpthese(-1, {
  re_match     => \&re_match,
  split_it     => \&split_it,
  index_substr => \&index_substr,
});
which when run results in:
                  Rate     re_match     split_it index_substr
re_match      131022/s           --         -37%         -46%
split_it      208777/s          59%           --         -14%
index_substr  242811/s          85%          16%           --
The ``-1'' on the cmpthese call says ``run each of these for roughly 1 CPU second''. Note that index_substr seems to be the clear winner here, even though it gives me more things to type and get correct.
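If you'd rather run each candidate a fixed number of times instead of for a fixed amount of CPU time, a positive first argument to cmpthese does that. A minimal sketch, assuming the three subroutines above are already defined:

use Benchmark qw(cmpthese);

# run each sub exactly 500,000 times instead of timing for ~1 CPU second
cmpthese(500_000, {
  re_match     => \&re_match,
  split_it     => \&split_it,
  index_substr => \&index_substr,
});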
One thing to watch out for in benchmarking is optimizing for atypical data. For example, the URI I picked is probably typical, but what happens if the URL is longer instead?
my $URI = "http://www.stonehenge.com/merlyn/" .
  "Pictures/Trips/2003/03-06-PerlWhirlMacMania/" .
  "Day-0-Pearl-Harbor/?show=14";
Now our results show that index_substr is even better!

                  Rate     re_match     split_it index_substr
re_match      117108/s           --         -41%         -49%
split_it      199110/s          70%           --         -13%
index_substr  229682/s          96%          15%           --
Given those two extremes, I'd say that index_substr is probably the right tool for this particular task, all other things being equal.
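One caveat if you adopt the index/substr approach in real code (this guard is my addition, not part of the benchmarked versions): index returns -1 when there's no colon at all, and blindly handing that to substr quietly gives you garbage rather than a failure:

sub scheme_and_rest {
  my $str = shift;
  my $pos = index($str, ":");
  return if $pos < 0;    # no colon at all: not a scheme-qualified URI
  return (substr($str, 0, $pos), substr($str, $pos + 1));
}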
Armed with Perl's built-in profiler and the Benchmark module, I can usually fine-tune my routines and get the speed I want out of my Perl programs. If that's not enough, I could take those frequently-called expensive routines and code them in a lower-level language instead.
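For instance (purely as an illustration of the general idea, not something this particular program needs), the Inline::C module from the CPAN lets you drop a C implementation of a hot routine directly into a Perl program; the sum_to example here is invented for the sketch:

use Inline C => <<'END_C';
/* a trivial hot spot rewritten in C: sum the integers from 1 to n */
int sum_to(int n) {
  int i, total = 0;
  for (i = 1; i <= n; i++) total += i;
  return total;
}
END_C

print sum_to(1000), "\n";    # prints 500500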
But usually that won't be needed. Until next time, enjoy!