Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 26 (Jun 1999)
There's no doubt that Perl is excellent at gluing together outside-world programs and data, but Perl is also pretty good at pure programming problems that you might find in a beginning computer science course. For example, let's take a look at that common task in your typical first year computer course, finding prime numbers.
As I recall from my early math years, a prime number is any integer greater than 1 that is divisible by only 1 and itself, and no other integers. Warning: I'm not a math geek, so if that's not precisely it, I'm sure someone will let me know.
So, let's generate some prime numbers using a very direct translation of this specification:
GUESS: for (my $guess = 2; $guess <= 10000; $guess++) { for (my $divisor = 2; $divisor < $guess; $divisor++) { next GUESS unless $guess % $divisor; } print "$guess\n"; }
The outer loop is named GUESS
, so that we can skip forward with
respect to this loop even from within the inner loop. The variable
$guess
is my try at each prime number. I'm establishing a lexical
local variable for this loop, initializing it to 2, and incrementing
it by one until it hits 10,000.
The inner loop tries all the possible divisors from 2 through the
number we're checking in $guess
. The expression $guess %
$divisor
is true when there's a remainder, meaning that $guess
is
not divisible by $divisor
.
On my system, this program takes about 13 CPU seconds just to get all the primes up to 10000. Obviously, to do 100,000 numbers this way would be prohibitively expensive.
But, let's apply a little math to the selection of numbers to speed things up a bit. For one thing, it's silly to try any divisor that's bigger than the square root of the guess. (The higher divisor would have needed another factor that was smaller than the square root in order for their product to be less than the guess.) So, let's change that inner loop a bit:
GUESS: for (my $guess = 2; $guess <= 10000; $guess++) { for (my $divisor = 2; $divisor * $divisor <= $guess; $divisor++) { next GUESS unless $guess % $divisor; } print "$guess\n"; }
Here, notice that the ending condition of the inner loop has been
changed, so that we'll drop out of the $divisor
checking when a
divisor is more than the square root of the potential prime in
$guess
.
And yes, that now gets it down to under a half CPU second on my system. If I bump the upper limit to 100,000, we're back up to 8 CPU seconds again, though. Let's try another optimization. After the prime number '2', there are no further prime numbers that are even, and there is also no reason to try any divisors that are even. (As my mathbooks often say, ``proof of this is left to the reader''.) So, let's change the loops a bit:
print "2\n"; GUESS: for (my $guess = 3; $guess <= 100_000; $guess += 2) { for (my $divisor = 3; $divisor * $divisor <= $guess; $divisor += 2) { next GUESS unless $guess % $divisor; } print "$guess\n"; }
Here, I'll have to print 2
explicitly, because it would make the
algorithm rather messy to try to include it inside. Then we'll try
every odd number starting with 3 as potential primes. Similarly,
the inner loop tries all odd numbers starting with 3 as potential
divisors, but still stopping when the square of the divisor exceeds
the candidate prime.
And, as I suspected, that cut the time down in about half, to under 5 CPU seconds. That's still not gonna scale well for getting primes under say, a million, or 10 million. To do that, we need to do something other than brute force.
The next major hack is to introduce the classic ``Sieve of Eratosthenes'' for finding primes. To do this, we first make an array that is all zeros up to the highest prime we'll be finding. Then, starting at the number 2, we set every multiple of 2 to a non-zero value, indicating that there's no way that the number is prime. Sliding forward, we do the same for the next item, 3, setting all multiples of 3 to non-zero. That looks like this:
my $UPPER = 100_000; my @sieve = (0) x ($UPPER + 1); GUESS: for (my $guess = 2; $guess <= $UPPER; $guess++) { next GUESS if $sieve[$guess]; print "$guess\n"; for (my $mults = $guess * 2; $mults <= $UPPER; $mults += $guess) { $sieve[$mults] = 1; } }
The upper bound of potential primes is needed in a few places, so I
set up the variable $UPPER
to hold it. This variable is uppercase,
the traditional indication that we're really looking at a read-only
variable established as a program-wide constant. Next, the @sieve
variable is initialized to contain the number 0 from $sieve[0]
to
$sieve[$UPPER]
. Because this list is one larger than the value of
$UPPER
, I need to add one. The x
operator replicates the list
on the left (just a 0
) enough times to make the @sieve
long
enough.
The GUESS
loop again tries potential candidates from 2
to the
upper limit. If $sieve[$guess]
is already set, it must have been
set on a prior candidate, so we're looking at a multiple of a
previously found prime number, so we'll just bounce on to the next
guess. Otherwise, this guess is printed, and the fun begins.
For all the multiples of this newly found prime, I'm walking through
the array, setting the value of that multiple to 1. Note that there's
no multiplication or division here (if you ignore the $guess *
2
, which I could have written as $guess + $guess
. That's what
makes this algorithm particularly CPU-friendly.
And, as I suspected, we're down under 2 CPU seconds here again. Upping
this program to the primes under a million puts me into the 15 CPU
second region. Additionally, I'm starting to page fault quite a bit,
because we're creating a million scalars in memory. Really, all
we needed was a single bit. Luckily, we can represent a bit-vector
directly, using the vec
operator:
my $UPPER = 1_000_000; my $sieve = ""; GUESS: for (my $guess = 2; $guess <= $UPPER; $guess++) { next GUESS if vec($sieve,$guess,1); print "$guess\n"; for (my $mults = $guess * 2; $mults <= $UPPER; $mults += $guess) { vec($sieve,$mults,1) = 1; } }
Here, $UPPER
is like the previous program, but we're now using a
single scalar $sieve
to hold the bit data. This scalar's string
value will be treated as a series of bits, initially all 0.
The rest of the program structure remains unchanged from the previous
version, but now we've changed the array operations to vec
operations. vec($sieve,$guess,1)
is true when the corresponding
single bit at position $guess
is true. And we can use the same
construct on the left side of an assignment to get that bit set or
cleared. Fairly slick, and fairly compact.
Oddly enough, we're still at 15 CPU seconds for this program, but the
pagefaults are now greatly reduced. This still seemed high, but then
I realized that I'm setting a bunch of already set bits. We don't
need to hit every multiple of $guess
, because any multiple that
was less than the square of $guess
had already been taken out earlier!
Fixing that, we get
my $UPPER = 1_000_000; my $sieve = ""; GUESS: for (my $guess = 2; $guess <= $UPPER; $guess++) { next GUESS if vec($sieve,$guess,1); print "$guess\n"; for (my $mults = $guess * $guess; $mults <= $UPPER; $mults += $guess) { vec($sieve,$mults,1) = 1; } }
which indeed reduces down to around 13 CPU seconds. Not a huge gain, but every bit helps as we get bigger. We're finding 78,498 primes, after all!
If you're curious about knowing a lot more about prime numbers
(perhaps more than you wanted to know), I stumbled across the web page
at http://www.utm.edu/research/primes/
, which seems to be a great
resource.
So, as you can now see, a first-year computer science problem can be solved handily by Perl, demonstrating that Perl is good for both theoretical things, and practical things. Until next time, enjoy!