Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Web Techniques Column 57 (Jan 2001)
[suggested title: Leveraging with open source]
The Open Source movement has been going strong for quite some time, starting long before the term had even been coined. In fact, some have argued that Perl and GNU Emacs are the canonical early success stories for Open Source, having helped spearhead the notion that a community of network-connected people can contribute to a publicly available tool to make it industry-strength and suitable for use in mission-critical commercial applications. If you haven't been overloaded by the buzzwords in that last sentence, read on.
One of the cool things I find about the Open Source movement is the willingness of the user community to ``give back''. Most people see Open Source projects as if they were a ``potluck picnic''. In a potluck, everyone tries to bring some dish, usually something they most enjoy making (or in my case, buying at the store already made, like chips), and from the individual contributions, we get to make a complete meal. Unless it's a bunch of geeks and everyone brought chips and nothing else. That's why geeks usually do potlucks with software (or a sufficient number of non-geeks for a complete meal) instead.
The advantage here is that I can leverage my contribution to the meal. I can bring one easy thing to purchase, and I end up getting a bit of this, a bit of that, and a more-or-less balanced meal, counting the three different kinds of desserts.
Similarly, with a software potluck, a few lines of code I ``bring'' can result in a complete program, by combining them with copies of the code in the libraries that others have brought, either in the distribution, or in the wonderful Comprehensive Perl Archive Network (CPAN) at search.cpan.org and hundreds of other places. Luckily, the software potluck comes with a built-in replicator, so I can take a lot more than I give and not feel very guilty.
So, while I was thinking about how Open Source has contributed to my work, I started wondering how many lines of code my programs actually used, given that I use a lot of modules in my programs. And just about then, as luck would have it, someone on Usenet posted about the Devel::Modlist module, a debugging aid to see which modules your program had pulled in. One of its features is to dump out a full path listing of all use'd modules, and that gave me the idea of checking out how many lines of code I pulled in for every line of code I wrote.
And that brings us to the program in [listing one], to show how much leverage I've gotten from using Open Source libraries.
Lines 1 through 3 start nearly every program I do, enabling warnings, turning on compiler restrictions, and disabling buffering on STDOUT. If this were a CGI program I'd also add taint mode (-T on the first line), but it isn't.
Line 5 brings in the Config module, a standard module that lets us find out some common information about this particular Perl installation. It defines a %Config hash that we'll use later.
Line 6 pulls in the IPC::Open2 module, also a standard module. We'll need that to fire off a child process and babysit it, talking to both STDIN and STDOUT at the same time.
Line 7 uses the Memoize module, from the CPAN. This module enables a subroutine to be ``memoized'' (and no, that's not a misspelling). This means that successive calls to the subroutine with the same arguments will return the same result, but without reinvoking the body of the subroutine. The call to memoize then modifies the lines_in_file subroutine so that the results are automatically cached. This simplified the design of my program greatly, because I could just write what I wanted, and optimize later.
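To see what memoizing buys you, here is a minimal sketch that is separate from the listing. The slow_square subroutine and the $calls counter are made up for illustration; only memoize itself comes from the Memoize module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the body really runs

# hypothetical stand-in for an expensive subroutine
sub slow_square {
  my $n = shift;
  $calls++;
  return $n * $n;
}

# replace slow_square with a caching wrapper
memoize('slow_square');

# ask for the same argument three times
my @results = map { slow_square(12) } 1 .. 3;

print "results: @results\n";
print "body ran $calls time(s)\n";
```

All three calls return 144, but the counter shows that the body ran only for the first one.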
Line 8 pulls in the standard File::Basename module, particularly for the basename subroutine.
And now for the only configuration variable. Line 12 defines a pattern for glob matching all filenames to be processed. This is a path to my website's WebTechniques column archive, as viewed from the Unix side, not as a URL. The .listing.txt files are the source code for the various columns, up to col57.listing.txt, containing the source code for this month's column.
Lines 16 through 18 get information about the current Perl installation. First, we get the path to Perl into $perlpath. Next, we grab the two places things are installed: $privlib for distribution modules, and $sitelib for CPAN or local modules.
Line 20 creates a regular expression object that matches files found either in $privlib or $sitelib. Hopefully. Worked for my installation, but your mileage may vary.
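If you want to check what your own installation reports, a small sketch like the following prints the three %Config values and tries the pattern on two sample paths. The sample paths are invented for the test; the rest mirrors lines 16 through 20 of the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

my $privlib = $Config{privlib};   # where the distribution's modules live
my $sitelib = $Config{sitelib};   # where CPAN or local modules live

# \Q...\E quotes any regex metacharacters that appear in the paths
my $files_regex = qr/^(\Q$privlib\E|\Q$sitelib\E)/;

print "perl:    $Config{perlpath}\n";
print "privlib: $privlib\n";
print "sitelib: $sitelib\n";

# hypothetical pathnames, just to exercise the pattern
my @samples = ("$privlib/strict.pm", "/nonexistent/elsewhere/Foo.pm");
for my $path (@samples) {
  my $verdict = $path =~ $files_regex ? "counted" : "skipped";
  print "$verdict: $path\n";
}
```

The pattern is anchored at the start of the string, so only files that live under one of the two library directories are counted.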
Line 22 loads up the @ARGV array (used for the diamond operator later) with the list of names matching the $PAT configuration variable. This results in the 57 filenames of the column code source listings.
Line 24 undefs the $/ variable, ensuring that each filehandle read returns all available data. This makes the diamond loop grab the entire file into $_ on each read, so that the loop runs once per file, not once per line.
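The effect of $/ is easy to see in isolation. This sketch reads the same three-line text twice, once with the default record separator and once with $/ undefined. It uses an in-memory filehandle (which needs a Perl newer than the one used for this column) so that no real file is required; the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";

# default $/ is a newline: one record per line
open my $by_line, '<', \$text or die "cannot open: $!";
my @lines = <$by_line>;
close $by_line;

# $/ undefined inside the block: one record holding everything
open my $slurp, '<', \$text or die "cannot open: $!";
my @chunks = do { local $/; <$slurp> };
close $slurp;

print scalar(@lines),  " record(s) line by line\n";
print scalar(@chunks), " record(s) in slurp mode\n";
```

Using local keeps the change confined to one block. The listing can undef $/ globally because it never needs line-at-a-time reads.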
Lines 26 and 27 hold the grand totals for original lines and used-module lines.
Lines 29 through 48 handle the main operation: processing one file at a time to see how many lines of modules versus the source lines. Each iteration looks at the next file, and has the entire contents of the file in $_, and the name of that file automatically in $ARGV as well.
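Here is a rough sketch of that loop shape, using two throwaway files in a temporary directory instead of my column archive. The directory, the file contents, and the @seen array are all invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename;

# build two small stand-in listing files
my $dir = tempdir(CLEANUP => 1);
for my $n ("01", "02") {
  open my $out, '>', "$dir/col$n.listing.txt" or die "cannot write: $!";
  print $out "line\n" x $n;      # one line in col01, two in col02
  close $out;
}

# same idea as line 22: let the diamond operator walk the matches
@ARGV = glob "$dir/col??.listing.txt" or die "no files?";

my @seen;
{
  local $/;                      # slurp mode: one pass per file
  while (<>) {
    my $count = tr/\n//;         # newlines in the current file
    push @seen, basename($ARGV) . " has $count line(s)";
  }
}
print "$_\n" for @seen;
```

Inside the loop, $ARGV always names the file that the current contents of $_ came from.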
Line 30 breaks apart a few of the listings with multiple programs in the same listing textfile. I wasn't consistent in naming the multiple parts, except for three pound characters at the beginning, and the word listing in either upper or lower case later in the same line. For this loop, $_ is now a single listing.
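As an illustration of that split, the sketch below runs the same pattern over a made-up file containing two tiny programs. The separator lines and program text are invented; the regular expression is the one from line 30.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical listing file holding two programs
my $file = join "",
  "### LISTING ONE ###\n",
  "print 1;\n",
  "### listing two ###\n",
  "print 2;\n",
  "print 3;\n";

# /m lets ^ match at each line start, /i ignores the case of "listing"
my @parts = split /^\#\#\#.*listing.*\n/im, $file;

for my $i (0 .. $#parts) {
  my $length = length $parts[$i];
  print "part $i is $length character(s) long\n";
}
```

Note the empty first part: the text before the first separator is still returned, even though there is nothing in it. A file with no separator lines at all comes back as a single part.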
Line 31 skips over counting any program that has Apache:: as part of the source. Apparently, trying to use things that are meant to be used within mod_perl doesn't work very well, so I have to filter them out. But on first run, the addition of this line caused this program to not consider itself! So I added the character class brackets, which still matches Apache:: but doesn't look like Apache::. Nice trick.
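A quick way to convince yourself that the trick works is to test the pattern against both kinds of text. The two sample strings here are made up: one resembles mod_perl code, the other is the filter line as it appears in this program's own source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = qr/Apach[e]::/;

# hypothetical line from a mod_perl program
my $mod_perl_code = "use Apache::Constants;\n";

# the filter line itself, as literal source text
my $own_source = 'next if /Apach[e]::/; # bleh';

for my $text ($mod_perl_code, $own_source) {
  my $verdict = $text =~ $pattern ? "skipped" : "counted";
  chomp(my $shown = $text);
  print "$verdict: $shown\n";
}
```

The one-character class [e] matches a plain e, so the pattern finds the module name, but the source text of the pattern has a bracket where the e would be and so does not match itself.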
Line 32 counts the number of source lines, and ignores the now empty items created by the earlier split.
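The counting idiom deserves a closer look. With an empty replacement list and no modifiers, tr leaves the string alone and simply returns how many characters it matched. A small sketch, with made-up sample text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $code  = "use strict;\nmy \$x = 1;\nprint \$x;\n";   # three lines
my $empty = "";                                       # leftover from a split

my $before      = $code;
my $code_count  = ($code  =~ tr/\n//);   # counts newlines, changes nothing
my $empty_count = ($empty =~ tr/\n//);   # zero, which is false

print "code has $code_count line(s)\n";
print "string unchanged\n" if $code eq $before;
print "empty chunk would be skipped\n" unless $empty_count;
```

Because a count of zero is false, a single next unless both records the count and skips the empty chunks.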
Lines 33 and 34 launch a child Perl process to run the program, pulling in the Devel::Modlist module (found in the CPAN) and triggering the path output, giving us a list of all the use'd modules as their full paths, on STDERR of the child process. Because it's on STDERR, we merge it with STDOUT using the shell's 2>&1 redirection. I've also enabled the ``compile only'' (-c) and ``taint mode'' (-T) options to keep from actually executing the code and to prevent the ``taint mode too late'' error.
Line 35 sends the program source code to the newly launched Perl process. Line 36 closes the input handle for that process. After a short time passes, data is available on RDR, which is read in line 37, and closed in line 38. Note that I know that the child Perl process isn't going to try to write more than 8K before I finish sending the program, so this operation is safe. (If it did try, we'd both be trying to shove data to each other, resulting in a staring match with nobody winning.)
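The same write-close-read conversation can be sketched without Devel::Modlist installed. This version feeds a two-line made-up program to a child Perl in compile-only mode and reads back whatever it reports. It assumes a Unix-like shell for the 2>&1 redirection, uses $^X (the path of the running Perl) rather than $Config{perlpath}, and uses lexical handles, which need a newer Perl than the bareword handles in the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# hypothetical program text to hand to the child
my $source = "use strict;\nprint qq{never runs\\n};\n";

# first handle reads from the child, second handle writes to it;
# -c means compile only, 2>&1 folds the child's STDERR into its STDOUT
my $pid = open2(my $reader, my $writer, "$^X -c 2>&1");

print $writer $source;    # send the program
close $writer;            # signal end of input so the child can start

my $report = do { local $/; <$reader> };   # slurp the child's report
close $reader;
waitpid $pid, 0;          # reap the child

print $report;
```

Because the program text is tiny and the child says nothing until its input is closed, there is no risk of the staring match described above.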
Now it's time to see just how many things got used. We'll start with setting the total for this program to zero in line 39. Then for each output line, broken apart in line 40, we see if it's a pathname within a module in line 41. If so, we'll call lines_in_file for that filename in line 42. That returns the number of lines in that file, which gets summed into our total.
Finally in line 44, we'll dump the filename for this program, lines in the program, and total lines of modules sucked in.
Lines 45 and 46 sum the two counts into the grand totals.
And line 49 dumps out the grand totals at the bottom.
And now for the nice little subroutine lines_in_file starting in line 51. The only parameter gets saved in line 52 into $filename.
Line 53 creates a local filehandle. Starting in Perl 5.6, we can simplify this to just my $handle, but since I'm still using Perl 5.5.3 until 5.6.1 comes out, I'll do it in the more traditional way.
Line 54 opens the handle, returning 0 if something breaks. Line 55 reads the entire file into a buffer. And line 56 counts the lines in the same way we counted lines above: by changing all newlines into newlines and counting how many of those we did.
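For readers already on Perl 5.6 or later, the subroutine might be written along these lines. This is a sketch of the simplification mentioned above, not a replacement for the listing: the three-argument open and the slurp via local $/ are my choices here, and the explicit return statements are only for clarity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub lines_in_file {
  my $filename = shift;

  # a lexical handle springs into existence in the open call
  open my $handle, '<', $filename or return 0;

  # read the whole file in one gulp
  my $buffer = do { local $/; <$handle> };
  close $handle;
  return 0 unless defined $buffer;

  # count newlines without changing the buffer
  return $buffer =~ tr/\n//;
}

# count the lines of this very script, as a quick check
print "this script has ", lines_in_file($0), " line(s)\n";
```

A missing or unreadable file still counts as zero lines, just as in the original.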
Now, this subroutine has been memoized above, meaning that even though it appears to open the file repeatedly as it has been seen in each program (think about how many times we're asked about strict.pm, for example), it's really only going to do this operation once per filename. The savings in a long-running program can be significant. However, the subroutine must have no side effects or chaos will ensue. See the Memoize module documentation for details.
And this results in something like:
    halfdome.holdit.com>> ./countlines
                 col01.listing.txt     79  12133
                 col02.listing.txt    108   8799
                 col03.listing.txt     36  15529
                 col04.listing.txt     75    104
                 col05.listing.txt    104  14596
                 col06.listing.txt     59   9416
                 col07.listing.txt     91  15529
                 col08.listing.txt     87    104
                 col09.listing.txt     95   1040
                 col10.listing.txt     41    104
                 col10.listing.txt     35   1223
                 col11.listing.txt    101  13229
                 col12.listing.txt     84   9081
                 col13.listing.txt     15      0
                 col13.listing.txt     45   4917
                 col14.listing.txt    167  10014
                 col15.listing.txt     63  11080
                 col16.listing.txt     74  11287
                 col17.listing.txt     91   1796
                 col18.listing.txt     61   3561
                 col19.listing.txt     93  16956
                 col20.listing.txt     88  11287
                 col21.listing.txt     90  11811
                 col22.listing.txt     98  14160
                 col23.listing.txt     58  18531
                 col24.listing.txt     77  18531
                 col25.listing.txt     33    104
                 col25.listing.txt     26   6848
                 col26.listing.txt     63   3561
                 col27.listing.txt    220  12218
                 col28.listing.txt     37   7943
                 col28.listing.txt     47  12218
                 col29.listing.txt     71   3445
                 col30.listing.txt    122  16697
                 col31.listing.txt     36   1223
                 col32.listing.txt    140  10952
                 col33.listing.txt    128   9502
                 col34.listing.txt    193  13229
                 col35.listing.txt    312  12104
                 col36.listing.txt     58   8912
                 col37.listing.txt    118  19999
                 col38.listing.txt     72   9081
                 col38.listing.txt     36   9081
                 col39.listing.txt     76  13068
                 col40.listing.txt    114   9081
                 col42.listing.txt    114  26458
                 col43.listing.txt    164  11666
                 col44.listing.txt     82   9081
                 col45.listing.txt     91   9420
                 col46.listing.txt     85   9081
                 col48.listing.txt    188  15597
                 col51.listing.txt    115   9081
                 col52.listing.txt     70   9736
                 col53.listing.txt     67  10603
                 col54.listing.txt     22  10209
                 col55.listing.txt      5      0
                 col55.listing.txt     41   1223
                 col55.listing.txt     25   9502
                 col56.listing.txt     95  17265
                 col57.listing.txt     57   8145
                    grand total =>   5138 571151
    halfdome.holdit.com>>
Lookee there. For the 5,138 lines of code I wrote (not counting some of the mod_perl stuff), I got a whopping 571,151 lines of code written by someone else, better than a hundred to one return on effort!
Now you can argue that not all of those 100-plus lines for every line I wrote are being used, and that if I had pulled out only the parts of those libraries actually used, it might be only 1/10th of that. But even then, that's still a 10 to 1 ratio of code other people have written (and hopefully debugged) versus code I write (and debug).
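For the skeptical, the arithmetic is easy to check from the grand totals in the output. The one-tenth factor is only the guess made above, not a measured figure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $written   = 5138;      # lines I wrote, from the grand total
my $pulled_in = 571151;    # lines pulled in from modules

my $ratio = $pulled_in / $written;

printf "full ratio:   %.1f to 1\n", $ratio;
printf "at one tenth: %.1f to 1\n", $ratio / 10;   # assumed usage fraction
```

That works out to roughly 111 to 1, or about 11 to 1 if only a tenth of the library code is really exercised.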
I would dare argue that without Open Source, those CPAN modules would not have been as easily shared. And I think I'd find a lot of agreement there.
So, remember leverage. Learn the CPAN modules. And when you take, give back when you can. Contribute your cool reusable modules to the CPAN, and keep the potluck going. Until next time, enjoy!
Listings
    =1=     #!/usr/bin/perl -w
    =2=     use strict;
    =3=     $|++;
    =4=
    =5=     use Config;
    =6=     use IPC::Open2;
    =7=     use Memoize; memoize('lines_in_file');
    =8=     use File::Basename;
    =9=
    =10=    ## CONFIG
    =11=
    =12=    my $PAT = "/home/merlyn/Html/merlyn/WebTechniques/col??.listing.txt";
    =13=
    =14=    ## END CONFIG
    =15=
    =16=    my $perlpath = $Config{perlpath};
    =17=    my $privlib = $Config{privlib};
    =18=    my $sitelib = $Config{sitelib};
    =19=
    =20=    my $files_regex = qr/^(\Q$privlib\E|\Q$sitelib\E)/;
    =21=
    =22=    @ARGV = glob $PAT or die "no files?";
    =23=
    =24=    undef $/;
    =25=
    =26=    my $source_grand = 0;
    =27=    my $used_grand = 0;
    =28=
    =29=    while (<>) {
    =30=      for (split /^\#\#\#.*listing.*\n/im) {
    =31=        next if /Apach[e]::/; # bleh
    =32=        next unless my $source_count = tr/\n//;
    =33=        open2(\*RDR, \*WTR, "$perlpath -cTMDevel::Modlist=path 2>&1")
    =34=          or die "Cannot create pipe or fork or something: $!";
    =35=        print WTR $_;
    =36=        close WTR;
    =37=        $_ = <RDR>;
    =38=        close RDR;
    =39=        my $used_count = 0;
    =40=        for (split /\n/) {
    =41=          next unless /$files_regex/;
    =42=          $used_count += lines_in_file($_);
    =43=        }
    =44=        printf "%30s %6d %6d\n", basename($ARGV), $source_count, $used_count;
    =45=        $source_grand += $source_count;
    =46=        $used_grand += $used_count;
    =47=      }
    =48=    }
    =49=    printf "%30s %6d %6d\n", "grand total =>", $source_grand, $used_grand;
    =50=
    =51=    sub lines_in_file {
    =52=      my $filename = shift;
    =53=      my $handle = \do { local *STDIN };
    =54=      open $handle, "<$filename" or return 0;
    =55=      read $handle, my $buffer, -s $handle;
    =56=      $buffer =~ tr/\n//;
    =57=    }