Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 57 (Jan 2001)

[suggested title: Leveraging with open source]

The Open Source movement has been going strong for quite some time, starting long before the term had even been coined. In fact, some have argued that Perl and GNU Emacs are the canonical early success stories for Open Source, having helped spearhead the notion that a community of network-connected people can contribute to a publicly available tool to make it industrial-strength and suitable for use in mission-critical commercial applications. If you haven't been overloaded by the buzzwords in that last sentence, read on.

One of the cool things I find about the Open Source movement is the willingness of the user community to ``give back''. Most people see Open Source projects as if they were a ``potluck picnic''. In a potluck, everyone tries to bring some dish, usually something they most enjoy making (or in my case, buying at the store already made, like chips), and from the individual contributions, we get to make a complete meal. Unless it's a bunch of geeks and everyone brought chips and nothing else. That's why geeks usually do potlucks with software (or a sufficient number of non-geeks for a complete meal) instead.

The advantage here is that I can leverage my contribution to the meal. I can bring one easy thing to purchase, and I end up getting a bit of this, a bit of that, and a more-or-less balanced meal, counting the three different kinds of desserts.

Similarly, with a software potluck, a few lines of code I ``bring'' can result in a complete program, by combining them with copies of the code in the libraries that others have brought, either in the distribution, or in the wonderful Comprehensive Perl Archive Network (CPAN) at search.cpan.org and hundreds of other places. Luckily, the software potluck comes with a built-in replicator, so I can take a lot more than I give and not feel very guilty.

So, while I was thinking about how Open Source has contributed to my work, I started wondering how many lines of code my programs actually used, given how many modules I pull in. And just about then, as luck would have it, someone on Usenet posted about the Devel::Modlist module, a debugging aid that shows which modules your program has pulled in. One of its features is to dump out a full path listing of all use'd modules, and that gave me the idea of checking how many lines of code I pulled in for every line of code I wrote. And that brings us to the program in [listing one], which shows how much leverage I've gotten from using Open Source libraries.

Lines 1 through 3 start nearly every program I do, enabling warnings, turning on compiler restrictions, and disabling buffering on STDOUT. If this were a CGI program I'd also add taint mode (-T on the first line), but it isn't.

Line 5 brings in the Config module, a standard module that lets us find out some common information about this particular Perl installation. It defines a %Config hash that we'll use later.
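
If %Config is new to you, here's a quick sketch of the three entries we'll pull from it later (the values in the comments are merely typical; yours will differ):

    #!/usr/bin/perl -w
    use strict;
    use Config;

    # a peek at the entries this month's program cares about
    print "perlpath: $Config{perlpath}\n";  # e.g. /usr/bin/perl
    print "privlib:  $Config{privlib}\n";   # where the core modules live
    print "sitelib:  $Config{sitelib}\n";   # where CPAN/local modules live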

Line 6 pulls in the IPC::Open2 module, also a standard module. We'll need that to fire off a child process and babysit it, talking to both STDIN and STDOUT at the same time.
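
In case open2 is new to you, here's a minimal sketch of the idea, assuming a Unix box with the standard rev utility on hand:

    #!/usr/bin/perl -w
    use strict;
    use IPC::Open2;

    # feed a line to the child's STDIN, read its answer from its STDOUT
    my $pid = open2(\*FROM_CHILD, \*TO_CHILD, "rev");
    print TO_CHILD "hello\n";
    close TO_CHILD;                # child sees end-of-file
    my $reply = <FROM_CHILD>;      # "olleh\n"
    close FROM_CHILD;
    waitpid $pid, 0;               # reap the child
    print $reply;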

Line 7 uses the Memoize module, from the CPAN. This module enables a subroutine to be ``memoized'' (and no, that's not a misspelling). This means that successive calls to the subroutine with the same arguments will return the same result, but without reinvoking the body of the subroutine. The call to memoize then modifies the lines_in_file subroutine so that the results are automatically cached. This simplified the design of my program greatly, because I could just write what I wanted, and optimize later.
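
As a quick sketch of what memoizing buys us, here's a deliberately slow, made-up subroutine (the one-second sleep stands in for real work):

    #!/usr/bin/perl -w
    use strict;
    use Memoize;

    sub slow_square {
      my $n = shift;
      sleep 1;                     # pretend this is expensive
      return $n * $n;
    }
    memoize('slow_square');

    print slow_square(7), "\n";    # takes about a second
    print slow_square(7), "\n";    # instant: the cached 49 comes back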

Line 8 pulls in the standard File::Basename module, particularly for the basename subroutine.
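
That's just the ``strip the directory part'' operation:

    #!/usr/bin/perl -w
    use strict;
    use File::Basename;

    print basename("/home/merlyn/Html/merlyn/WebTechniques/col57.listing.txt"), "\n";
    # prints just "col57.listing.txt"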

And now for the only configuration variable. Line 12 defines a pattern for glob-matching all the filenames to be processed. This is a path to my website's WebTechniques column archive, as viewed from the Unix side, not as a URL. The .listing.txt files hold the source code for the various columns, up through col57.listing.txt, which contains the program for this month's column.

Lines 16 through 18 get information about the current Perl installation. First, we get the path to Perl into $perlpath. Next, we grab the two places things are installed: $privlib for distribution modules, and $sitelib for CPAN or local modules.

Line 20 creates a regular expression object that matches files found in either $privlib or $sitelib. Hopefully, anyway: it worked for my installation, but your mileage may vary.
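
As a sketch (with made-up library paths standing in for the real %Config values), the resulting pattern sorts pathnames like so:

    #!/usr/bin/perl -w
    use strict;

    # made-up library locations, standing in for $Config{privlib}/{sitelib}
    my $privlib = "/usr/lib/perl5/5.00503";
    my $sitelib = "/usr/lib/perl5/site_perl/5.005";
    my $files_regex = qr/^(\Q$privlib\E|\Q$sitelib\E)/;

    for ("/usr/lib/perl5/5.00503/strict.pm", "/tmp/mycode/Foo.pm") {
      print "$_: ", (/$files_regex/ ? "counts as a library file" : "skipped"), "\n";
    }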

Line 22 loads up the @ARGV array (used for the diamond operator later) with the list of names matching the $PAT configuration variable. This results in the 57 filenames of the column code source listings.

Line 24 undefs the $/ variable, ensuring that each filehandle read returns all available data. This makes the diamond loop grab the entire file into $_ on each read, so the loop runs once per file, not once per line.
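
If slurp mode is new to you, here's the idea in miniature (the filename is hypothetical):

    #!/usr/bin/perl -w
    use strict;

    undef $/;                      # every read now returns all remaining data
    open F, "<somefile.txt" or die "somefile.txt: $!";   # hypothetical file
    my $contents = <F>;            # the whole file in one gulp
    close F;
    printf "%d lines\n", $contents =~ tr/\n//;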

Lines 26 and 27 hold the grand totals for original lines and used-module lines.

Lines 29 through 48 handle the main operation: processing one file at a time to compare how many lines of module code get pulled in against how many lines of source I wrote. Each iteration looks at the next file, with the entire contents of that file in $_ and its name automatically in $ARGV as well.

Line 30 breaks apart the few listings that have multiple programs in the same listing text file. I wasn't consistent in naming the multiple parts, except for three pound characters at the beginning of a line, followed somewhere later on the same line by the word listing in either upper or lower case. Inside this loop, $_ is a single listing.
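
Here's that split in miniature, applied to a made-up two-program listing:

    #!/usr/bin/perl -w
    use strict;

    # a fake two-program listing, separated by a "### listing" marker line
    my $text = "print 1;\n### LISTING TWO\nprint 2;\nprint 3;\n";

    my @programs = split /^\#\#\#.*listing.*\n/im, $text;
    print scalar(@programs), " programs\n";                       # "2 programs"
    print "first has ",  ($programs[0] =~ tr/\n//), " line(s)\n"; # 1
    print "second has ", ($programs[1] =~ tr/\n//), " line(s)\n"; # 2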

Line 31 skips over counting any program that has Apache:: as part of the source. Apparently, trying to pull in modules that are meant to be used within mod_perl doesn't work very well outside of Apache, so I have to filter those programs out. But on the first run, adding this line caused the program to skip its own listing, because this very regular expression contains the text Apache::! So I added the character class brackets, which still match Apache:: but don't look like Apache::. Nice trick.
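
Here's the dodge in miniature: the bracketed pattern still matches genuine Apache:: text, but no longer matches its own source text:

    #!/usr/bin/perl -w
    use strict;

    my $line = 'use Apache::Registry;';       # a typical mod_perl-ish line
    print "plain:  ", ($line =~ /Apache::/   ? "match" : "no match"), "\n";  # match
    print "dodged: ", ($line =~ /Apach[e]::/ ? "match" : "no match"), "\n";  # match

    # but the filter line as it appears in the program's own source...
    my $self = 'next if /Apach[e]::/;';
    print "self:   ", ($self =~ /Apach[e]::/ ? "match" : "no match"), "\n";  # no match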

Line 32 counts the number of source lines, and ignores the now empty items created by the earlier split.

Lines 33 and 34 launch a child Perl process to run the program, pulling in the Devel::Modlist module (found in the CPAN) and triggering the path output, which gives us a list of all the use'd modules as their full paths on STDERR of the child process. Because that output appears on STDERR, we merge it into STDOUT with the shell's 2>&1 redirection. I've also enabled the ``compile only'' (-c) and ``taint mode'' (-T) options, to keep from actually executing the code and to prevent the ``taint mode too late'' error.

Line 35 sends the program source code to the newly launched Perl process. Line 36 closes the input handle for that process. After a short time passes, data is available on RDR, which is read in line 37 and closed in line 38. Note that I know the child Perl process isn't going to try to write more than 8K before I finish sending the program, so this operation is safe. (If it did try, we'd both be trying to shove data at each other, resulting in a staring match with nobody winning.)

Now it's time to see just how many things got used. We start by setting the total for this program to zero in line 39. Then, for each output line (broken apart in line 40), line 41 checks whether it's a pathname inside one of the library directories. If it is, line 42 calls lines_in_file for that filename; that returns the number of lines in the file, which gets summed into our total.

Finally in line 44, we'll dump the filename for this program, lines in the program, and total lines of modules sucked in.

Lines 45 and 46 sum the two counts into the grand totals.

And line 49 dumps out the grand totals at the bottom.

And now for the nice little subroutine lines_in_file starting in line 51. The only parameter gets saved in line 52 into $filename.

Line 53 creates a local filehandle. Starting in Perl 5.6, we could simplify this to a plain my $handle that open autovivifies for us, but since I'm sticking with Perl 5.5.3 until 5.6.1 comes out, I'll do it the more traditional way.
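
For contrast, here's a sketch of both spellings of a line-counting subroutine (the names are made up, and the second form needs Perl 5.6 or later):

    #!/usr/bin/perl -w
    use strict;

    # pre-5.6: manufacture a fresh glob to serve as the filehandle
    sub count_lines_old {
      my $filename = shift;
      my $handle = \do { local *HANDLE };
      open $handle, "<$filename" or return 0;
      read $handle, my $buffer, -s $handle;
      $buffer =~ tr/\n//;          # the count becomes the return value
    }

    # Perl 5.6 and later: open autovivifies a lexical filehandle for us
    sub count_lines_new {
      my $filename = shift;
      open my $handle, "<$filename" or return 0;
      read $handle, my $buffer, -s $handle;
      $buffer =~ tr/\n//;
    }

    print count_lines_old($0), " lines in this very file\n";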

Line 54 opens the handle, returning 0 if something breaks. Line 55 reads the entire file into a buffer. And line 56 counts the lines in much the same way we counted them above: tr scans for newlines and returns how many it found, and since that's the last expression evaluated, the count becomes the subroutine's return value.

Now, this subroutine has been memoized above, so even though it appears to open each module file repeatedly, once for every program that pulls it in (think about how many times we're asked about strict.pm, for example), it really performs the work only once per filename. The savings in a long-running program can be significant; however, the memoized subroutine must have no side effects, or chaos will ensue. See the Memoize module documentation for details.

And this results in something like:

    halfdome.holdit.com>> ./countlines
                 col01.listing.txt     79  12133
                 col02.listing.txt    108   8799
                 col03.listing.txt     36  15529
                 col04.listing.txt     75    104
                 col05.listing.txt    104  14596
                 col06.listing.txt     59   9416
                 col07.listing.txt     91  15529
                 col08.listing.txt     87    104
                 col09.listing.txt     95   1040
                 col10.listing.txt     41    104
                 col10.listing.txt     35   1223
                 col11.listing.txt    101  13229
                 col12.listing.txt     84   9081
                 col13.listing.txt     15      0
                 col13.listing.txt     45   4917
                 col14.listing.txt    167  10014
                 col15.listing.txt     63  11080
                 col16.listing.txt     74  11287
                 col17.listing.txt     91   1796
                 col18.listing.txt     61   3561
                 col19.listing.txt     93  16956
                 col20.listing.txt     88  11287
                 col21.listing.txt     90  11811
                 col22.listing.txt     98  14160
                 col23.listing.txt     58  18531
                 col24.listing.txt     77  18531
                 col25.listing.txt     33    104
                 col25.listing.txt     26   6848
                 col26.listing.txt     63   3561
                 col27.listing.txt    220  12218
                 col28.listing.txt     37   7943
                 col28.listing.txt     47  12218
                 col29.listing.txt     71   3445
                 col30.listing.txt    122  16697
                 col31.listing.txt     36   1223
                 col32.listing.txt    140  10952
                 col33.listing.txt    128   9502
                 col34.listing.txt    193  13229
                 col35.listing.txt    312  12104
                 col36.listing.txt     58   8912
                 col37.listing.txt    118  19999
                 col38.listing.txt     72   9081
                 col38.listing.txt     36   9081
                 col39.listing.txt     76  13068
                 col40.listing.txt    114   9081
                 col42.listing.txt    114  26458
                 col43.listing.txt    164  11666
                 col44.listing.txt     82   9081
                 col45.listing.txt     91   9420
                 col46.listing.txt     85   9081
                 col48.listing.txt    188  15597
                 col51.listing.txt    115   9081
                 col52.listing.txt     70   9736
                 col53.listing.txt     67  10603
                 col54.listing.txt     22  10209
                 col55.listing.txt      5      0
                 col55.listing.txt     41   1223
                 col55.listing.txt     25   9502
                 col56.listing.txt     95  17265
                 col57.listing.txt     57   8145
                    grand total =>   5138 571151
    halfdome.holdit.com>>

Lookee there. For the 5,138 lines of code I wrote (not counting some of the mod_perl stuff), I got a whopping 571,151 lines of code written by someone else, better than a hundred to one return on effort!

Now you can argue that not all of those hundred-or-so lines for every line I wrote are actually being used, and that if I had pulled out only the parts of those libraries my programs really exercise, it might be only a tenth of that. But even then, that's a 10-to-1 ratio of code other people have written (and hopefully debugged) to code I had to write (and debug) myself.

I would dare argue that without Open Source, those CPAN modules would not have been as easily shared. And I think I'd find a lot of agreement there.

So, remember leverage. Learn the CPAN modules. And when you take, give back when you can. Contribute your cool reusable modules to the CPAN, and keep the potluck going. Until next time, enjoy!

Listings

        =1=     #!/usr/bin/perl -w
        =2=     use strict;
        =3=     $|++;
        =4=     
        =5=     use Config;
        =6=     use IPC::Open2;
        =7=     use Memoize; memoize('lines_in_file');
        =8=     use File::Basename;
        =9=     
        =10=    ## CONFIG
        =11=    
        =12=    my $PAT = "/home/merlyn/Html/merlyn/WebTechniques/col??.listing.txt";
        =13=    
        =14=    ## END CONFIG
        =15=    
        =16=    my $perlpath = $Config{perlpath};
        =17=    my $privlib = $Config{privlib};
        =18=    my $sitelib = $Config{sitelib};
        =19=    
        =20=    my $files_regex = qr/^(\Q$privlib\E|\Q$sitelib\E)/;
        =21=    
        =22=    @ARGV = glob $PAT or die "no files?";
        =23=    
        =24=    undef $/;
        =25=    
        =26=    my $source_grand = 0;
        =27=    my $used_grand = 0;
        =28=    
        =29=    while (<>) {
        =30=      for (split /^\#\#\#.*listing.*\n/im) {
        =31=        next if /Apach[e]::/; # bleh
        =32=        next unless my $source_count = tr/\n//;
        =33=        open2(\*RDR, \*WTR, "$perlpath -cTMDevel::Modlist=path 2>&1")
        =34=          or die "Cannot create pipe or fork or something: $!";
        =35=        print WTR $_;
        =36=        close WTR;
        =37=        $_ = <RDR>;
        =38=        close RDR;
        =39=        my $used_count = 0;
        =40=        for (split /\n/) {
        =41=          next unless /$files_regex/;
        =42=          $used_count += lines_in_file($_);
        =43=        }
        =44=        printf "%30s %6d %6d\n", basename($ARGV), $source_count, $used_count;
        =45=        $source_grand += $source_count;
        =46=        $used_grand += $used_count;
        =47=      }
        =48=    }
        =49=    printf "%30s %6d %6d\n", "grand total =>", $source_grand, $used_grand;
        =50=    
        =51=    sub lines_in_file {
        =52=      my $filename = shift;
        =53=      my $handle = \do { local *STDIN };
        =54=      open $handle, "<$filename" or return 0;
        =55=      read $handle, my $buffer, -s $handle;
        =56=      $buffer =~ tr/\n//;
        =57=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.