Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 24 (Feb 1999)

Every invocation of a Perl program starts the world afresh, with new data to be created, brought in, and manipulated. However, some things in the real world must persist. So, how do we keep data around, to access it later, perhaps in another invocation of the same program, or even a different program?

One of the simplest ways is to transfer a single array to and from a simple ``one item per line'' text file. For example, at the beginning of your program, you'd load the array:

    open DB, "<file" or die "open: $!";
    @data = <DB>;
    close DB;

and then use @data as a list of lines. You can insert and delete lines, or add new items in any way you wish. When you're done, just write it back out:

    open DB, ">file" or die "create: $!";
    print DB @data;
    close DB;
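
In between the load and the save, @data behaves like any other array, so you can rearrange it however you like (the particular items here are made up just for illustration):

    push @data, "a brand new item\n";   # add to the end
    splice @data, 2, 0, "another\n";    # insert at index 2
    @data = grep !/^obsolete/, @data;   # toss unwanted lines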

One of the limitations of this method is that you can save only a single array easily to a given file. Also, the data has to be well behaved: each item must contain no embedded newlines and must end with a single newline.

You can similarly save a hash to a file, using a distinct delimiter between the keys and values. Let's use a tab character:

    open DB, "<file" or die "open: $!";
    %mydata = map /(.*)\t(.*)/, <DB>;
    close DB;

Here we've used a map operation to extract the keys and corresponding values from each line of the permanent storage. At the end of the program, we'll simply reverse the process:

    open DB, ">file" or die "create: $!";
    print DB map "$_\t$mydata{$_}\n",
        keys %mydata;
    close DB;

Once again, the map operation makes it easy. We still suffer the limitation of having to have well-behaved data: no tabs or newlines within the keys or values.

For nearly arbitrary keys and values, we can step up to the next level of sophistication: DBMs. A DBM is a small database created with the DBM routines shipped with nearly every version of Unix since the mid-70's. The database consists of key-value pairs of arbitrary binary data, limited to about 1K for each combined key-value pair.

Perl accesses DBMs through the use of a DBM hash, created like so:

    dbmopen %DB, "my_database", 0644
        or die "dbmopen: $!";

From then on in the program, any access to the hash %DB will automatically result in a database fetch or store, as appropriate. The database is kept in files my_database.dir and my_database.pag in the current directory, created mode 644 (octal) if not already present. For example, let's put some random data in the database:

    for (1..10) {
        $DB{"item $_"} = rand(1000);
    }

Each write to %DB here is immediately reflected in the database files. Later, another invocation of the program (or even a different program) can repeat the same dbmopen call to reopen the database and walk through it:

    for (keys %DB) {
        print "$_ => $DB{$_}\n";
    }

and you'll see the same random numbers that got stored earlier. Nearly everything that works on a standard hash will work on a DBM hash. However, the keys and values are limited in length, and the value of undef doesn't store (it comes back as the empty string).
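
For instance, the usual hash operations carry straight through to the files on disk (a quick sketch, reusing the %DB opened above):

    delete $DB{"item 3"};    # removed from the files, too
    print "still here\n" if exists $DB{"item 4"};
    dbmclose %DB;            # detach when we're done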

Another more serious limitation is that the values of a DBM hash are all strings, and thus cannot contain fancier data like references. For that, we need to be a little more sophisticated. One simple approach is to extend the power of a DBM using the MLDBM package found in the CPAN (mirrored at http://www.cpan.org and dozens of other locations around the world).

The MLDBM package uses DBM technology, but automatically serializes each value, so that even if it's a reference to a data structure, the entire data structure can be stored safely in the DBM. It looks like this:

    use MLDBM; # do this once
    use Fcntl; # also once
    tie %DB, MLDBM, "my_database",
        O_RDWR | O_CREAT, 0644;

This is similar to the dbmopen call above, but now the values (with a few limitations) can be nearly any mixture of listrefs and hashrefs.
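
For example, we can now keep a whole structure under a single key. One caveat worth knowing (it's in the MLDBM documentation): changes made in place to a fetched structure aren't written back, so fetch a copy, change it, and assign the whole thing back. A small sketch with made-up keys:

    $DB{flintstones} = {
        dad  => 'fred',
        kids => ['pebbles'],
    };
    ## to update: copy out, modify, store back
    my $family = $DB{flintstones};
    push @{ $family->{kids} }, 'dino';
    $DB{flintstones} = $family;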

We're still stuck with less than 1K for each key-value pair, however. So, with a slightly bigger hammer, we can drive in a bigger nail. The bigger hammer in this case is the Data::Dumper module (also found in the CPAN). This module can take a reference to a data structure and generate Perl code that accurately reproduces the original data structure, even if the data is complicated.

Here's how it works. First, bring in the module at the beginning of the program:

    use Data::Dumper;

then to save the data pointed to by $data, open a text file and write the results of a Data::Dumper dump on that data:

    open DB, ">file" or die "create: $!";
    print DB Data::Dumper->
        Dump([$data], [qw($data)]);
    close DB;

The text in file at this point is real Perl code that can be evaluated to restore $data to whatever value it had at the time of the dump. It's also a fairly readable (and nicely indented) description of the data, so you can also use this module to debug your complex data structures.
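
For instance, if $data held a reference to a small hash with a name and a list of friends (a structure invented just for show), the file would contain something roughly like this:

    $data = {
              'name' => 'fred',
              'friends' => [
                             'barney',
                             'betty'
                           ]
            };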

To reload $data at the next invocation, just do it: that is, evaluate the file as Perl code:

    do 'file';

and the value of $data is restored. You can put multiple data structures in the same file. See the documentation for more details.
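
For example, to keep two structures in one file, hand Dump a list of references and a matching list of names (here $config and @items are invented names, just for show); a single do of the file later brings them both back:

    open DB, ">file" or die "create: $!";
    print DB Data::Dumper->Dump(
        [$config, \@items],
        [qw($config *items)]);
    close DB;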

One of the downsides of Data::Dumper is that it cannot store coderefs, or precisely record floating point numbers, since everything must be written out as a string. Another downside is that generating a lot of human-readable code, just to have Perl parse it all over again, is somewhat expensive.

There's a more powerful and faster way to save arbitrary data structures: the Storable module (also found in the CPAN). This module also serializes a data structure into a series of bytes, but instead of writing human-readable text, it writes a binary format known only to the readers of the source code for the module. Generally, that's not an issue, because we'll be using Storable both to store and to retrieve the value.

The usage is actually somewhat easier than Data::Dumper:

    use Storable;
    store $data, "file";

Wow. I didn't even have to open the file, because the module does it for me. And now to retrieve it...

    $data = retrieve "file";

Yup. That's it. With a little work, we can even make this happen automatically at the end of the program, by creating a tiny object with a DESTROY method:

    package Persist;
    use Storable;
    sub TIESCALAR {       # tie time: load any saved value
        my $self = bless {}, shift;
        my $file = shift;
        $self->{File} = $file;
        $self->{Value} = -e $file ?
            retrieve $file : undef;
        $self;
    }
    sub FETCH {           # every read of the tied scalar
        shift->{Value};
    }
    sub STORE {           # every write to the tied scalar
        my $self = shift;
        $self->{Value} = shift;
    }
    sub DESTROY {         # end of program or scope: save it
        my $self = shift;
        store $self->{Value}, $self->{File};
    }
    # ... and down in the main code ...
    package main;
    tie $data, Persist, 'my_persistent_file';
    $data->{"Fred" . time} = "barney $$";
    $data->{"Wilma" . time} =
         ['pebbles', 'bamm-bamm', time, $$];
    ## to see the structure (not for saving):
    use Data::Dumper;
    print Data::Dumper->Dump([$data], [qw($data)]);

And when the program exits, it'll automatically write out $data to the file, such that when you rerun the program, $data will be automatically restored! This works because $data has been tied to the Persist package, so all fetches and stores automatically call the underlying methods. We'll get notification of impending doom at the end of the program (or as the variable goes out of scope) thanks to the DESTROY routine.

This is just a skeleton of the structure. You'll probably want to add some error checking. You could even put hooks in the TIESCALAR method to automatically flock the file if it exists, to ensure that you're the only one updating the data, and then release the flock when the variable is destroyed. Cool.
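
Here's one way that locking might look. This sketch holds an exclusive lock on a separate lock file (the "$file.lock" name is invented here) rather than on the data file itself, since Storable opens the data file on its own; FETCH and STORE stay exactly as before:

    package Persist;
    use Storable;
    use Fcntl ':flock';   # for LOCK_EX
    sub TIESCALAR {
        my $self = bless {}, shift;
        my $file = shift;
        $self->{File} = $file;
        ## take the lock before reading
        open my $lock, ">>", "$file.lock"
            or die "lock: $!";
        flock $lock, LOCK_EX or die "flock: $!";
        $self->{Lock} = $lock;
        $self->{Value} = -e $file ?
            retrieve $file : undef;
        $self;
    }
    sub DESTROY {
        my $self = shift;
        store $self->{Value}, $self->{File};
        close $self->{Lock};   # releases the lock
    }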

I hope you've enjoyed this little excursion into the world of Perl's data persistence. Sometimes, you can take it with you.


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.