Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Unix Review Column 24 (Feb 1999)
Every invocation of a Perl program starts the world afresh, with new data to be created, brought in, and manipulated. However, some things in the real world must persist. So, how do we keep data around, to access it later, perhaps in another invocation of the same program, or even a different program?
One of the simplest ways is to transfer a single array to and from a simple ``one item per line'' text file. For example, at the beginning of your program, you'd load the array:
open DB, "<file" or die "open: $!";
@data = <DB>;
close DB;
and then use @data as a list of lines. You can insert and delete lines, or add new items in any way you wish. When you're done, just write it back out:
open DB, ">file" or die "create: $!";
print DB @data;
close DB;
One limitation of this method is that you can easily save only a single array to a given file. The data also has to be well behaved: each item must contain no embedded newlines, and each is delimited by exactly one trailing newline.
You can similarly save a hash to a file, using a distinct delimiter between the keys and values. Let's use a tab character:
open DB, "<file" or die "open: $!";
%mydata = map /(.*)\t(.*)/, <DB>;
close DB;
Here we've used a map operation to extract the keys and corresponding values from each line of the permanent storage. At the end of the program, we'll simply reverse the process:
open DB, ">file" or die "create: $!";
print DB map "$_\t$mydata{$_}\n", keys %mydata;
close DB;
Once again, the map operation makes it easy. We still suffer the limitation of having to have well-behaved data: no tabs or newlines within the keys or values.
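Putting the save and load halves together, a full round trip of the tab-delimited format looks like this (the file name hashfile is made up for the example):

```perl
%mydata = (fred => 'flintstone', barney => 'rubble');

# save: one "key TAB value" line per pair
open DB, ">hashfile" or die "create: $!";
print DB map "$_\t$mydata{$_}\n", keys %mydata;
close DB;

# load into a different hash to show it really round-trips
open DB, "<hashfile" or die "open: $!";
%copy = map /(.*)\t(.*)/, <DB>;
close DB;

# %copy now holds the same pairs as %mydata
unlink "hashfile";   # tidy up
```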
For nearly arbitrary keys and values, we can step up to the next level of sophistication: DBMs. A DBM is a small database created with the DBM routines shipped with nearly every version of Unix since the mid-1970s. The database consists of key-value pairs of arbitrary binary data, limited to about 1K for each combined key-value pair.
Perl accesses DBMs through the use of a DBM hash, created like so:
dbmopen %DB, "my_database", 0644 or die "dbmopen: $!";
From then on in the program, any access to the hash %DB will automatically result in a database fetch or store, as appropriate. The database is kept in files my_database.dir and my_database.pag in the current directory, created mode 644 (octal) if not already present. For example, let's put some random data in the database:
for (1..10) { $DB{"item $_"} = rand(1000); }
Each write to %DB here is immediately reflected in the database files. Later on, we can reopen the database with a later invocation of the program (or even a different program):
for (keys %DB) { print "$_ => $DB{$_}\n"; }
and you'll see the same random numbers that got stored earlier.
Nearly everything that works on a standard hash will work on a DBM hash. However, the keys and values are limited in length, and the value undef doesn't store (it comes back as the empty string). Another, more serious limitation is that the values of a DBM hash are all strings, and thus cannot contain fancier data like references.
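Here's a small sketch of that undef limitation in action (the database name demo_db is invented for the example; the dbmclose and unlink just tidy up afterward):

```perl
dbmopen %DB, "demo_db", 0644 or die "dbmopen: $!";

$DB{"wilma"} = "betty";     # ordinary string values round-trip fine
$DB{"nothing"} = undef;     # but undef...
$back = $DB{"nothing"};     # ...comes back as a defined empty string

print defined $back ? "defined, length " . length($back) . "\n"
                    : "undef\n";

dbmclose %DB;
unlink glob "demo_db*";     # remove the .dir/.pag (or .db) files
```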
For that, we need to get a little more sophisticated. One simple approach is to extend the power of a DBM using the MLDBM package found in the CPAN (mirrored at http://www.cpan.org and dozens of other locations around the world).
The MLDBM package uses DBM technology, but automatically serializes each value, so that even if it's a reference to a data structure, the entire data structure can be stored safely in the DBM. It looks like this:
use MLDBM; # do this once
use Fcntl; # also once
tie %DB, MLDBM, "my_database", O_RDWR | O_CREAT, 0644;
This is similar to the dbmopen call above, but now the values (with a few limitations) can be nearly any mixture of listrefs and hashrefs.
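As a sketch, here's a nested structure going in and coming back out (the database name family_db is invented for the example):

```perl
use MLDBM;   # serializes each value on its way through the DBM
use Fcntl;

tie %DB, MLDBM, "family_db", O_RDWR | O_CREAT, 0644
  or die "tie: $!";

# the whole structure is serialized into one DBM value
$DB{fred} = { wife => 'wilma', kids => ['pebbles'] };

# each fetch deserializes a fresh copy of the structure
$wilma = $DB{fred}{wife};
$kid   = $DB{fred}{kids}[0];

untie %DB;
unlink glob "family_db*";   # tidy up the database files
```

One catch noted in the MLDBM documentation: because each fetch hands back a fresh copy, an in-place change like `$DB{fred}{wife} = 'betty'` is silently lost; you must fetch the structure, change it, and assign the whole thing back.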
We're still stuck with less than 1K of keys and values, however. So, with a slightly bigger hammer, we can drive in a bigger nail. The bigger hammer in this case is the Data::Dumper module (also found in the CPAN). This module can take a reference to a data structure and generate Perl code that accurately reproduces the original data structure, even if the data is complicated.
Here's how it works. First, bring in the module at the beginning of the program:
use Data::Dumper;
Then, to save the data pointed to by $data, open a text file and write the results of a Data::Dumper dump on that data:
open DB, ">file" or die "create: $!";
print DB Data::Dumper->Dump([$data], [qw($data)]);
close DB;
The text in file at this point is real Perl code that can be evaluated to restore $data to whatever value it had at the time of the dump. It's also a fairly readable (and nicely indented) description of the data, so you can also use this module to debug your complex data structures.
To reload $data at the next invocation, just do it: that is, evaluate the file as Perl code:
do 'file';
and the value of $data is restored. You can put multiple data structures in the same file. See the documentation for more details.
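For example, here's a sketch that saves two structures to one file and then restores both (the file name my_dump is invented; note the leading ./ in the do, which matters on newer perls that no longer search the current directory by default):

```perl
use Data::Dumper;

$config = { user => 'fred', tries => 3 };
$hosts  = ['bedrock', 'quarry'];

# dump both structures, each under its own variable name
open DB, ">my_dump" or die "create: $!";
print DB Data::Dumper->Dump([$config, $hosts], [qw($config $hosts)]);
close DB;

# pretend this is a later invocation:
($config, $hosts) = ();       # forget both values
do './my_dump';               # evaluate the file as Perl code
print "$config->{user} will try $config->{tries} times\n";

unlink "my_dump";             # tidy up
```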
One of the downsides of Data::Dumper is that it cannot store coderefs, or precisely record floating-point numbers, since everything must be written out as a string. Another downside is cost: generating a lot of human-readable code, just to have it be parsed by Perl again, is actually somewhat expensive.
There's a more powerful and faster way to save arbitrary data structures: the Storable module (also found in the CPAN). This module also turns a data structure into a series of bytes, but instead of writing human-readable text, it writes data in a format known only to the readers of the source code for the module. Generally, that's not an issue, because we'll be using Storable to both store and retrieve the value.
The usage is actually somewhat easier than Data::Dumper:
use Storable;
store $data, "file";
Wow. I didn't even have to open the file, because the module does it for me. And now to retrieve it...
$data = retrieve "file";
Yup. That's it. With a little work, we can even make this happen automatically at the end of the program, by creating a tiny object with a DESTROY method:
package Persist;
use Storable;

sub TIESCALAR {
  my $self = bless {}, shift;
  my $file = shift;
  $self->{File} = $file;
  # reload the previous value, if the file is already there
  $self->{Value} = -e $file ? retrieve $file : undef;
  $self;
}
sub FETCH {
  shift->{Value};
}
sub STORE {
  my $self = shift;
  $self->{Value} = shift;
}
sub DESTROY {
  my $self = shift;
  # write the current value back out as the variable goes away
  store $self->{Value}, $self->{File};
}
# ... and down in the main code ...
package main;
tie $data, Persist, 'my_persistent_file';
$data->{"Fred" . time} = "barney $$";
$data->{"Wilma" . time} = ['pebbles', 'bamm-bamm', time, $$];
## to see the structure (not for saving):
use Data::Dumper;
print Data::Dumper->Dump([$data], [qw($data)]);
And when the program exits, it'll automatically write out $data to the file, so that when you rerun the program, $data will be automatically restored! This works because $data has been tied to the Persist package, so all fetches and stores automatically call the underlying methods. We'll get notification of impending doom at the end of the program (or as the variable goes out of scope) thanks to the DESTROY routine.
This is just a skeleton of the structure. You'll probably want to add some error checking. You could even put hooks in the TIESCALAR method to automatically flock the file if it exists, to ensure that you're the only one updating the data, and then release the flock when the variable is destroyed. Cool.
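One way to sketch that flock hook (the class name LockedPersist and the sidecar lock file "$file.lock" are inventions for this example; it locks a separate file rather than the data file itself, since the data file may not exist yet on the first run, and it skips the final store when nothing was ever assigned):

```perl
package LockedPersist;    # hypothetical name: Persist from above, plus locking
use Storable;
use Fcntl qw(:flock);

sub TIESCALAR {
  my ($class, $file) = @_;
  my $self = bless { File => $file }, $class;
  # grab an exclusive lock on a sidecar lockfile for the
  # lifetime of the tied variable
  open my $lock, ">>", "$file.lock" or die "lock open: $!";
  flock $lock, LOCK_EX or die "flock: $!";   # wait until we're the only writer
  $self->{Lock} = $lock;
  $self->{Value} = -e $file ? retrieve $file : undef;
  $self;
}
sub FETCH { $_[0]{Value} }
sub STORE { $_[0]{Value} = $_[1] }
sub DESTROY {
  my $self = shift;
  store $self->{Value}, $self->{File} if defined $self->{Value};
  close $self->{Lock};    # closing the handle releases the lock
}

1;
```

A second process tying the same file would now block inside TIESCALAR until the first releases the lock by untying the variable or exiting.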
I hope you've enjoyed this little excursion into the world of Perl's data persistence. Sometimes, you can take it with you.