Copyright Notice
This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Unix Review Column 53 (Jul 2004)
[suggested title: ``Lightweight Persistent Data'']
Frequently, you have data with a strong will to live. That is, your data must persist between invocations of your program, and occasionally even be shared between simultaneous invocations.
At the high end of this demand, we have entire companies devoted to creating high-performance multi-user SQL-interfaced databases. These databases are usually accessed from Perl via the DBI package, or by some wrapper slightly above DBI, such as Class::DBI or DBIx::SQLEngine. The details of SQL might even be entirely hidden away using a higher-level package like Tangram or Alzabo.
But further down the scale, there are some new solutions popping onto the scene that invite further observation, as well as some old classic solutions. For example, since Perl version 2 we've been able to put a hash out on disk with dbmopen:
    dbmopen(%HASH, "/path/on/disk", 0644) || die;
    $HASH{"key"} = "value";
    dbmclose(%HASH);
The effect of such code is that we now have a key/value pair stored in an external structured file. We can later come along, reopen the database as a hash, and treat it as if it were a hash with preexisting values:
    dbmopen(%HASH, "/path/on/disk", 0644) || die;
    foreach $key (sort keys %HASH) {
      print "$key => $HASH{$key}\n";
    }
    dbmclose(%HASH);
The interface was relatively simple, and I wrote quite a few programs using this storage mechanism for my persistence before Perl 5 came around. However, this storage suffered some limitations: the keys and values had to be under a given size, access to the structure could not handle simultaneous multi-user reads and writes, and the resulting data files were not necessarily portable to other machines (because they used incompatible libraries or byte orders).
When Perl 5 came along, new problems arose. No longer were we limited to just arrays and hashes; we could now have complex data types with arbitrary structure. Luckily, the mechanism ``behind'' dbmopen was made available directly at the Perl code level, through the tie operator, described in the perltie manpage. This let others besides Larry Wall create ``magical'' hashes that could perform actions on every fetch and store.
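To make that concrete, here is a minimal sketch of a tied hash. The class name and its behavior are invented for illustration, not taken from any module mentioned here: every key is uppercased on its way in and out, so both fetches and stores pass through our code.

```perl
package UpcaseHash;    # hypothetical tie class, just for illustration

# TIEHASH is called by tie() to create the object behind the hash.
sub TIEHASH { my $class = shift; return bless {}, $class }

# STORE runs on every assignment; here we uppercase the key first.
sub STORE { my ($self, $key, $value) = @_; $self->{uc $key} = $value }

# FETCH runs on every read, uppercasing the key the same way.
sub FETCH { my ($self, $key) = @_; return $self->{uc $key} }

package main;
tie my %magic, 'UpcaseHash';
$magic{fred} = 205;
print $magic{FRED}, "\n";    # same slot, because both keys uppercase to FRED
```

A persistence layer built on tie works the same way, except that its STORE and FETCH talk to a file on disk instead of an in-memory hash.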
One early use of the tie mechanism was the MLDBM package, which could take a complex value to be assigned for a given key, and serialize it to a single string value that could then be stored much like before. For example:
    use MLDBM;    # uses SDBM_File underneath by default
    use Fcntl;
    tie my %hash, 'MLDBM', '/path/on/disk', O_CREAT|O_RDWR, 0644 or die;
    $hash{my_array} = [1..5];
    $hash{my_scores} = { fred => 205, barney => 195, dino => 30 };
As each complex data structure was stored into the hash, it got converted into a string, using Data::Dumper, FreezeThaw, or Storable. If a value was fetched, it would be converted back from a string to the complex data structure. However, the resulting value was no longer related to the tied hash. For example:
    my $scores = $hash{my_scores};
    $scores->{fred} = 215;
would no longer affect the stored data. Instead, we got warnings on the MLDBM manpage to ``not do this''. Also, we still had all the limitations of a standard dbmopen-style database: size limits, multiuser access, and non-portability.
One solution that I resorted to on more than one occasion was to take over the serialization myself, and to use Storable's retrieve and nstore operations directly. My code would look something like:
    use Storable qw(nstore retrieve);
    my $data = retrieve('file');
    ... perform operations with $data ...
    nstore $data, 'file';
Now my $data value could be an arbitrarily complex data structure, and any changes I made would be completely reflected in the updated file. The result was that I simply had a Perl data structure that persisted.
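To make that round-trip concrete, here's a minimal sketch of the pattern (the file name is mine, for illustration): a deep update made between the retrieve and the nstore is really on disk for the next run.

```perl
use Storable qw(nstore retrieve);

my $file = 'scores.stor';    # illustrative path

# First run: create and save a nested structure.
my $init = { my_scores => { fred => 205 } };
nstore $init, $file;

# A later run: pull it back, update deep inside, save again.
my $data = retrieve($file);
$data->{my_scores}{fred} = 215;
nstore $data, $file;

# The nested change really persisted.
print retrieve($file)->{my_scores}{fred}, "\n";    # prints 215
```

As a bonus, nstore writes in network byte order, which sidesteps the byte-ordering portability problem of the old dbmopen-style files.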
It appears that the author of Tie::Persistent had the same idea to use Storable on the entire top-level structure as well, except with a tie wrapper instead of explicit fetch-store phases, although I can't vouch for the code. In fact, I see a number of CPAN entries that all seem to use similar mechanisms, but none of them seems to have found the ``holy grail'' of object persistence: making it as absolutely transparent as possible in a nice portable (and hopefully multiuser) manner.
That is, until I noticed DBM::Deep. According to the Changelog, this distribution has been around for about two years (as I write this), but has been on the CPAN for only a few months. From its own description:
    DESCRIPTION

    A unique flat-file database module, written in pure perl. True
    multi-level hash/array support (unlike MLDBM, which is faked), hybrid
    OO / tie() interface, cross-platform FTPable files, and quite fast.
    Can handle millions of keys and unlimited hash levels without
    significant slow-down. Written from the ground-up in pure perl --
    this is NOT a wrapper around a C-based DBM. Out-of-the-box
    compatibility with Unix, Mac OS X and Windows.
And with a promotional paragraph like that, I just had to look. It looks simple enough. I merely say:
    use DBM::Deep;
    my $hash = DBM::Deep->new("foo.db");
    $hash->{my_array} = [1..5];
    $hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };
And that's it. In my next program:
    use DBM::Deep;
    my $hash = DBM::Deep->new("foo.db");
    $hash->{my_scores}->{fred} = 215; # update score
And finally, retrieving it all:
    use DBM::Deep;
    my $hash = DBM::Deep->new("foo.db");
    print join(", ", @{$hash->{my_array}}), "\n";
    for (sort keys %{$hash->{my_scores}}) {
      print "$_ => $hash->{my_scores}->{$_}\n";
    }
which prints:
    1, 2, 3, 4, 5
    barney => 195
    dino => 30
    fred => 215
And in fact, that all just plain worked. I'm impressed. We've avoided the MLDBM problem, because the update to the nested data worked. And there's no dependency on traditional DBM libraries here, so there are no size limitations, no byte-ordering issues, and not even the need for a C compiler to install it.
I'm told, although I haven't tested it, that I can also add:
    $hash->lock;
    ... do some shared things ...
    $hash->unlock;
and thereby access shared data in multiple processes.
There also seems to be some cool stuff for encrypting or compressing the data as well. This definitely bears further examination.
The limitations of DBM::Deep seem rather expected. Because this is a single data file, it's locked using flock, so we can't share persistent data among multiple users across machines, or reliably across NFS. Also, we have to clean up after ourselves from time to time by calling an optimize method: otherwise, unused space starts accumulating in the database.
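I haven't timed the reclaim myself, but going by the DBM::Deep documentation, the cleanup step would look something like this (the deletion is just a stand-in for whatever churn your program does):

```perl
use DBM::Deep;

my $db = DBM::Deep->new("foo.db");
$db->{my_array} = [1..5];     # put something in the file
delete $db->{my_array};       # deleting leaves unused space behind
$db->optimize;                # rewrite the file, reclaiming that space
```

How often to call optimize depends on how much churn the database sees; the module doesn't do it for you.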
One other recent addition to the CPAN also caught my eye: OOPS. Unlike DBM::Deep, OOPS uses a DBI-style database (currently only compatible with PostgreSQL, MySQL, and SQLite) for its persistent store. However, like DBM::Deep, once a connection is made, you can pretty much do anything you want with the data structure, and it gets reflected into the permanent storage. The database tables are created on request, and managed by the module transparently.
The basic mode of OOPS
looks like:
    use OOPS;
    transaction(sub {
      OOPS->initial_setup(
        dbi_dsn => 'dbi:SQLite:/tmp/oops',
        username => undef, # no matter with SQLite
        password => undef, # ditto
      ) unless -s "/tmp/oops";

      my $hash = OOPS->new(
        dbi_dsn => 'dbi:SQLite:/tmp/oops',
        username => undef, # no matter with SQLite
        password => undef, # ditto
      );

      $hash->{my_array} = [1..5];
      $hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };
      $hash->{my_scores}->{fred} = 215; # update score

      $hash->commit;
    });
The transaction wrapper forces this update to all happen within a single transaction. We fetch the data similarly:
    use OOPS;
    transaction(sub {
      my $hash = OOPS->new(
        dbi_dsn => 'dbi:SQLite:/tmp/oops',
        username => undef, # no matter with SQLite
        password => undef, # ditto
      );

      print join(", ", @{$hash->{my_array}}), "\n";
      for (sort keys %{$hash->{my_scores}}) {
        print "$_ => $hash->{my_scores}->{$_}\n";
      }
    });
And in fact, this retrieved exactly the values I had expected. I'll be exploring these two modules in greater depth in the future, and until then, enjoy!