Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 53 (Jul 2004)

[suggested title: ``Lightweight Persistent Data'']

Frequently, you have data with a strong will to live. That is, your data must persist between invocations of your program, and occasionally even be shared between simultaneous invocations.

At the high-end of this demand, we have entire companies devoted to creating high-performance multi-user SQL-interfaced databases. These databases are usually accessed from Perl via the DBI package, or by some wrapper slightly above DBI, such as Class::DBI or DBIx::SQLEngine. The details of SQL might even be entirely hidden away using a higher-level package like Tangram or Alzabo.

But further down the scale, some new solutions are popping onto the scene that invite a closer look, alongside some old classics. For example, since Perl version 2 we've been able to put a hash out on disk with dbmopen:

  dbmopen(%HASH, "/path/on/disk", 0644) || die;
  $HASH{"key"} = "value";
  dbmclose(%HASH);

The effect of such code is that we now have a key/value pair stored in an external structured file. We can later come along and reopen the database as a hash again, and treat it as if it was a hash with preexisting values:

  dbmopen(%HASH, "/path/on/disk", 0644) || die;
  foreach $key (sort keys %HASH) {
    print "$key => $HASH{$key}\n";
  }
  dbmclose(%HASH);

While the interface was relatively simple, I wrote quite a few programs using this storage mechanism for my persistence before Perl5 came around. However, this storage suffered some limitations: the keys and values had to be under a given size, access to the structure could not handle multi-user reads and writes, and the resulting data files were not necessarily portable to other machines (because they used incompatible libraries or byte orders).

When Perl5 came along, new problems arose. No longer were we limited to just flat arrays and hashes: we could now have complex data types with arbitrary structure. Luckily, the mechanism ``behind'' the dbmopen was made available directly at the Perl code level, through the tie operator, described in the perltie manpage. This let others besides Larry Wall create ``magical'' hashes that could perform actions on every fetch and store.
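To make that concrete, here's a minimal sketch of such a magical hash, using only the core Tie::Hash module. The LoggingHash package name is my own invention; it simply reports every store, but the STORE and FETCH methods are the same hook points a dbmopen-style module uses to hit the disk instead:

  package LoggingHash;
  use Tie::Hash;
  our @ISA = ('Tie::StdHash');  # inherit an ordinary in-memory hash

  sub STORE {
    my ($self, $key, $value) = @_;
    print "storing $key => $value\n";   # our "action on every store"
    $self->SUPER::STORE($key, $value);
  }

  package main;
  tie my %hash, 'LoggingHash';
  $hash{fred} = 205;          # prints "storing fred => 205"
  print $hash{fred}, "\n";    # prints "205"

A persistence module does exactly this, except that STORE writes to a file and FETCH reads one back.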

One early use of the tie mechanism was the MLDBM package, which could take a complex value to be assigned for a given key, and serialize it to a single string value which could then be stored much like before. For example:

  use MLDBM;                # defaults to SDBM_File and Data::Dumper
  use Fcntl;                # for O_CREAT and O_RDWR
  tie my %hash, 'MLDBM', '/path/on/disk', O_CREAT|O_RDWR, 0644 or die;
  $hash{my_array} = [1..5];
  $hash{my_scores} = { fred => 205, barney => 195, dino => 30 };

As each complex data structure was stored into the hash, it got converted into a string, using Data::Dumper, FreezeThaw, or Storable. When a value was fetched, it would be converted back from a string to the complex data structure. However, the resulting value was no longer connected to the tied hash. For example:

  my $scores = $hash{my_scores};
  $scores->{fred} = 215;

would no longer affect the stored data; the change lands only in a thawed copy. Instead, we got warnings on the MLDBM manpage to ``not do this''; the workaround was to modify the copy and then assign the entire structure back to its key. Also, we still had all the limitations of a standard dbmopen-style database: size limits, multiuser access, and non-portability.
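To see why, here's a core-Perl sketch of the mechanism MLDBM uses (the FreezeHash package name is mine, and real MLDBM layers this over a DBM file rather than an in-memory hash): values are frozen to strings on store and thawed on fetch, so every fetch hands back a disconnected copy.

  package FreezeHash;
  use Tie::Hash;
  use Storable qw(freeze thaw);
  our @ISA = ('Tie::StdHash');

  sub STORE { my ($s, $k, $v) = @_; $s->SUPER::STORE($k, freeze($v)) }
  sub FETCH { my ($s, $k) = @_; thaw($s->SUPER::FETCH($k)) }

  package main;
  tie my %hash, 'FreezeHash';
  $hash{my_scores} = { fred => 205 };

  my $scores = $hash{my_scores};        # a thawed *copy*
  $scores->{fred} = 215;                # updates the copy only
  print $hash{my_scores}{fred}, "\n";   # still prints "205"

  $hash{my_scores} = $scores;           # the cure: assign it all back
  print $hash{my_scores}{fred}, "\n";   # now prints "215"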

One solution that I resorted to on more than one occasion was to take over the serialization myself, and to use Storable's retrieve and nstore operations directly. My code would look something like:

  use Storable qw(nstore retrieve);
  my $data = -e 'file' ? retrieve('file') : {};  # start fresh on first run
  ... perform operations with $data ...
  nstore $data, 'file';   # nstore writes portable network byte order

Now my $data value could be an arbitrarily complex data structure, and any changes I made would be completely reflected in the updated file. The result was that I simply had a Perl data structure that persisted.
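As a self-contained illustration of that load-modify-save cycle (the temporary filename here is just for demonstration):

  use strict;
  use warnings;
  use File::Temp qw(tempfile);
  use Storable qw(nstore retrieve);

  my (undef, $file) = tempfile(UNLINK => 1);

  my $initial = { my_scores => { fred => 205 } };
  nstore $initial, $file;             # first save

  my $data = retrieve($file);         # a later "invocation" loads it back
  $data->{my_scores}{fred} = 215;     # changes live only in memory...
  nstore $data, $file;                # ...until we store again

  my $final = retrieve($file);
  print $final->{my_scores}{fred}, "\n";   # prints "215"

Because nstore writes in network byte order, the resulting file can move between machines, unlike the old dbmopen files.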

It appears that the author of Tie::Persistent had the same idea of using Storable on the entire top-level structure, except with a tie wrapper instead of explicit fetch-store phases, although I can't vouch for the code. In fact, I see a number of CPAN entries that all seem to have found similar mechanisms, but none of them seem to have found the ``holy grail'' of object persistence: making it as absolutely transparent as possible in a nice portable (and hopefully multiuser) manner.

That is, until I noticed DBM::Deep. According to the Changelog, this distribution has been around for about two years (as I write this), but only on the CPAN for a few months. From its own description:

DESCRIPTION

A unique flat-file database module, written in pure perl. True multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO / tie() interface, cross-platform FTPable files, and quite fast. Can handle millions of keys and unlimited hash levels without significant slow-down. Written from the ground-up in pure perl -- this is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility with Unix, Mac OS X and Windows.

And with a promotional paragraph like that, I just had to look. It looks simple enough. I merely say:

  use DBM::Deep;
  my $hash = DBM::Deep->new("foo.db");
  $hash->{my_array} = [1..5];
  $hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };

And that's it. In my next program:

  use DBM::Deep;
  my $hash = DBM::Deep->new("foo.db");
  $hash->{my_scores}->{fred} = 215; # update score

And finally, retrieving it all:

  use DBM::Deep;
  my $hash = DBM::Deep->new("foo.db");
  print join(", ",@{$hash->{my_array}}), "\n";
  for (sort keys %{$hash->{my_scores}}) {
    print "$_ => $hash->{my_scores}->{$_}\n";
  }

which prints:

  1, 2, 3, 4, 5
  barney => 195
  dino => 30
  fred => 215

And in fact, that all just plain worked. I'm impressed. We've avoided the MLDBM problem, because the update to the nested data worked. And there's no dependency on traditional DBMs here, so there are no size limitations or byte-ordering issues, nor even the need for a C compiler to install it.

I'm told, although I haven't tested it, that I can also add:

  $hash->lock;
  ... do some shared things ...
  $hash->unlock;

and thereby access shared data in multiple processes.

There also seems to be some cool stuff around encrypting or compressing the data as well. This definitely bears further examination.

The limitations of DBM::Deep seem rather expected. Because this is a single data file locked using flock, we can't share the persistent data across machines, or reliably over NFS. Also, we have to clean up after ourselves from time to time by calling the optimize method; otherwise, unused space accumulates in the database.

One other recent addition to the CPAN also caught my eye: OOPS. Unlike DBM::Deep, OOPS uses a DBI-style database (currently only compatible with PostgreSQL, MySQL, and SQLite) for its persistent store. However, like DBM::Deep, once a connection is made, you can do pretty much anything you want with the data structure, and it gets reflected into the permanent storage. The database tables are created on request, and managed transparently by the module.

The basic mode of OOPS looks like:

  use OOPS;
  transaction(sub {
    OOPS->initial_setup(
      dbi_dsn => 'dbi:SQLite:/tmp/oops',
      username => undef, # ignored by SQLite
      password => undef, # ditto
    ) unless -s "/tmp/oops";
    my $hash = OOPS->new(
      dbi_dsn => 'dbi:SQLite:/tmp/oops',
      username => undef, # ignored by SQLite
      password => undef, # ditto
    );
    $hash->{my_array} = [1..5];
    $hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };
    $hash->{my_scores}->{fred} = 215; # update score
    $hash->commit;
  });

The transaction wrapper ensures that this update all happens within a single transaction. We fetch the data similarly:

  use OOPS;
  transaction(sub {
    my $hash = OOPS->new(
      dbi_dsn => 'dbi:SQLite:/tmp/oops',
      username => undef, # ignored by SQLite
      password => undef, # ditto
    );
    print join(", ",@{$hash->{my_array}}), "\n";
    for (sort keys %{$hash->{my_scores}}) {
      print "$_ => $hash->{my_scores}->{$_}\n";
    }
  });

And in fact, this retrieved exactly the values I had expected. I'll be exploring these two modules in greater depth in the future, and until then, enjoy!


Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.