Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 86 (Oct 2006)

[suggested title: ``Wrap your database rows with Rose'']

Maybe I'm unique. Maybe I'm the only programmer in the world who hates (re)typing the same exact boring SQL to update a column in a row of interest:

  UPDATE cruise
  SET start_date = "2006-09-21"
  WHERE cruise.id = 27;

But apparently, I'm not the only person who hates writing code like that, because there are a few very nice ``object relational mapper'' (ORM) framework in the Perl world. The goal of an ORM is to present each row as a single object, and then use traditional getters and setters against that object to update the row, possibly delayed until the object goes out of scope or is explicitly saved. And I'm a big fan of a good ORM, because I'd rather write that update as:

  $cruise->start_date("2006-09-21");

Of course, with every framework or wrapper, there's some overhead introduced. The goal with a good framework is to have an abstraction that mostly hides the fact that these objects are really rows, but gives me enough low-level hints that I am encouraged to write code that maps closely to the real SQL queries that must eventually be executed.

In past writings, I've extolled the virtues of a rather popular ORM called Class::DBI. This framework (often abbreviated CDBI) started out as some interesting object wrapping of rows by Michael Schwern. Eventually, the whole thing got wrapped with two more layers of abstraction by Tony Bowdon, who then propelled CDBI into the ``nearly infamous'' category.

And for a good three or four years, CDBI was indeed the ORM of choice. I wrote articles about it, and other people plugged it in to larger frameworks like Catalyst (formerly Maypole), and still others used it like there was nothing else. Which is a bit odd, because there were other things, such as Tangram and Alzabo, which worked hard to hide all that nasty SQL away from us, letting us use simple Perl objects with a strong will to live.

But then one day (and I'm sorry if I'm mispresenting this, because it's the way I remember it), Tony got a little irritated at being the benevolent dictator for what had become the most popular ORM for Perl. And he quit. Took his toys and went home. No patches accepted, no response to the community. In fact, the next release of CDBI included on the manpage:

: There is an active Class::DBI community, however I am not part of it. I am not on the mailing list, and I don't follow the wiki. I also do not follow Perl Monks or CPAN reviews or annoCPAN or whatever the tool du jour happens to be.

The impression we got was that if there was going to be any updates to CDBI, it'd be exactly and only what Tony wanted of the code, and not because the community wanted of it. And that's certainly his right, as the owner of the code.

However, it left a few of us in an embarassing position. For example, I had used CDBI on a number of commercial projects, and now realizing that I'd get no more support on the code meant that I'd essentially have to fork a version of my own (legal within the Perl license) to keep things working. And what of my clients? There was really no place to go for maintenance on what had become this key component of their application.

Within a few months of Tony's apparent withdrawal, Matt Trout (one of the big users of CDBI) stepped forward to create DBIx::Class (DBIC), which was essentially a reimplementation of most of the things we liked about CDBI, but omitting some of the layering that had made CDBI both slow and hard to maintain.

Initially, I was quite intrigued by the prospect: DBIC seemed to have most of the features I had grown to like in CDBI, and yet without about half the overhead, and with far more ``community'' support than CDBI was now reflecting. However, I realized that part of what I didn't like about CDBI was also being duplicated (necessarily) in DBIC, in terms of how columns were being processed when being stored or updated. The old design was a wart upon a pimple, and the new design had to duplicate that.

So, while I was starting to make plans to migrate my frequent clients from CDBI to DBIC, I wasn't entirely comfortable with the decision, thinking that even though CDBI was no longer supported, at least it was working for the customers for now. And I'm glad I waited, because along came Rose::DB::Object (RDBO).

Now, I'm not sure if John Siracusa (the creator and maintainer of many things in the Rose namespace) went through a similar experience as mine. All I know is that when RDBO first came out, I was thinking ``oh, another ORM, how quaint''. And then I started looking at the benchmarks, and my jaw dropped.

Unlike most other ORMs, which tried to hide the fact that the rows of the database mapped directly into the objects, RDBO takes the opposite approach: that you should know that each object is a database row. And this shows up by making most database operations map to method calls more closely.

For example, you can create an abstract row in memory, but until you call save on that row, it doesn't really exist on the database. With CDBI, I had to cheat a bit with that, hoping that a random memory-only object that I created in the shape of a row wasn't going to get accidentally realized into the database.

And because of many optimizations under the hood, RDBO could run faster and with more immediate control of database actions. The benchmarks with RDBO vs CDBI and DBIC are very convincing: in many cases, RDBO comes within striking distance of hand-crafted DBI calls, beating CDBI sometimes by an order of magnitude for speed!

Another thing I like about RDBO is that John Siracusa is well versed in good design practices for object-oriented programming and large frameworks. I say this having read through a huge number of frameworks in my life, and even having written a few that are now popular. The design for RDBO seems quite elegant, with the right ability to subclass or replace parts with different components, but defaulting to the most common thing to eliminate the need to ``over-specify'' for the straightforward cases.

Another thing that RDBO does ``just enough'' of is try to hide a few of the database differences from me. For example, I recently wrote a pile of code for a client that would eventually run on MySQL. Since I'd never choose MySQL for a client, now that I know about PostgreSQL, I decided that I wanted to do some early testing without MySQL, and chose sqlite instead. Once I got my code running with sqlite, it was a slight matter of changing one or two methods in my code, and I had a working MySQL implementation as well. This even included some date/time fields, which were transparently rewritten for both sqlite and MySQL.

RDBO creates a level of abstraction for both the database, and for rows in the database. The reason these are separate (instead of being bound together, as they are in CDBI), is that you might conceivably have a row in one database that you're moving to another database, or be mixing your application across multiple databases.

For your database, you'll be subclassing Rose::DB. Your rows are subclasses of Rose::DB::Object. My personal convention for a project named ABC is that I'll create these classes:

ABC::DB: subclass of Rose::DB for my database.
ABC::DB::mumble: classes that deal with the entire database.
ABC::DBO: subclass of Rose::DB::Object, parent class of all my rows.
ABC::DBO::Foo: subclass of ABC::DBO for the Foo row.

And because I'm consistent, I'll be able to come back to this application in a year and still know what I did. I hope.

The other thing that RDBO needs besides the database information and the row class is the meta information for the row. The metainformation includes the database columns, and any relationship between this table and other tables, such as a ``one to many'' role.

RDBO can operate in two modes (well, actually two and a half, as I describe in a moment). RDBO can either probe the database for all meta information (this foreign key points at that table's primary key, and thgat column is a date field), or be given the meta information explictly.

If your database is simple enough, probing the database for the meta information ensures that this metainfo is never out of sync with the existing database. However, for a large database, the probing-and-thinking time may exceed your desired amount of startup time.

And that's where the ``other half mode'' comes in. When you decide that you're no longer willing to pay the startup price for dynamically probed metainformation, you can perform the operation one last time and then spit out fairly well formed Perl code that can be loaded to set up the same meta information. You can capture this, and use this as the basis for further refinement. For example, a TEXT field in the database might actually be restricted by business rules to be one of greeen, yellow, or red. A simple edit to the Perl code spit out by the ``half'' stage, and we'll have automatic checking against this list of enumerated values: something that would never have been represented in the schema of the database.

One thing that RDBO doesn't (yet) do that some ORMs take on is a mapping from metainformation to the SQL DDL that would create such a database, which is a feature of ORMs like Alzabo. I've chatted with John recently, and he just hasn't had the need yet, so that's why RDBO doesn't do it, and I understand his reasonings.

Next month, I plan to show some sample code using RDBO with a typical small database, and hopefully this will illustrate more clearly some of the concepts that I've been discussing here. Until then, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

Worldwide training and consulting by Perl experts

Copyright Notice

Linux Magazine Column 86 (Oct 2006)