Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 89 (Jan 2007)

[suggested title: ``Rose::DB::Object's many-to-many mappings'']

In [the last three columns], I introduced Rose::DB::Object (RDBO), set up a sample database, and demonstrated the basic Create/Retrieve/Update/Delete (CRUD) operations on a single table. This time, let's look at multiple table operations.

We'll start by defining my favorite film trilogy, the Star Wars classics:

  my $ep4 = My::RDBO::Film->new
    (title => 'Star Wars',
     release_date => '1977-05-25',
    )->save;

  my $ep5 = My::RDBO::Film->new
    (title => 'The Empire Strikes Back',
     release_date => '1980-05-21',
    )->save;

  my $ep6 = My::RDBO::Film->new
    (title => 'Return of the Jedi',
     release_date => '1983-05-25',
    )->save;

Again, to dump these, we'll need some dumping query. Modeling the code after the person-dumping snippet from last month's article, we get the following:

  my $iter = My::RDBO::Film::Manager->get_films_iterator
    (sort_by => 'release_date',
    );
  while (my $film = $iter->next) {
    printf "%s [%d] released on %s\n",
      $film->title, $film->id, $film->release_date;
  }
  print "total films: ", $iter->total, "\n";

And on an empty database, the result is as expected:

  Star Wars [1] released on 1977-05-25T00:00:00
  The Empire Strikes Back [2] released on 1980-05-21T00:00:00
  Return of the Jedi [3] released on 1983-05-25T00:00:00
  total films: 3

Let's put Mark Hamill in the database:

  my $mark = My::RDBO::Person->new
    (name => 'Mark Hamill',
     birthdate => '1951-09-25',
    )->save;

And now it's time to create some join records. We have the role table, which needs four things:

The person
The film
The category of relationship: we'll use ``Actor'' for actors and actresses
The detail: we'll let the detail for an actor be the character name

Items 1 and 2 are foreign keys. When we add a foreign key item, we can specify any of three things:

A reference to a row of the proper class
A hashref containing key/value pairs of columns
A single scalar primary key value for a row of the proper class.

Since we have Mark's row already, together with the film rows, we can add those directly:

  for my $film ($ep4, $ep5, $ep6) {
    My::RDBO::Role->new
        (person => $mark,
         film => $film,
         category => 'actor',
         detail => 'Luke Skywalker',
        )->save;
  }

And we can display them in a similar manner:

  my $iter = My::RDBO::Role::Manager->get_roles_iterator
    (sort_by => 'id',
    );
  while (my $role = $iter->next) {
    my $person = $role->person;
    my $film = $role->film;
    printf
      ("[%d] %s had role %s (%s) on film %s\n",
       $role->id,
       $person->name,
       $role->category,
       $role->detail,
       $film->title,
      );
  }
  print "total roles: ", $iter->total, "\n";

This results in the expected output:

  [1] Mark Hamill had role actor (Luke Skywalker) on film Star Wars
  [2] Mark Hamill had role actor (Luke Skywalker) on film The Empire Strikes Back
  [3] Mark Hamill had role actor (Luke Skywalker) on film Return of the Jedi
  total roles: 3

As we were executing this query, the resulting database queries first selected all role rows, but then had to go back to the database for each individual person and film details. We can see this by adding the appropriate DBI trace calls.

However, we can give a hint to RDBO that we'll also be wanting the person and film records by changing the iterator a bit:

  my $iter = My::RDBO::Role::Manager->get_roles_iterator
    (sort_by => 'id',
     require_objects => [qw(person film)],
    );

The declaration require_objects creates an inner join, such that each role automatically comes with the corresponding data for the correlated row objects. The resulting query looks like:

  SELECT 
    t1.category AS t1_category,
    t1.detail AS t1_detail,
    t1.film_id AS t1_film_id,
    t1.id AS t1_id,
    t1.person_id AS t1_person_id,
    t2.birthdate AS t2_birthdate,
    t2.id AS t2_id,
    t2.name AS t2_name,
    t3.id AS t3_id,
    t3.release_date AS t3_release_date,
    t3.title AS t3_title
  FROM
    roles t1,
    persons t2,
    films t3
  WHERE
    t1.person_id = t2.id AND
    t1.film_id = t3.id
  ...

Because the data is already pulled at the time of the major query, we don't have to go back to the database again. However, we do this at the expense of perhaps pulling redundant data. For example, the information for Mark Hamill is received three times here. The nice thing about RDBO is that we can choose whether to optimize the number of queries or optimize the volume of data that is returned.

We can also ask for just Mark's roles directly:

  for my $role ($mark->roles) {
    my $person = $role->person;
    my $film = $role->film;
    printf
      ("[%d] %s had role %s (%s) on film %s\n",
       $role->id,
       $person->name,
       $role->category,
       $role->detail,
       $film->title,
      );
  }

Every one-to-many relationship can be queried like this, or even set directly.

A foreign key item accessor can also be passed a hashref of key value pairs. In this case, the corresponding row is either located in the database, or if absent, automatically created.

For example, we can add Carrie Fisher to our three films by expressing her record directly in the addition:

  My::RDBO::Role->new
    (person => { name => 'Carrie Fisher' },
     film => $ep4,
     category => 'actor',
     detail => 'Princess Leia Organa',
    )->save;

And this seems to have the right effect:

  [1] Mark Hamill had role actor (Luke Skywalker) on film Star Wars
  [2] Mark Hamill had role actor (Luke Skywalker) on film The Empire Strikes Back
  [3] Mark Hamill had role actor (Luke Skywalker) on film Return of the Jedi
  [4] Carrie Fisher had role actor (Princess Leia Organa) on film Star Wars
  total roles: 4

However, because we're not looking at a primary key, we should be careful not to accidentally add duplicate records. For example, at first glance, this appears to have done the right thing:

  for my $film ($ep4, $ep5, $ep6) {
    My::RDBO::Role->new
        (person => { name => 'Carrie Fisher' },
         film => $film,
         category => 'actor',
         detail => 'Princess Leia Organa',
        )->save;
  }

But when we dump the ID of the person record, we can see that we've started the Battle of the Clone Wars in the wrong era:

  while (my $role = $iter->next) {
    my $person = $role->person;
    my $film = $role->film;
    printf
      ("[%d] %s[%d] had role %s (%s) on film %s\n",
       $role->id,
       $person->name,
       $person->id,
       $role->category,
       $role->detail,
       $film->title,
      );
  }

which reveals:

  [1] Mark Hamill[1] had role actor (Luke Skywalker) on film Star Wars
  [2] Mark Hamill[1] had role actor (Luke Skywalker) on film The Empire Strikes Back
  [3] Mark Hamill[1] had role actor (Luke Skywalker) on film Return of the Jedi
  [4] Carrie Fisher[2] had role actor (Princess Leia Organa) on film Star Wars
  [5] Carrie Fisher[3] had role actor (Princess Leia Organa) on film The Empire Strikes Back
  [6] Carrie Fisher[4] had role actor (Princess Leia Organa) on film Return of the Jedi

Oops. Too many Fishers. Instead of adding roles that point at new instances of Carrie, we can instead add roles to a single Carrie Fisher:

  my $carrie = My::RDBO::Person->new(name => 'Carrie Fisher')->save;

  for my $film ($ep4, $ep5, $ep6) {
    $carrie->add_roles
      ({film => $film,
        category => 'actor',
        detail => 'Princess Leia Organa',
       });
    $carrie->save;
  }

Here, we're using the automatic add_MUMBLE methods created for every one-to-many mapping method. The role is being created from the hashref, and the whole thing reads rather sensibly.

For a many-to-many mapping (where the middle join table has no additional non-key columns), it's even a bit easier.

For example, let's note that all three films were produced by Lucasfilm Ltd., but Star Wars also had 20 Century Fox as a partner:

  my $fox = My::RDBO::Studio->new(name => '20 Century Fox')->save;
  my $lucas = My::RDBO::Studio->new(name => 'Lucasfilm Ltd.')->save;

  $ep4->add_studios($lucas, $fox);
  $ep4->save;
  $ep5->add_studios($lucas);
  $ep5->save;
  $ep6->add_studios($lucas);
  $ep6->save;

  for my $studio ($lucas, $fox) {
    for my $film ($studio->films) {
      print $studio->name, " released ",
        $film->title, " on ", $film->release_date, "\n";
    }
  }

The ``save'' steps are necessary because without them, we're modifying only the in-memory version. This gives us the option of performing many changes before commiting to a new world view in the database. In fact, within a transaction environment, a save is automatically a single transaction.

Note that unlike the Carrie Fisher cloning example, the studio is uniquely keyed on the name. So we can use the name as a primary key, and shorten that example to:

  $ep4->add_studios('20 Century Fox','Lucasfilm Ltd.');
  $ep4->save;
  $ep5->add_studios('Lucasfilm Ltd.');
  $ep5->save;
  $ep6->add_studios('Lucasfilm Ltd.');
  $ep6->save;

  for my $film ($ep4, $ep5, $ep6) {
    for my $studio ($film->studios) {
      print $studio->name, " released ",
        $film->title, " on ", $film->release_date, "\n";
    }
  }

which results in:

  20 Century Fox released Star Wars on 1977-05-25T00:00:00
  Lucasfilm Ltd. released Star Wars on 1977-05-25T00:00:00
  Lucasfilm Ltd. released The Empire Strikes Back on 1980-05-21T00:00:00
  Lucasfilm Ltd. released Return of the Jedi on 1983-05-25T00:00:00

Note that in this case, the later uses of Lucasfilm Ltd. were considered identical, and reuse the same database row rather than creating new rows.

This concludes my brief tour of the capabilities of Rose::DB::Object. For more information, be sure to check out the Rose::DB::Object::Tutorial manpage. The mailing list listed there is an excellent resource, as well as the IRC channel that some of us monitor. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

Worldwide training and consulting by Perl experts

Copyright Notice

Linux Magazine Column 89 (Jan 2007)