Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 63 (Mar 2006)

[suggested title: ``Inside-out Objects'']

In [my previous article], I created a traditional hash-based Perl object: a Rectangle with two attributes (width and height) using the constructor and accessors like so:

  package Rectangle;
  sub new {
    my $class = shift;
    my %args = @_;
    my $self = {
      width => $args{width} || 0,
      height => $args{height} || 0,
    };
    return bless $self, $class;
  }
  sub width {
    my $self = shift;
    return $self->{width};
  }
  sub set_width {
    my $self = shift;
    $self->{width} = shift;
  }
  sub height {
    my $self = shift;
    return $self->{height};
  }
  sub set_height {
    my $self = shift;
    $self->{height} = shift;
  }

I can construct a 3-by-4 rectangle easily:

  my $r = Rectangle->new(width => 3, height => 4);

At this point, $r is an object of type Rectangle, but it's also simply a hashref. For example, the code in set_width merely dereferences a value like $r to gain access to the hash element with a key of width. But does Perl require such code to be located within the Rectangle package? No. As a user of the Rectangle class, I could easily say:

  $r->{width} = 5;

and update the width from 3 to 5. This is ``peering inside the box'', and will lead to fragile code, because we've now exposed the implementation of the object, not just the interface.

For example, suppose we modify the set_width method to ensure that the width is never negative:

  use Carp qw(croak);
  sub set_width {
    my $self = shift;
    my $width = shift;
    croak "$self: width cannot be negative: $width"
      if $width < 0;
    $self->{width} = $width;
  }

If the $width is less than 0, we croak, triggering a fatal exception, but blaming the caller of this method. (We don't blame ourselves, and croak is a great way to pass the blame.)

At this point, we'll trap erroneous settings:

  $r->set_width(-3); # will die

But if someone has broken the box open, we get no fault:

  $r->{width} = -3; # no death

This is bad: once the data implementation has been exposed, the author of the Rectangle class no longer controls the behavior of its objects.
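
To see both behaviors side by side, here is a minimal, self-contained sketch of the hash-based class with the checked set_width. The eval wrapper and the $ok and $error variables are my own additions for illustration, so that the script can report the exception instead of dying:

```perl
use strict;
use warnings;

{
    package Rectangle;
    use Carp qw(croak);

    sub new {
        my ($class, %args) = @_;
        my $self = {
            width  => $args{width}  || 0,
            height => $args{height} || 0,
        };
        return bless $self, $class;
    }
    sub width {
        my $self = shift;
        return $self->{width};
    }
    sub set_width {
        my ($self, $width) = @_;
        croak "$self: width cannot be negative: $width"
            if $width < 0;
        $self->{width} = $width;
    }
}

my $r = Rectangle->new(width => 3, height => 4);

# Going through the interface: the check fires, and eval traps the exception.
my $ok = eval { $r->set_width(-3); 1 };
my $error = $ok ? '' : $@;
print "set_width(-3) rejected: $error" unless $ok;

# Going around the interface: nothing stops the bad value.
$r->{width} = -3;
print "width is now ", $r->width, "\n";
```

The first print reports the croak message; the second shows that the direct hash assignment left a negative width in the object.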

Besides exposing the implementation, another problem is that I have to be careful of typos. Suppose in rewriting the set_width method, I accidentally transposed the last two letters of the hash key:

    $self->{widht} = $width;

This is perfectly legal Perl, and would not throw any compile-time or run-time errors. Even use strict isn't helping here, because I'm not misspelling a variable name: just naming a ``new'' hash key. Without good unit tests and integration tests, I might not even catch this error. Yes, there are some solutions to ensure that a hash's keys come from only a particular set of permitted keys, but these generally slow down the hash access significantly.
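
As one example of such a solution (my choice for illustration, since this column doesn't name one), the core Hash::Util module provides lock_keys, which restricts a hash to its current set of keys. A minimal sketch, again using eval only so the script can report the failure:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my $self = { width => 3, height => 4 };
lock_keys(%$self);    # only "width" and "height" are legal keys from now on

$self->{width} = 5;   # fine: an existing key

# The transposed key is rejected, but only when this line actually runs.
my $ok = eval { $self->{widht} = 7; 1 };
my $error = $ok ? '' : $@;
print "caught: $error" unless $ok;
```

Note that this traps the typo at run time rather than at compile time, so the bad line still has to be exercised before anyone finds out.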

We can solve both of these problems at once, without significantly impacting the performance of our programs, by using what's come to be known as an inside-out object. First popularized by Damian Conway in the neoclassic Perl Best Practices book, an inside-out object creates a series of parallel hashes for the attributes (much like we had to do back in the Perl4 days before we had hashrefs). For example, instead of creating a single object for a rectangle that is 3 by 4:

  my $r = { width => 3, height => 4 };

we can record its attributes in two separate hashes, keyed by some unique string:

  my $r = "some unique string";
  $width{$r} = 3;
  $height{$r} = 4;

Now, to get the width of the rectangle, we use the unique string:

  my $width = $width{$r};

and to update the height, we use that same string:

  $height{$r} = 10;

When we turn on use strict, and declare the %width and %height attribute hashes, this will trap any typos related to attribute names:

  use strict;
  my %width;
  my %height;
  ...
  my $r = "another unique string";
  $height{$r} = 7; # ok
  $widht{$r} = 3; # won't compile!

The typo on the width is now caught, because we don't have a %widht hash. Hooray. That solves the second problem. But how do we solve the first problem, and where do we get this ``unique string'', and how do we get methods on our object?
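
The compile-time failure can be observed without killing the surrounding script by compiling the snippet from a string. This is only a demonstration device; the $source, $compiled, and $error names are mine:

```perl
use strict;
use warnings;

# Compile a tiny program from a string so that the compile-time failure
# lands in $@ instead of aborting this script.
my $source = <<'END_SOURCE';
use strict;
my %width;
my %height;
my $r = "another unique string";
$height{$r} = 7;   # ok
$widht{$r}  = 3;   # typo: no %widht was ever declared
1;
END_SOURCE

my $compiled = eval $source;
my $error = $compiled ? '' : $@;
print "compile failed: $error" unless $compiled;
```

Because the failure happens during compilation, none of the statements in the snippet run, not even the correct assignment to %height.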

If I assign a blessed anonymous empty hash to $r:

  my $r = bless {}, "Rectangle";

then when the value of $r is used as a string, I get a nice unique string:

  Rectangle=HASH(0x400180FE)

where the number comes from the hex representation of the internal memory address of the object. As long as this reference is alive, that memory address will not be reused. Aha, there's our unique string:

  sub new_7_by_3 {
    my $self = bless {}, shift;
    $height{$self} = 7;
    $width{$self} = 3;
    return $self;
  }

And this is what our constructor does! By blessing the object, we'll return to the same package for methods. By having an anonymous hashref, we're guaranteed a unique number. And as long as the lexical %height and %width hashes are in scope, we can access and update the attributes.
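
The uniqueness claim is easy to check. In this sketch I use refaddr from the core Scalar::Util module (not something the text above relies on) to confirm that the hex digits in the string are the reference's address, and that two live objects never share a key:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $r1 = bless {}, "Rectangle";
my $r2 = bless {}, "Rectangle";

print "$r1\n";   # Rectangle=HASH(0x...), the address varies from run to run
print "$r2\n";

# Pull the hex digits out of the string and compare them with refaddr().
my ($hex) = "$r1" =~ /\(0x([0-9a-f]+)\)/;
print "string matches address\n" if $hex eq sprintf("%x", refaddr($r1));
print "keys differ\n"            if "$r1" ne "$r2";
```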

But what are we returning? Sure, it's a hashref, but it's empty. There's no code that we can use to get from $r to the attribute hashes:

  my $r = Rectangle->new_7_by_3;

The only way we can get the height is to have code in the same scope as the definitions of the attribute hashes:

  sub height {
    my $self = shift;
    return $height{$self};
  }

And then we can use that code in our main program:

  my $height = $r->height;

The first parameter is $r, which gets used only for its unique string value, as a key into the lexical %height hash! It all Just Works.

Well, for some meaning of Works. We still have a couple of things to fix. First, there's really no reason to make an anonymous hash, because we never put anything into it, so we might as well make it a scalar:

  my $self = bless \(my $dummy), shift;

Because Perl doesn't have a primitive anonymous scalar constructor, I'm cheating by making a $dummy variable.
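
A quick sketch shows that the scalar version still hands out a fresh, distinct key on every call. The new_blank method name is hypothetical, used here only to isolate the blessing step:

```perl
use strict;
use warnings;

{
    package Rectangle;

    # Hypothetical constructor name, for illustration only.
    sub new_blank {
        my $class = shift;
        return bless \(my $dummy), $class;
    }
}

my $first  = Rectangle->new_blank;
my $second = Rectangle->new_blank;

print "$first\n";    # Rectangle=SCALAR(0x...), the address varies from run to run
print "$second\n";
print "distinct keys\n" if "$first" ne "$second";
```

Each call gets its own $dummy because the returned reference keeps the previous one alive, so the two strings cannot collide while both objects exist.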

Second, we've got some tidying up to do. When a value is no longer being referenced by any variable, we say it goes out of scope. When a traditional hashref-based object goes out of scope, any elements of the hash are also discarded, usually causing the values to also go out of scope (unless they are also referenced by some other live value). This all happens quite automatically and efficiently.

However, when our inside-out object goes out of scope, it doesn't ``contain'' anything. Its address-as-a-string is still being used as a key in one or more attribute hashes, though, and we need to get rid of those entries to mimic the traditional object mechanism. So, we'll need to add a DESTROY method:

  sub DESTROY {
    my $dead_body = $_[0];
    delete $height{$dead_body};
    delete $width{$dead_body};
    my $super = $dead_body->can("SUPER::DESTROY");
    goto &$super if $super;
  }

Note that after deleting our attributes, we also call any superclass destructor, so that it has a chance to clean up too.
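
To watch the cleanup happen, this sketch adds a live_count class method that reports how many keys the %width hash currently holds. That method is hypothetical and exists only for this demonstration; a real class would have no reason to expose it:

```perl
use strict;
use warnings;

{
    package Rectangle;
    my %width;
    my %height;

    sub new {
        my ($class, %args) = @_;
        my $self = bless \(my $dummy), $class;
        $width{$self}  = $args{width}  || 0;
        $height{$self} = $args{height} || 0;
        return $self;
    }
    sub DESTROY {
        my $dead_body = $_[0];
        delete $height{$dead_body};
        delete $width{$dead_body};
        my $super = $dead_body->can("SUPER::DESTROY");
        goto &$super if $super;
    }

    # Hypothetical helper, for this demonstration only.
    sub live_count { return scalar keys %width }
}

my $before = Rectangle->live_count;
my $inside;
{
    my $r = Rectangle->new(width => 3, height => 4);
    $inside = Rectangle->live_count;
}   # the last reference to the object disappears here, so DESTROY runs
my $after = Rectangle->live_count;

print "before=$before inside=$inside after=$after\n";   # before=0 inside=1 after=0
```

Without the DESTROY method, the final count would stay at 1, and the attribute hashes would grow for as long as the program kept creating rectangles.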

Let's put it all together:

  package Rectangle;
  my %width;
  my %height;
  sub new {
    my $class = shift;
    my %args = @_;
    my $self = bless \(my $dummy), $class;
    $width{$self} = $args{width} || 0;
    $height{$self} = $args{height} || 0;
    return $self;
  }
  sub DESTROY {
    my $dead_body = $_[0];
    delete $height{$dead_body};
    delete $width{$dead_body};
    my $super = $dead_body->can("SUPER::DESTROY");
    goto &$super if $super;
  }
  sub width {
    my $self = shift;
    return $width{$self};
  }
  sub set_width {
    my $self = shift;
    $width{$self} = shift;
  }
  sub height {
    my $self = shift;
    return $height{$self};
  }
  sub set_height {
    my $self = shift;
    $height{$self} = shift;
  }
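
As a check that the box is now closed, here is a sketch that repeats the earlier poke against an inside-out Rectangle. I've trimmed the class to the width attribute and brought back the negative-width check in set_width; the eval wrappers and the $poked, $poke_error, and $set variables are my own scaffolding:

```perl
use strict;
use warnings;

{
    package Rectangle;
    use Carp qw(croak);
    my %width;

    sub new {
        my ($class, %args) = @_;
        my $self = bless \(my $dummy), $class;
        $width{$self} = $args{width} || 0;
        return $self;
    }
    sub DESTROY {
        my $dead_body = $_[0];
        delete $width{$dead_body};
        my $super = $dead_body->can("SUPER::DESTROY");
        goto &$super if $super;
    }
    sub width {
        my $self = shift;
        return $width{$self};
    }
    sub set_width {
        my ($self, $width) = @_;
        croak "$self: width cannot be negative: $width"
            if $width < 0;
        $width{$self} = $width;
    }
}

my $r = Rectangle->new(width => 3);

# The object is a scalar reference, so treating it as a hash is fatal.
my $poked = eval { $r->{width} = -3; 1 };
my $poke_error = $poked ? '' : $@;
print "poke failed: $poke_error" unless $poked;

# The interface still enforces its own rule.
my $set = eval { $r->set_width(-3); 1 };

print "width is still ", $r->width, "\n";   # width is still 3
```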

Not bad! Only slightly more complex than a traditional hashref implementation, and a lot safer for the ``outside''. Of course, this is a lot of code to get right, so the best thing is to let someone else do the hard work. See Class::Std and Object::InsideOut for some budding frameworks to build these objects. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

