Copyright Notice

This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Linux Magazine Column 69 (Mar 2005)

[Suggested title: ``Fit to be tied (part 2)'']

Last month, I introduced the tie operator, illustrating how tie can be used with a scalar variable to give additional behaviors to getting and setting the value of the variable.

Let's continue by looking at other data types that can be tied. Another common thing to tie is a hash:

  tie %h, MyTie, @list;

which translates into:

  MyTie->TIEHASH(@list);

Again, the constructor is expected to return an object, which becomes the secret object of the tied variable. Again, we can call tied to get at that object. The FETCH method now recieves a parameter, because we need to know which key is being fetched:

  $x = $h{SomeKey};

turns into:

  $x = tied(%h)->FETCH("SomeKey");

where storing a value now gets two parameters: the key, and the new value, so:

  $h{SomeKey} = "newvalue";

turns into:

  tied(%h)->STORE("SomeKey", "newvalue");

It's possible to use the same class for tying both scalars and hashes, but you'll have to keep track of whether you were called with TIESCALAR or TIEHASH, to know what kind of FETCH and STORE to do, which could get messy. Another strategy is to use your public class as a dispatch to the proper private implementation class:

  package MyTie;
  sub TIESCALAR {
    my $class = shift;
    $class->scalar_class->TIESCALAR(@_);
  }
  sub scalar_class { "MyTie::Scalar" }
  sub TIEHASH {
    my $class = shift;
    $class->hash_class->TIEHASH(@_);
  }
  sub hash_class { "MyTie::Hash" }

  package MyTie::Scalar;
  # normal tied scalar stuff here

  package MyTie::Hash;
  # normal tied hash stuff here

We're using a class method (scalar_class or hash_class) to get the name of the actual class to be used. This lets someone subclass this class, and provide their own derived associated classnames as well, simply by overriding these methods as needed.

Let's create a basic tied hash that acts like a normal tied hash, putting the methods in MyTie::Hash. First, we'll want to create a data structure to store the associated keys and values of our ``inner'' hash. An appropriate structure for that is simply another hash, so we'll create one as an instance of the inner hidden object:

  package MyTie::Hash;
  sub TIEHASH {
    my $class = shift;
    bless {
      Value => {}, # our hash value
    }, $class;
  }

Now when we call:

  tie %h, MyTie::Hash;

we get the hidden object holding an empty hash. To handle a store, we create a STORE method, just as with the tied scalar:

  package MyTie::Hash;
  sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{$key} = $value;
  }

Note that we get three arguments now instead of two, because we have to know which key will be getting the corresponding value. And there's also an extra corresponding argument to FETCH to select which key we need:

  package MyTie::Hash;
  sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{$key};
  }

With just these methods, we'd now get a basic functional hash:

  tie my %h, MyTie::Hash;
  $h{"fred"} = "flintstone"; # calls STORE
  $h{"barney"} = "rubble";
  print $h{"barney"}, "\n"; # calls FETCH, producing "rubble"

Hash elements can also be deleted and checked for existence, so:

  delete $h{SomeKey}

turns into:

  tied(%h)->DELETE("SomeKey")

and

  exists $h{SomeKey}

turns into:

  tied(%h)->EXISTS("SomeKey")

To implement those in our basic hash, we just add the corresponding methods:

  package MyTie::Hash;
  sub DELETE {
    my ($self, $key) = @_;
    return delete $self->{Value}->{$key};
  }
  sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{Value}->{$key};
  }

So far, this may not appear to be very interesting. All we've done is created a much slower implementation of a basic hash. So, let's do something that a normal hash can't. Let's make a case-insensitive hash, by forcing all keys to be lowercased before they're used with the inner secret hash:

  package MyTie::CaseFoldingHash;
  sub TIEHASH { # unchanged from before
    my $class = shift;
    bless {Value => {}}, $class;
  }
  sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{lc $key} = $value;
  }
  sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{lc $key};
  }
  sub DELETE {
    my ($self, $key) = @_;
    return delete $self->{Value}->{lc $key};
  }
  sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{Value}->{lc $key};
  }

Note the addition of lc in front of each key access. By consistently interpreting the key in this manner, the case no longer matters.

  tie my %lastname, MyTie::CaseFoldingHash;
  $lastname{"Fred"} = "Flintstone"; # stores at "fred"
  print $lastname{"FRED"}, "\n"; # fetches "Flintstone"

Both Fred and FRED were lowercased to fred, so these two accesses connect with the same inner hash element.

Hashes can also be accessed with keys, values, and each. How are these mapped to the tied hash? It's a bit tricky, so let's break it down starting with each.

The first time each is called on a tied hash, the FIRSTKEY method is called on the inner object, and expected to return some key of the hash. Each subsequent call to each triggers a call to NEXTKEY on the inner object, which is expected to return either the next key. If there are no keys to be delivered, either of these methods can return undef.

The keys and values methods are implemented as if calling each repeatedly until an undef result is returned, so once we get FIRSTKEY and NEXTKEY implemented, we'll get all three of each, keys, and values for free, as well as the ``flattening'' of a hash in a list context (@b = %a;).

For our simple hash, we need a way to return any key of our inner hash for FIRSTKEY, and all the remaining keys for NEXTKEY until we've iterated through the hash. Well, this is exactly what calling each on the inner hash will do, so let's take that shortcut:

  sub FIRSTKEY {
    my ($self) = @_;
    each %{ $self->{Value} };
  }
  sub NEXTKEY {
    my ($self) = @_;
    each %{ $self->{Value} };
  }

And this works fine for the basics, like:

  for my $lastnames (values %lastname) { ... }
  while (my($k, $v) = each %lastname) { ... }
  my @flat = %lastname;
  my $keycount = keys %lastname;

However, we'll run into trouble when we're walking the hash with each, but fail to reach the end of the hash before calling one of the other three operations. For example, presuming more than 4 elements in the hash, let's start by calling each three times:

  each %lastname; # calls FIRSTKEY
  each %lastname; # calls NEXTKEY
  each %lastname; # calls NEXTKEY

NEXTKEY is now ready to deliver the 4th key, but let's call keys instead:

  my @keys = keys %lastname;

Perl will call FIRSTKEY, which gets the fourth key (oops!), and then call NEXTKEY repeatedly, getting the remaining keys.

The problem is in how we used each in our FIRSTKEY method. We really need to reset the each iterator on our inner hash. A quick way to to do that is to call keys on that hash in a scalar context:

  sub FIRSTKEY {
    my ($self) = @_;
    scalar keys %{ $self->{Value} }; # reset iterator
    each %{ $self->{Value} };
  }

And you'll find that this works much better, consistent with normal hashes.

In our case-insensitive hash, calling keys will now show a series of lowercased keys. How would we preserve the original case of the assignment, while still letting accesses of either case connect to the same element? We'd need to store the original key as well. We could organize the inner hash with values that in turn are two-element arrays containing the original key and its corresponding value. Let's see how that looks and works:

  package MyTie::CasePreservingHash;
  sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{lc $key} = [$key, $value];
  }
  sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{lc $key}->[1];
  }

So a store creates a two element array, keeping the original key handy, while a fetch fetches the value regardless of the key's case. So far, so good. How about DELETE and EXISTS? Turns out, they don't change a bit. Good. But FIRSTKEY and NEXTKEY are a bit trickier. We want to walk the inner hash, but not return its keys (which are already lowercased). Instead, we need to return element 0 of the array referenced by the value of that hash, like so:

  sub FIRSTKEY {
    my ($self) = @_;
    scalar keys %{ $self->{Value} }; # reset iterator, as before
    if (my($k, $v) = each %{$self->{Value}}) {
      # we have a valid key, so return the unmangled real key
      return $v->[0];
    } else { # hash must be empty
      return undef;
    }
  }

And unfortunately, NEXTKEY is nearly identical, except for the absence of resetting the iterator:

  sub NEXTKEY {
    my ($self) = @_;
    if (my($k, $v) = each %{$self->{Value}}) {
      # we have a valid key, so return the unmangled real key
      return $v->[0];
    } else { # hash must be empty
      return undef;
    }
  }

That probably means I should refactor:

  sub reset_keys {
    scalar keys %{ shift->{Value} };
  }
  sub next_indirect_key {
    my $self = shift;
    if (my($k, $v) = each %{$self->{Value}}) {
      # we have a valid key, so return the unmangled real key
      return $v->[0];
    } else { # hash must be empty
      return undef;
    }
  }
  sub FIRSTKEY {
    my $self = shift;
    $self->reset_keys;
    return $self->next_indirect_key;
  }
  sub NEXTKEY {
    my $self = shift;
    return $self->next_indirect_key;
  }

There, that feels better. It's probably also easier to subclass.

Using this tie class definition, we can now perform case-insensitive access, but preserving the initially assigned case of the keys:

  tie my %lastname, MyTie::CasePreservingHash;
  # set up values
  $lastname{"Randal"} = "Schwartz";
  $lastname{"Tom"} = "Phoenix";
  $lastname{"brian"} = "foy";
  # show that the case is preserved:
  print "first names:\n",
    map("  $_\n", sort keys %lastname);
  # show that the access is insensitive:
  print "Tom's last name is $lastname{'tom'}\n";

And we get:

  first names:
    Randal
    Tom
    brian
  Tom's last name is Phoenix

Aha! It worked. Case is indeed preserved, but the access is case-insensitive.

Other actions for hashes include clearing the hash out and using the hashname in a scalar context. These map to the CLEAR and SCALAR methods, respectively.

And besides tying hashes, you can also tie arrays, and filehandles. For further information on how to do that, see the perltie manpage. Until next time, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.

Worldwide training and consulting by Perl experts

Copyright Notice

Linux Magazine Column 69 (Mar 2005)