Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Linux Magazine Column 69 (Mar 2005)
[Suggested title: ``Fit to be tied (part 2)'']
Last month, I introduced the tie operator, illustrating how tie can be used with a scalar variable to give additional behaviors to getting and setting the value of the variable.
Let's continue by looking at other data types that can be tied. Another common thing to tie is a hash:
tie %h, 'MyTie', @list;
which translates into:
MyTie->TIEHASH(@list);
Again, the constructor is expected to return an object, which becomes the secret object of the tied variable. Again, we can call tied to get at that object. The FETCH method now receives a parameter, because we need to know which key is being fetched:
$x = $h{SomeKey};
turns into:
$x = tied(%h)->FETCH("SomeKey");
Storing a value now takes two parameters, the key and the new value, so:
$h{SomeKey} = "newvalue";
turns into:
tied(%h)->STORE("SomeKey", "newvalue");
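To watch these translations happen, here's a minimal sketch of a tied hash that logs every method Perl calls on our behalf. The class name TraceHash and the @calls log are made up for illustration; only TIEHASH, STORE, and FETCH come from the tie interface itself.

```perl
use strict;
use warnings;

# Hypothetical class: records each method call that tie triggers.
package TraceHash;

our @calls;    # log of "METHOD(args)" strings, for illustration only

sub TIEHASH {
    my ($class, @args) = @_;
    push @calls, "TIEHASH(@args)";
    return bless { Value => {} }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    push @calls, "STORE($key, $value)";
    $self->{Value}{$key} = $value;
}

sub FETCH {
    my ($self, $key) = @_;
    push @calls, "FETCH($key)";
    return $self->{Value}{$key};
}

package main;

tie my %h, 'TraceHash', 'a', 'b';    # extra arguments go to TIEHASH
$h{SomeKey} = "newvalue";            # becomes a STORE call
my $x = $h{SomeKey};                 # becomes a FETCH call

print "$_\n" for @TraceHash::calls;
print "x = $x\n";
```

The log shows one TIEHASH call carrying the extra arguments, then one STORE and one FETCH, each with the key it was asked about.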
It's possible to use the same class for tying both scalars and hashes, but you'll have to keep track of whether you were called with TIESCALAR or TIEHASH, to know what kind of FETCH and STORE to do, which could get messy. Another strategy is to use your public class as a dispatch to the proper private implementation class:
package MyTie;
sub TIESCALAR {
    my $class = shift;
    $class->scalar_class->TIESCALAR(@_);
}
sub scalar_class { "MyTie::Scalar" }
sub TIEHASH {
    my $class = shift;
    $class->hash_class->TIEHASH(@_);
}
sub hash_class { "MyTie::Hash" }
package MyTie::Scalar;
# normal tied scalar stuff here
package MyTie::Hash;
# normal tied hash stuff here
We're using a class method (scalar_class or hash_class) to get the name of the actual class to be used. This lets someone subclass this class, and provide their own derived associated class names as well, simply by overriding these methods as needed.
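Here's a rough sketch of what such a subclass might look like. MyLoudTie and MyLoudTie::Hash are hypothetical names, and the uppercasing FETCH is just a stand-in behavior so we can see which implementation class ended up behind the tied variable.

```perl
use strict;
use warnings;

package MyTie;
sub TIEHASH {
    my $class = shift;
    $class->hash_class->TIEHASH(@_);    # dispatch to the implementation class
}
sub hash_class { "MyTie::Hash" }

package MyTie::Hash;
sub TIEHASH { my $class = shift; return bless { Value => {} }, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{Value}{$key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{Value}{$key} }

# Hypothetical subclass: only the class-name method is overridden.
package MyLoudTie;
our @ISA = ('MyTie');
sub hash_class { "MyLoudTie::Hash" }

# Hypothetical derived implementation: shouts every fetched value.
package MyLoudTie::Hash;
our @ISA = ('MyTie::Hash');
sub FETCH {
    my ($self, $key) = @_;
    return uc $self->SUPER::FETCH($key);
}

package main;

tie my %plain, 'MyTie';
tie my %loud,  'MyLoudTie';
$plain{fred} = "flintstone";
$loud{fred}  = "flintstone";

print ref(tied %plain), ": $plain{fred}\n";
print ref(tied %loud),  ": $loud{fred}\n";
```

Note that MyLoudTie never defines TIEHASH. It inherits the dispatching constructor and merely answers the hash_class question differently.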
Let's create a basic tied hash that acts like a normal hash, putting the methods in MyTie::Hash. First, we'll want to create a data structure to store the associated keys and values of our ``inner'' hash. An appropriate structure for that is simply another hash, so we'll create one as an instance of the inner hidden object:
package MyTie::Hash;
sub TIEHASH {
    my $class = shift;
    bless {
        Value => {}, # our hash value
    }, $class;
}
Now when we call:
tie %h, 'MyTie::Hash';
we get the hidden object holding an empty hash. To handle a store, we create a STORE method, just as with the tied scalar:
package MyTie::Hash;
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{$key} = $value;
}
Note that we get three arguments now instead of two, because we have to know which key will be getting the corresponding value. And there's also an extra corresponding argument to FETCH to select which key we need:
package MyTie::Hash;
sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{$key};
}
With just these methods, we'd now get a basic functional hash:
tie my %h, 'MyTie::Hash'; # class name quoted so it also works under "use strict"
$h{"fred"} = "flintstone"; # calls STORE
$h{"barney"} = "rubble";
print $h{"barney"}, "\n"; # calls FETCH, producing "rubble"
Hash elements can also be deleted and checked for existence, so:
delete $h{SomeKey}
turns into:
tied(%h)->DELETE("SomeKey")
and
exists $h{SomeKey}
turns into:
tied(%h)->EXISTS("SomeKey")
To implement those in our basic hash, we just add the corresponding methods:
package MyTie::Hash;
sub DELETE {
    my ($self, $key) = @_;
    return delete $self->{Value}->{$key};
}
sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{Value}->{$key};
}
So far, this may not appear to be very interesting. All we've done is create a much slower implementation of a basic hash. So, let's do something that a normal hash can't. Let's make a case-insensitive hash, by forcing all keys to be lowercased before they're used with the inner secret hash:
package MyTie::CaseFoldingHash;
sub TIEHASH { # unchanged from before
    my $class = shift;
    bless {Value => {}}, $class;
}
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{lc $key} = $value;
}
sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{lc $key};
}
sub DELETE {
    my ($self, $key) = @_;
    return delete $self->{Value}->{lc $key};
}
sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{Value}->{lc $key};
}
Note the addition of lc in front of each key access. By consistently interpreting the key in this manner, the case no longer matters.
tie my %lastname, 'MyTie::CaseFoldingHash';
$lastname{"Fred"} = "Flintstone"; # stores at "fred"
print $lastname{"FRED"}, "\n"; # fetches "Flintstone"
Both Fred and FRED were lowercased to fred, so these two accesses connect with the same inner hash element.
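The same folding applies to exists and delete, and to a second store under a differently cased key. Here's a small self-contained sketch of the case-folding class that exercises all of those; the particular keys and values are just sample data.

```perl
use strict;
use warnings;

package MyTie::CaseFoldingHash;
sub TIEHASH { my $class = shift; return bless { Value => {} }, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{Value}{lc $key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{Value}{lc $key} }
sub DELETE  { my ($self, $key) = @_; return delete $self->{Value}{lc $key} }
sub EXISTS  { my ($self, $key) = @_; return exists $self->{Value}{lc $key} }

package main;

tie my %lastname, 'MyTie::CaseFoldingHash';
$lastname{"Fred"} = "Flintstone";
$lastname{"FRED"} = "Flintstone-Slate";    # overwrites: same inner key "fred"

my $seen    = exists $lastname{"fReD"} ? "yes" : "no";    # EXISTS("fReD")
my $removed = delete $lastname{"fred"};                   # DELETE("fred")
my $gone    = exists $lastname{"Fred"} ? "no" : "yes";    # EXISTS("Fred")

print "seen=$seen removed=$removed gone=$gone\n";
```

The second store replaces the first rather than adding a new element, which is exactly the trade-off of folding keys: distinct spellings can no longer hold distinct values.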
Hashes can also be accessed with keys, values, and each. How are these mapped to the tied hash? It's a bit tricky, so let's break it down starting with each.
The first time each is called on a tied hash, the FIRSTKEY method is called on the inner object, and expected to return some key of the hash. Each subsequent call to each triggers a call to NEXTKEY on the inner object, which is expected to return the next key. If there are no more keys to be delivered, either of these methods can return undef.
The keys and values operators are implemented as if calling each repeatedly until an undef result is returned, so once we get FIRSTKEY and NEXTKEY implemented, we'll get all three of each, keys, and values for free, as well as the ``flattening'' of a hash in a list context (@b = %a;).
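To see all four operations driven by nothing more than FIRSTKEY and NEXTKEY, here's a sketch using a deliberately simple strategy: FIRSTKEY takes a sorted snapshot of the keys, and NEXTKEY hands them out one at a time. The class name SnapshotHash and the Pending slot are assumptions of this sketch, not part of the tie interface.

```perl
use strict;
use warnings;

package SnapshotHash;    # hypothetical name

sub TIEHASH { my $class = shift; return bless { Value => {}, Pending => [] }, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{Value}{$key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{Value}{$key} }

sub FIRSTKEY {
    my ($self) = @_;
    # Snapshot the keys (sorted, so the order is predictable) and return the first.
    $self->{Pending} = [ sort keys %{ $self->{Value} } ];
    return shift @{ $self->{Pending} };
}

sub NEXTKEY {
    my ($self, $lastkey) = @_;    # Perl also passes the previously returned key
    return shift @{ $self->{Pending} };    # undef once the snapshot is exhausted
}

package main;

tie my %h, 'SnapshotHash';
@h{qw(wilma fred barney)} = qw(flintstone flintstone rubble);

my @keys   = keys %h;
my @values = values %h;
my @flat   = %h;
my @pairs;
while (my ($k, $v) = each %h) {
    push @pairs, "$k=$v";
}

print "keys:   @keys\n";
print "values: @values\n";
print "flat:   @flat\n";
print "each:   @pairs\n";
```

The values come from FETCH calls that Perl makes for each key the iteration methods return, so the class never has to implement values separately.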
For our simple hash, we need a way to return any key of our inner hash for FIRSTKEY, and all the remaining keys for NEXTKEY until we've iterated through the hash. Well, this is exactly what calling each on the inner hash will do, so let's take that shortcut:
sub FIRSTKEY {
    my ($self) = @_;
    each %{ $self->{Value} };
}
sub NEXTKEY {
    my ($self) = @_;
    each %{ $self->{Value} };
}
And this works fine for the basics, like:
for my $lastnames (values %lastname) { ... }
while (my($k, $v) = each %lastname) { ... }
my @flat = %lastname;
my $keycount = keys %lastname;
However, we'll run into trouble when we're walking the hash with each, but fail to reach the end of the hash before calling one of the other three operations. For example, presuming more than 4 elements in the hash, let's start by calling each three times:
each %lastname; # calls FIRSTKEY
each %lastname; # calls NEXTKEY
each %lastname; # calls NEXTKEY
NEXTKEY is now ready to deliver the 4th key, but let's call keys instead:
my @keys = keys %lastname;
Perl will call FIRSTKEY, which gets the fourth key (oops!), and then call NEXTKEY repeatedly, getting the remaining keys.
The problem is in how we used each in our FIRSTKEY method. We really need to reset the each iterator on our inner hash. A quick way to do that is to call keys on that hash in a scalar context:
sub FIRSTKEY {
    my ($self) = @_;
    scalar keys %{ $self->{Value} }; # reset iterator
    each %{ $self->{Value} };
}
And you'll find that this works much better, consistent with normal hashes.
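To make the difference concrete, here's a sketch that puts both versions side by side and repeats the experiment: three calls to each, then a call to keys. The class names NaiveHash and ResettingHash are invented for this comparison.

```perl
use strict;
use warnings;

package NaiveHash;    # hypothetical name: FIRSTKEY does not reset the inner iterator
sub TIEHASH  { my $class = shift; return bless { Value => {} }, $class }
sub STORE    { my ($self, $key, $value) = @_; $self->{Value}{$key} = $value }
sub FETCH    { my ($self, $key) = @_; return $self->{Value}{$key} }
sub FIRSTKEY { my ($self) = @_; each %{ $self->{Value} } }
sub NEXTKEY  { my ($self) = @_; each %{ $self->{Value} } }

package ResettingHash;    # hypothetical name: same class, but FIRSTKEY resets first
our @ISA = ('NaiveHash');
sub FIRSTKEY {
    my ($self) = @_;
    scalar keys %{ $self->{Value} };    # reset iterator
    each %{ $self->{Value} };
}

package main;

my %count;
for my $class (qw(NaiveHash ResettingHash)) {
    tie my %h, $class;
    $h{$_} = 1 for qw(a b c d e);

    # Walk part of the way through the hash, then abandon the walk.
    for (1 .. 3) { my $k = each %h; }

    my @keys = keys %h;
    $count{$class} = @keys;
    print "$class: keys returned $count{$class} of 5 keys\n";
}
```

The naive version only reports the keys that each hadn't already handed out, while the resetting version reports all five, just as a normal hash would.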
In our case-insensitive hash, calling keys will now show a series of lowercased keys. How would we preserve the original case of the assignment, while still letting accesses of either case connect to the same element? We'd need to store the original key as well. We could organize the inner hash with values that in turn are two-element arrays containing the original key and its corresponding value. Let's see how that looks and works:
package MyTie::CasePreservingHash;
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{Value}->{lc $key} = [$key, $value];
}
sub FETCH {
    my ($self, $key) = @_;
    return $self->{Value}->{lc $key}->[1];
}
So a store creates a two-element array, keeping the original key handy, while a fetch fetches the value regardless of the key's case. So far, so good. How about DELETE and EXISTS? Turns out, they don't change a bit. Good. But FIRSTKEY and NEXTKEY are a bit trickier. We want to walk the inner hash, but not return its keys (which are already lowercased). Instead, we need to return element 0 of the array referenced by the value of that hash, like so:
sub FIRSTKEY {
    my ($self) = @_;
    scalar keys %{ $self->{Value} }; # reset iterator, as before
    if (my($k, $v) = each %{$self->{Value}}) {
        # we have a valid key, so return the unmangled real key
        return $v->[0];
    } else {
        # hash must be empty
        return undef;
    }
}
And unfortunately, NEXTKEY is nearly identical, except for the absence of resetting the iterator:
sub NEXTKEY {
    my ($self) = @_;
    if (my($k, $v) = each %{$self->{Value}}) {
        # we have a valid key, so return the unmangled real key
        return $v->[0];
    } else {
        # no more keys to deliver
        return undef;
    }
}
That probably means I should refactor:
sub reset_keys {
    scalar keys %{ shift->{Value} };
}
sub next_indirect_key {
    my $self = shift;
    if (my($k, $v) = each %{$self->{Value}}) {
        # we have a valid key, so return the unmangled real key
        return $v->[0];
    } else {
        # hash is empty, or there are no more keys to deliver
        return undef;
    }
}
sub FIRSTKEY {
    my $self = shift;
    $self->reset_keys;
    return $self->next_indirect_key;
}
sub NEXTKEY {
    my $self = shift;
    return $self->next_indirect_key;
}
There, that feels better. It's probably also easier to subclass.
Using this tie class definition, we can now perform case-insensitive access while preserving the initially assigned case of the keys:
tie my %lastname, 'MyTie::CasePreservingHash';
# set up values
$lastname{"Randal"} = "Schwartz";
$lastname{"Tom"} = "Phoenix";
$lastname{"brian"} = "foy";
# show that the case is preserved:
print "first names:\n", map(" $_\n", sort keys %lastname);
# show that the access is insensitive:
print "Tom's last name is $lastname{'tom'}\n";
And we get:
first names:
 Randal
 Tom
 brian
Tom's last name is Phoenix
Aha! It worked. Case is indeed preserved, but the access is case-insensitive.
Other actions for hashes include clearing the hash out and using the hashname in a scalar context. These map to the CLEAR and SCALAR methods, respectively.
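For our basic hash, both of these can simply pass the request along to the inner hash. Here's a minimal sketch, assuming the same Value layout used throughout; the method bodies are my own guess at the simplest thing that works, not the only way to write them.

```perl
use strict;
use warnings;

package MyTie::Hash;
sub TIEHASH { my $class = shift; return bless { Value => {} }, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{Value}{$key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{Value}{$key} }

# Called when the whole hash is emptied, as in %h = ();
sub CLEAR {
    my ($self) = @_;
    %{ $self->{Value} } = ();
}

# Called when the tied hash is evaluated in a scalar context.
sub SCALAR {
    my ($self) = @_;
    return scalar %{ $self->{Value} };    # true if the inner hash has any keys
}

package main;

tie my %h, 'MyTie::Hash';
$h{fred} = "flintstone";

my $before = scalar(%h) ? "has keys" : "empty";    # calls SCALAR
%h = ();                                           # calls CLEAR
my $after  = scalar(%h) ? "has keys" : "empty";    # calls SCALAR again

print "before: $before, after: $after\n";
```

The sketch only relies on the truth of the scalar value, since the exact value a plain hash returns in scalar context has varied between Perl releases.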
And besides tying hashes, you can also tie arrays and filehandles. For further information on how to do that, see the perltie manpage.
Until next time, enjoy!