Thursday, March 31, 2011

Ruby gotcha: Hash default

Ruby lets you specify a default value when you create a Hash. Without one, looking up a key that isn't found returns nil:
h = {}
=> {}
h['foo']
=> nil
h = Hash.new(1)
=> {}
h['foo']
=> 1

Great! That works fine when the default value is a number, but try a String, and you may see something weird:
h = Hash.new("default")
=> {} # empty hash created
h['foo']
=> "default" # 'foo' has the default value
h['bar'] += " value"
=> "default value" # 'bar' gets the default, plus " value" appended
h['baz'].slice!(0..1) # whoa, wait! what's going on here?
=> "de"
h['foo']
=> "fault" # what happened to foo?
h['bar']
=> "default value" # bar is still ok!
h
=> {"bar"=>"default value"} # but it's the only thing in the hash!

What's going on here? In short, a bunch of stuff that makes sense in retrospect, once you know what's happening under the hood, but isn't very intuitive.

The first thing is that calling h['foo'] doesn't add key 'foo' to the hash with the default value; it just returns the default. It doesn't add a key to the hash unless we actually set a value, like we do for 'bar'.
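One way to check this is to ask the hash directly whether a key exists. Here's a minimal sketch; the variable names are just for illustration, and the .dup is only there so the example still runs in setups where string literals are frozen.

```ruby
# Reading a missing key returns the default but stores nothing.
h = Hash.new("default".dup)

read_value = h['foo']               # returns the default object
stored_after_read = h.key?('foo')   # nothing was stored by the read

# h['bar'] += " value" is shorthand for h['bar'] = h['bar'] + " value".
# String#+ builds a new string, and the assignment stores it under 'bar'.
h['bar'] += " value"
stored_after_assign = h.key?('bar')

puts read_value            # default
puts stored_after_read     # false
puts stored_after_assign   # true
puts h.size                # 1
```

So the += line only "works" because it is really an assignment in disguise.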

What about the "h['baz'].slice!(0..1)" line? That modifies the default string in place, not just for baz but for every key that falls back to the default, and it still doesn't set a value for baz. That's a little surprising. If you call slice() instead of slice!() (shown here on a fresh Hash.new("default")), nothing is modified:
h['baz'].slice(0..1)
=> "de"
h['baz']
=> "default"
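To see that there really is just one string behind all those lookups, you can compare object identity. This is a small sketch under the same assumptions as the session above (a fresh hash, with .dup added so it also runs when string literals are frozen).

```ruby
h = Hash.new("default".dup)

# Every missing key hands back the very same String object.
same_object = h['foo'].equal?(h['baz'])
is_the_default = h['baz'].equal?(h.default)

# slice (no bang) returns a new string and leaves the default alone.
copy = h['baz'].slice(0..1)
after_slice = h.default.dup

# slice! mutates its receiver, which here is the shared default itself.
removed = h['baz'].slice!(0..1)

puts same_object      # true
puts is_the_default   # true
puts copy             # de
puts after_slice      # default
puts removed          # de
puts h.default        # fault
puts h.empty?         # true
```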

To modify baz and only baz, you need to give it its own string first (again starting from a fresh hash):
h['baz'] += "" # set the value in baz to the default plus an empty string
=> "default"
h['baz'].slice!(0..1) # modify baz in place
=> "de"
h['foo']
=> "default" # foo still the default
h['baz']
=> "fault" # baz modified
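Two variations on this that weren't part of my original session: copy the default explicitly with dup instead of appending an empty string, and freeze the default so an accidental in-place change fails loudly instead of silently affecting every key. A rough sketch, where 'qux' is just a made-up key:

```ruby
h = Hash.new("default".freeze)

# Copy the default explicitly, store the copy, then mutate the copy.
h['baz'] = h['baz'].dup
h['baz'].slice!(0..1)

# Mutating the frozen default through a missing key now raises.
# Ruby 2.5+ raises FrozenError, a subclass of RuntimeError; older
# versions raise RuntimeError itself, so rescuing that covers both.
mutation_blocked = begin
  h['qux'].slice!(0..1)
  false
rescue RuntimeError
  true
end

puts h['baz']           # fault
puts h['foo']           # default
puts mutation_blocked   # true
```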

Ok, that's a quirky edge case, but what's the big deal? Why am I writing a post about this? Because if you set the default to an empty array, it really isn't going to work the way you want:
h = Hash.new([])
=> {}
h['foo']
=> []
h['foo'].push "red"
=> ["red"]
h['bar']
=> ["red"]
h['bar'].push "green"
=> ["red", "green"]

Instead of having an empty array associated with each key, there's one array shared by all keys. (I actually ran across this problem while trying to set a default hash, and it was much less obvious what had gone wrong.)
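Inspecting the hash makes the sharing visible: no keys were ever stored, and everything landed in the single default array. The sketch below also shows that a non-mutating update (+= on the array) sidesteps the problem, for the same reason += worked with strings. The second hash g is only there for comparison.

```ruby
h = Hash.new([])

h['foo'].push "red"
h['bar'].push "green"

# Nothing was ever stored; both pushes went into the one default array.
puts h.keys.inspect      # []
puts h.default.inspect   # ["red", "green"]

# A non-mutating update builds a new array and assigns it to the key.
g = Hash.new([])
g['foo'] += ["red"]
g['bar'] += ["green"]

puts g['foo'].inspect    # ["red"]
puts g['bar'].inspect    # ["green"]
puts g.default.inspect   # []
```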
Now, you can create a Hash that gives each key its own array, but you need to pass a code block to the constructor instead of a default value:
h = Hash.new {|hash, key| hash[key] = []}
=> {}
h['foo']
=> []
h['foo'].push "red"
=> ["red"]
h['bar'].push "green"
=> ["green"]

So it turns out that the block form of the constructor is what you need pretty much anytime your default is a container or other mutable object. That's going to be handy for me to know, at least.
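The same trick covers the default-hash case that bit me in the first place. Here's a sketch with made-up keys, where each missing key gets its own inner hash. One thing to keep in mind: because the block assigns, merely reading a missing key now stores it, unlike Hash.new(value).

```ruby
# Each missing key gets its own inner Hash whose default is 0.
counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }

counts['fruit']['apple'] += 1
counts['fruit']['apple'] += 1
counts['veg']['kale'] += 1

puts counts['fruit']['apple']   # 2
puts counts['veg']['kale']      # 1
puts counts['veg']['apple']     # 0 (inner default, not stored)

# The outer block assigns, so a plain read is enough to store a key.
counts['empty']
puts counts.keys.inspect        # ["fruit", "veg", "empty"]
```

An integer default like 0 is fine for the inner hash because integers can't be modified in place.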
