The other day something curious happened that made me question how I use bang methods (and their non-bang cousins). Here is how I normally look at using bang methods in Ruby: always use the non-bang version, unless you have a really, really good reason not to. It just seems somehow safer this way. When you’re passing an object (String, Array etc.) around as a parameter, don’t modify the parameter itself unless that is exactly the behaviour you’re looking for; use methods that don’t modify the original object, but rather work on a copy, and then return the copy to the calling code. Maybe I am being too pedantic about this, do come and let me know if I am :). Regardless, I have trained myself to avoid bang methods unless I really need them, but a few days ago I needed to merge some hashes.

You see, I had a file full of smaller hashes (serialized) that I wanted to read in and merge into one big hash. As usual, I tried using the non-bang version of the merge method and it just crawled. Let’s recreate this code:

```ruby
require 'benchmark'

main_hash = {}
time = Benchmark.realtime do
  (1..10000).each do |number|
    simple_hash = {}
    simple_hash[number.to_s] = number
    main_hash = main_hash.merge(simple_hash)
  end
end
puts "Time elapsed #{time} seconds"
puts "Main hash key count: #{main_hash.keys.count}"
```

This produces:

Time elapsed 13.7789120674133 seconds
Main hash key count: 10000

As you can see, it took 13 seconds to merge only 10000 tiny hashes into one. I had significantly more than ten thousand and mine were somewhat larger. The surprising thing was that when you replace the merge method with its bang equivalent, the output produced is:

Time elapsed 28.7179946899414 milliseconds
Main hash key count: 10000
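For the record, the bang version differs from the original loop by only one line — a minimal sketch:

```ruby
require 'benchmark'

main_hash = {}
time = Benchmark.realtime do
  (1..10000).each do |number|
    simple_hash = {}
    simple_hash[number.to_s] = number
    main_hash.merge!(simple_hash) # mutates main_hash in place, no copy made
  end
end
puts "Time elapsed #{time} seconds"
puts "Main hash key count: #{main_hash.keys.count}"
```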

That’s 28 MILLIseconds which is about 500 times faster. Now admittedly, this use case was probably a good candidate for using the bang version of merge from the start, but that is not immediately obvious, and such a major disparity in performance made me question my non-bang convention. Was this the case with all the bang/non-bang pairs of methods in Ruby? Should I perhaps forget about the non-bang methods altogether and switch to bang exclusively? Well, the first question is easy to answer, since we can test it out empirically. Let’s pick another bang/non-bang pair and see if we get the same performance disparity. We’ll use Array#flatten:

```ruby
require 'benchmark'

array = []
time = Benchmark.realtime do
  (1..1000).each do |number|
    string = "string#{number}"
    array << string
    inner_array = []
    (1..50).each do |inner_number|
      inner_array << inner_number
    end
    array << inner_array
    array = array.flatten
  end
end
puts "Time elapsed #{time} seconds"
```

When using the non-bang version, the output is:

Time elapsed 5.95429491996765 seconds

But when we switch to the bang version (array.flatten!):

Time elapsed 6.41582012176514 seconds

Hmm, there is almost no difference; in fact, the non-bang version is a little faster. I tried similar code with String’s reverse bang/non-bang pair and the results were similar to what we got for the array: the performance differences were negligible. What gives?
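For completeness, here is a minimal version of the String reverse comparison I mean (the exact string size and iteration count are arbitrary, and timings will vary by machine):

```ruby
require 'benchmark'

# Compare reverse (returns a copy) with reverse! (mutates in place)
string = "a" * 1000
non_bang = Benchmark.realtime do
  100_000.times { string = string.reverse }
end
bang = Benchmark.realtime do
  100_000.times { string.reverse! }
end
puts "reverse:  #{non_bang} seconds"
puts "reverse!: #{bang} seconds"
```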

The good news is that using non-bang methods is fine if, like me, this is your preference. Unless, of course, you’re trying to merge hashes, in which case we need to dig deeper still. Let’s hook up a profiler to our original code and see what we get (I’ll cover Ruby profiling in more detail in a later post; for now, just bear with me). Here is our new code:

```ruby
require 'benchmark'
require 'profiler'

main_hash = {}
time = Benchmark.realtime do
  Profiler__::start_profile
  (1..10000).each do |number|
    simple_hash = {}
    simple_hash[number.to_s] = number
    main_hash = main_hash.merge(simple_hash)
  end
  Profiler__::stop_profile
  Profiler__::print_profile($stderr)
end
puts "Time elapsed #{time} seconds"
puts "Main hash key count: #{main_hash.keys.count}"
```

The profiler output looks like this:

%   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 92.95    12.78     12.78    10000     1.28     1.28  Hash#initialize_copy
  4.51    13.40      0.62        1   620.00 13750.00  Range#each
  1.89    13.66      0.26    10000     0.03     1.30  Hash#merge
  0.51    13.73      0.07    10000     0.01     0.01  Fixnum#to_s
  0.15    13.75      0.02    10000     0.00     0.00  Hash#[]=
  0.00    13.75      0.00        1     0.00 13750.00  #toplevel
Time elapsed 14.436450958252 seconds
Main hash key count: 10000

The culprit here is clear: initialize_copy, that’s where we are spending the vast majority of our time. So, what is initialize_copy? It is basically a hook method that Ruby provides which is invoked after an object has been cloned (using dup or clone). When you clone or dup an instance in Ruby, it is usually a shallow copy, so the fields in the cloned instance will still be referencing the same objects as the fields in the original instance. You can override initialize_copy to fix this issue. More info on cloning and initialize_copy can be found here. So we can now intuitively say that when the merge method creates a copy of the hash to work on, it needs to iterate through all the members of the original hash and copy them into the new hash, to make sure they are not still referencing the same objects. Only after it has done this can it go through the second hash (i.e. other_hash) and merge the values from there into this new copy of our original hash. Of course, as the hash gets bigger from merging more and more keys into it, it takes longer and longer to copy the values across, so we spend more and more time inside initialize_copy, which becomes our bottleneck. On the other hand, if we use the bang version of merge, we never need to clone our hash, so this bottleneck does not exist and everything is fast. Makes sense, right?
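To make the hook concrete, here is a sketch with a made-up class (Basket is hypothetical, not from the code above) that overrides initialize_copy so that dup copies its internal array instead of sharing it:

```ruby
# initialize_copy runs after dup/clone, letting a class deep-copy
# internal state instead of sharing references with the original.
class Basket
  attr_reader :items

  def initialize
    @items = []
  end

  def initialize_copy(original)
    super
    @items = original.items.dup # copy the array, don't share it
  end
end

basket = Basket.new
basket.items << "apple"
copy = basket.dup
copy.items << "pear"
puts basket.items.inspect # the original is unaffected by changes to the copy
```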

It does, the only problem I have is this. Why does the same thing not happen when we try to flatten the array? When we try to profile our array flattening code we get output similar to the following:

%   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 48.73     1.34      1.34     1000     1.34     1.34  Array#flatten
 31.27     2.20      0.86     1001     0.86     4.04  Range#each
 10.18     2.48      0.28    21000     0.01     0.01  Fixnum#to_s
  9.82     2.75      0.27    22000     0.01     0.01  Array#<<
  0.00     2.75      0.00        1     0.00  2750.00  #toplevel
Time elapsed 3.29541611671448 seconds

As you can see, initialize_copy is not even in the picture. So does that mean we can clone an array (when using the non-bang version of flatten) without having to then go over all the elements to make sure they are not referencing the same objects? I had a quick look at the C code that backs both the Array and the Hash implementations in Ruby, but I don’t know Ruby native code well enough to get a definitive answer without spending an inordinate amount of time on it (*sigh* another thing to go on the list of things to learn :)). If you do know your way around Ruby’s C code, could you please enlighten the rest of us regarding this interesting issue. In the meantime the lesson is this: non-bang methods are fine, but if you’re going to merge a bunch of hashes into one, use the bang version of merge. The other lesson is that C is still as relevant as ever, if you want to know interesting things :). Although, I probably need to go through my copy of K&R before digging into Ruby’s C code, it has been a while. Anyways, that’s a story for another time.

Image by rachel_r