Using Ruby Blocks And Rolling Your Own Iterators

I recently covered Ruby block basics in my post, More Advanced Ruby Method Arguments – Hashes And Block Basics. I mentioned that blocks are not really method arguments and also covered the two different types of block syntax. Towards the end of that post I promised to examine Ruby blocks more deeply, and I am going to try to do that here.

In my opinion there are several interesting things about blocks:

  • is there any other difference between the two different types of block syntax besides the fact that one is predominantly used for single line blocks and the other for multi-line?
  • in what context are blocks normally used and how does it look from the perspective of the methods that take blocks as ‘parameters’?
  • is passing parameters to blocks similar to passing parameters to a method (i.e. is the syntax just as rich, can you have default values etc.) and what scope do those parameters have?
  • can a block be a first class parameter and can you pass it around easily as opposed to just using it as a quasi-parameter like we normally do?

Examining each one of those in turn can give one a reasonably solid understanding of blocks, so that’s what we’re gonna do.

The Two Different Block Notations

As we know there are two different types of block syntax, the curly brace syntax e.g.:

[1,2,3,4,5].each {|i| print "#{i} "}

and the do..end syntax e.g.:

[1,2,3,4,5].each do |i|
  print "#{i} "
end

Aside from looking different and being used for single or multi-line blocks, there is one major difference between the two types of syntax: precedence. As we know, every expression in Ruby returns a value, and calling an iterator is no different. So if we want to use the value returned by an iterator that takes a block (e.g. print it out), we can normally do so. When we use curly brace notation this is not a problem, e.g.:

print [1,2,3,4,5].each {|i| i}

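in Ruby 1.8 this will print out the following (Ruby 1.9 and later print [1, 2, 3, 4, 5] instead, since Array#to_s became an alias for inspect):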

12345

as we would expect. But if we try to do the same with the do..end syntax e.g.:

print [1,2,3,4,5].each do |i| i end

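on Ruby 1.8.6 and earlier we get the following error (from Ruby 1.8.7 on, each called without a block returns an Enumerator instead, so there is no error; print just outputs that Enumerator and the do..end block is silently ignored):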

`each': no block given (LocalJumpError)

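this is because a do..end block has lower precedence than a curly brace block. A do..end block binds to the outermost method call on the line (print), whereas a brace block binds to the method call immediately before it (each). So what actually happens is the following: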

print([1,2,3,4,5].each) do |i| i end

to get this to work correctly we need to wrap the whole thing in parentheses, e.g.:

print([1,2,3,4,5].each do |i|
  i
end)

It is not a major difference but something to be aware of.
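To see which method actually receives the block, here is a small sketch using a hypothetical helper, tag, that records whether each call was given a block. The only thing that changes between the two calls is the block syntax:

```ruby
# Hypothetical helper: logs whether this particular call received a block.
def tag(log, label, value = nil)
  log << "#{label}: #{block_given? ? 'block' : 'no block'}"
  value
end

# Curly braces bind to the nearest call, tag(braces_log, 'inner').
braces_log = []
tag braces_log, 'outer', tag(braces_log, 'inner') { }

# do..end binds to the outermost call (the one without parentheses).
do_end_log = []
tag do_end_log, 'outer', tag(do_end_log, 'inner') do end

puts braces_log.inspect  # ["inner: block", "outer: no block"]
puts do_end_log.inspect  # ["inner: no block", "outer: block"]
```

The arguments are evaluated first, which is why the inner call is always logged before the outer one.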

Learn About Blocks By Implementing Our Own Iterator

The best way I have found to learn how blocks and iterators work together is to implement a few iterators of your own, starting with a really simple case. Let’s start with an infinite loop iterator:

def infinite_loop
  while true
    yield
  end
end

We can call it in the following way; notice that we pass it a simple block:

infinite_loop {puts 'Looping Infinitely'}

this will print out ‘Looping Infinitely’ forever, since the loop inside our iterator method has no exit condition and just keeps yielding to the block. So far so easy, so let’s move on to something more complex. Let’s say we need to add another iterator to the Array class that works with the array elements in reverse order (I am aware that Array has a method to reverse itself). We could do this by opening up the Array class and putting our new iterator in:

class Array
  def reverse_iterate
    current_index = self.size-1
    while current_index >= 0
      yield self[current_index]
      current_index -= 1
    end
  end
end

We now have an exit condition for the loop in our iterator which means we won’t yield to the block more times than there are values in the array. We can then call our new iterator method in the following fashion:

[2,4,6,8].reverse_iterate { |i| print "#{i} "}

This is all pretty simple, but a couple of things are worth noting. The first is the call to yield inside our new method: yield is what runs the block that was passed to the method. The second is the value we pass to yield. By yielding one value we pass one argument to the block, so the block declares one parameter to receive it. If we yielded two values, the block would normally declare two parameters as well. So, the above prints out the array because we yield each of the array values to the block (in reverse order) and the block just outputs it:

8 6 4 2
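Blocks are more forgiving than methods about how many values they receive. A quick sketch with a hypothetical iterator, yield_pair, that always yields two values; the behavior shown is that of Ruby 1.9 and later:

```ruby
# Hypothetical iterator that always yields exactly two values.
def yield_pair
  yield 1, 2
end

# Fewer parameters than yielded values: the extra value is dropped.
one_param = nil
yield_pair { |a| one_param = a }

# More parameters than yielded values: the missing ones are nil.
three_params = nil
yield_pair { |a, b, c| three_params = [a, b, c] }

p one_param     # 1
p three_params  # [1, 2, nil]
```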

But what if we want our new method to not only take a block but also provide some default behavior when no block is given? This is fairly simple to do; we just need to use the Kernel#block_given? method:

class Array
  def reverse_iterate
    if block_given?
      current_index = self.size-1
      while current_index >= 0
        yield self[current_index]
        current_index -= 1
      end
    else
      print self.reverse
    end
  end
end

we can now call our iterator with no block at all:

[2,4,6,8].reverse_iterate

Our default behavior just prints out the reversed array. Ruby 1.8 concatenates the elements when printing an array, so we get the following (Ruby 1.9 and later print [8, 6, 4, 2] instead):

8642
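Printing is just one possible default. A different technique, common from Ruby 1.8.7 on and used by built-in iterators such as each, is to return an Enumerator when no block is given. A sketch of that alternative using Object#enum_for (this replaces the printing default rather than reproducing it):

```ruby
class Array
  # Alternative sketch: with no block, return an Enumerator instead of printing.
  def reverse_iterate
    return enum_for(:reverse_iterate) unless block_given?

    current_index = size - 1
    while current_index >= 0
      yield self[current_index]
      current_index -= 1
    end
    self # like each, return the receiver when a block was given
  end
end

enum = [2, 4, 6, 8].reverse_iterate
p enum.to_a               # [8, 6, 4, 2]
p enum.map { |i| i + 1 }  # [9, 7, 5, 3]
```

Because the Enumerator calls reverse_iterate with a block behind the scenes, every Enumerable method (map, select, to_a and so on) works on it for free.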

The last thing to note is that yield returns a value to the iterator: the result of the last expression evaluated in the block. We can take advantage of this to, for example, collect the values the block returns on every iteration and return them in a new array as the result of the iterator method call, e.g.:

class Array
  def reverse_iterate
    if block_given?
      new_array = []
      current_index = self.size-1
      while current_index >= 0
        # collect whatever the block returns for each element
        new_array << yield(self[current_index])
        current_index -= 1
      end
      new_array
    else
      print self.reverse
    end
  end
end

If we then call our iterator method with a block that squares all the values it receives:

puts [2,4,6,8].reverse_iterate { |i| i*i}

the result returned from the method call will be an array with the values of our original array, but reversed and squared. Since puts prints each element of an array on its own line, we get:

64
36
16
4

This can be pretty handy and is incidentally similar to how the more complex Ruby iterator methods (such as map) work.
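To make the comparison with map concrete, here is a sketch of a map-like method built from nothing but each and yield. The name my_map is hypothetical, and Ruby’s real map is implemented in C, so this only illustrates the idea:

```ruby
module Enumerable
  # Hypothetical map-like method: yield each element, collect what the block returns.
  def my_map
    result = []
    each { |element| result << yield(element) }
    result
  end
end

p [2, 4, 6, 8].my_map { |i| i * i }  # [4, 16, 36, 64]
p (1..3).my_map { |i| i.to_s }       # ["1", "2", "3"]
```

Defining it on Enumerable rather than Array means anything with an each method (arrays, ranges, hashes) picks it up.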

Block Parameters

Using block parameters is very similar to using method parameters; the rules are pretty much identical. If you’re using Ruby 1.9, you get features like default values and catch-all (splat) parameters, which you can mix and match pretty much any way you like. If you’re on Ruby 1.8 you are more limited: catch-all parameters work, but block parameters cannot have default values. So to give an example, you could create an iterator that yields 4 parameters to the block in the following way:

class Array
  def reverse_iterate
    current_index = self.size-1
    while current_index >= 0
      yield self[current_index], 'Value', current_index, 'Index'
      current_index -= 1
    end
  end
end

You can then call this iterator with a block but rather than using 4 parameters, you can use two with the second being a catch-all (remember the * notation):

[2,4,6,8].reverse_iterate do |value, *others|
  puts "#{others[0]} = #{value}, #{others[2]} = #{others[1]}"
end

You use the values from the catch-all parameter just like you would an array, so our iterator call would output:

Value = 8, Index = 3
Value = 6, Index = 2
Value = 4, Index = 1
Value = 2, Index = 0
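Default values work for block parameters too, in Ruby 1.9 and later. In this sketch, a hypothetical iterator yield_each yields only one value per element, so the block’s second parameter falls back to its default:

```ruby
# Hypothetical iterator that yields one value per element.
def yield_each(items)
  items.each { |item| yield item }
end

labels = []
# label is never supplied by the iterator, so it takes its default value.
yield_each([1, 2]) { |n, label = 'item'| labels << "#{label} #{n}" }

p labels  # ["item 1", "item 2"]
```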

Fairly straightforward. What is more interesting is the scope that variables have once we are inside the block. In the simplest case, we may call an iterator with a block from another method, which may already have some variables defined:

def some_crazy_method
  random_variable=5
  [1,2,3].each do |i|
    puts "Array value=#{i}, Random variable=#{random_variable}"
  end
end

In this case we do have access to these variables from within the block, because a block can see the local variables of the scope in which it is defined. Calling the above method prints out the following:

Array value=1, Random variable=5
Array value=2, Random variable=5
Array value=3, Random variable=5

As you can see, we are able to print out random_variable from within the block, and it has the same value as it did outside the block. But what happens when a variable outside the block has the same name as a block parameter, e.g.:

def some_crazy_method
  i=5
  puts "Before block i=#{i}"
 
  [1,2,3].each do |i|
    puts "In block i=#{i}"
  end
 
  puts "After block i=#{i}"
end

The behavior here depends on whether you’re using Ruby 1.8 or 1.9. In Ruby 1.8, the block parameter i is the same variable as the i defined outside the block, so every value the iterator assigns to it is still visible after the block finishes. Calling the above method in Ruby 1.8 would produce:

Before block i=5
In block i=1
In block i=2
In block i=3
After block i=3

Notice that i now retains the last value it had within the block. In Ruby 1.9, however, block parameters are always local to the block, so the i inside the block is a different variable from the one outside it, and the output in 1.9 is:

Before block i=5
In block i=1
In block i=2
In block i=3
After block i=5

As you can see, the variable keeps its old value after the block completes. Ruby 1.9 also provides another feature related to variable scope: block-local variables. If you want a variable inside the block that the iterator does not assign to, and you don’t want it to accidentally refer to (and overwrite) a variable of the same name that was already initialized before the block and is therefore in scope within it, you can do the following:

def some_crazy_method
  some_variable=5
 
  puts "Before block some_variable=#{some_variable}"
 
  [1,2,3].each do |i;some_variable|
    puts "In block i=#{i}"
    some_variable = i
  end
 
  puts "After block some_variable=#{some_variable}"
end

By listing a variable after the semicolon in the block’s parameter list, we declare a variable with that name that is local to the block and unrelated to any variable with the same name outside it. The assignment inside the block therefore leaves the outer some_variable alone:

Before block some_variable=5
In block i=1
In block i=2
In block i=3
After block some_variable=5

This seems to be of limited utility, and it is not available in Ruby 1.8 (it produces a syntax error).
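For a compact side-by-side view (Ruby 1.9 and later), this sketch contrasts a block that assigns to an outer variable with one that assigns to a block-local variable declared after the semicolon:

```ruby
total = 0
# No declaration: total is the outer variable, so the block changes it.
[1, 2, 3].each { |i| total = i }

count = 0
# count is declared block-local after the semicolon, so the outer count is untouched.
[1, 2, 3].each { |i; count| count = i }

p total  # 3
p count  # 0
```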

Blocks As Closures

That’s right, you can use a block as a closure and pass it around and call it whenever you like. It is a reasonably advanced feature, but essentially you can do the following:

def method_with_block_as_closure(&block)
  another_method block
end
 
def another_method(variable)
  x=25
  variable.call x
end
 
method_with_block_as_closure {|i| print "I am happy block #{i}"}

By prefixing the method’s last parameter with the & symbol, you tell Ruby to convert the block passed to the method into an object (a Proc, which acts as a closure) and assign it to that parameter. We can then pass this parameter to other methods just like any other value. When we want to run the block, we simply call the call method on the variable that holds it, passing in any parameters the block expects. In our case, the print inside the block runs with the value 25 supplied by another_method, so the output is:

I am happy block 25
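Because the block is a closure, it keeps access to the local variables that were in scope where it was written, even when it is called later from somewhere else. A small sketch with a hypothetical keep_block method; the last call also shows & working in the other direction, turning a stored Proc back into a block:

```ruby
# Hypothetical method that simply hands back the block it was given (as a Proc).
def keep_block(&block)
  block
end

seen = []
# The block refers to seen, a local variable from the surrounding scope.
saved = keep_block { |x| seen << x }

saved.call(1)
saved.call(2)

# Prefixing a Proc with & in a method call passes it as that method's block.
[3, 4].each(&saved)

p seen  # [1, 2, 3, 4]
```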

This is a very basic, surface view of how to use blocks as first class parameters (closures). At this stage, this is really all we need to know, but I do plan to dig more deeply into how and why this works in a later post (i.e. Procs etc.). If you’re interested in that, then make sure you grab my feed so you don’t miss it. That’s all I wanted to cover regarding blocks, hope you found it interesting.

Image by Holger Zscheyge

  • Mark Wilden

    “to get this to work correctly we need to wrap the whole thing in braces” should be “parentheses”. But good explanation – I’ll never forget the difference between {} and do/end again.

    • http://www.skorks.com Alan Skorkin

      I always get those two mixed up :). It’s usually ok since most people can contextually get what you mean, but it does sometimes make for a confusing few minutes of conversation :).

  • dan

    Um. Squared, not doubled, Right?

    If we then call out iterator method with a block that doubles all the values it receives:

    puts [2,4,6,8].reverse_iterate { |i| i*i}
    the result returned from the method call will be an array with the values of our original array but reversed and doubled. We can print this out:

    64
    36
    16
    4

    • http://www.skorks.com Alan Skorkin

      Oops, yeah you’re of course correct, it should be squares, nice pick-up. Thanks for letting me know, I will update.

  • dan

    “By passing a variable to the block after the semicolon, we essentially say that we want to have a variable with that name local to the block and unrelated to a variable with the same name outside the block e.g.:”

    Isn’t this the standard 1.9 behaviour (with either a comma or a semi-colon)?

    • http://www.skorks.com Alan Skorkin

      The distinction is pretty fine and as I said the semicolon feature seems to be pretty useless, so perhaps my example was a little unclear about this one.

      Here is the way I think about it. A block can have arguments; when you call an iterator and give it a block, the iterator will automatically assign values to these arguments. The block also has access, without you having to pass them in, to variables in the scope where it was defined, as long as their names don’t conflict with the block arguments (in 1.9). So, if you’re using a comma, you would be expecting the iterator to assign some values to the arguments that you pass in.

      When you use the semicolon, you’re essentially creating a variable with block scope. It will not be automatically assigned to and it will not have a value set even if it has the same name as a variable in scope when the block was defined. So, you will essentially be shadowing a variable with the same name that is already in scope or creating a whole new variable with block scope.

      The only place that I can see where this is useful would be if you coded an iterator that expects a block with a certain number of parameters and you want to pass in more parameters (that you would like to use within the block); you could do this with the semicolon, but you could not do this with the comma.

      What this really means is that if you want a variable with a particular name inside the block which is distinct from the variables that the iterator will automatically assign to (i.e. you really like the name ‘foobar’ but you already have a ‘foobar’ variable in scope when you define the block), then you can write the block with whatever parameters it already expects and list your ‘foobar’ variable after the semicolon. It will shadow the one already in scope and thereby protect it from being modified within your block.

  • http://blog.mostof.it/ Ochronus

    Thanks for the post! I’ve also written an article about Ruby blocks and closures with code examples.

  • http://offirmo.posterous.com/ Offirmo

    Man, you’re good !
