How To Write A Name Generator (In Ruby)

I love reading fantasy; I’ve even written about some of my favourite fantasy series on this blog. One of the things that I have always found interesting about fantasy literature (besides unworkable economies and unsustainable population densities – I tend to over-analyse when I read :)) is how authors come up with the names for all the characters. Large fantasy series often contain hundreds of characters – that’s a lot of names. This line of thought naturally led me to think about what I would do if I ever needed to make up a bunch of names, and being the software developer that I am, the answer was naturally: get my computer to make up the names for me.

If you search the web for name generators you get quite a few results. Unfortunately most of them don’t tell you how they do what they do, and even that is beside the point, since I wasn’t really happy with the results that most of these name generators produce. Either the results are way too random (how about 6 consonants in a row) or they are not random enough, with clear traces of human intervention (i.e. choosing from a list of pre-made names). Then I found Chris Pound’s excellent name generator page. One of the things that he has on this page is his language confluxer (lc) script, so for my first attempt at writing a name generator I decided to basically take his script and clean it up a little bit. There were two reasons for this:

  • he uses a pretty clever algorithm for his name generator: it is completely data driven and is therefore able to avoid the 6 consonants/vowels in a row issue while producing output that sounds similar to the data it is based on
  • it was a yucky Perl script and nobody wants to work with that (except Perl programmers), so I felt it was my duty to make it a little bit nicer and since I’ve been playing around with Ruby lately, well you get the picture :)

The Name Generator Algorithm

As I said the script is completely data driven in that it takes a list of words (names in our case) as input and uses these to produce a bunch of randomised names that hopefully sound similar to the original input. It does the following:

  • produces a list of starting letter pairs from the input data (all our names will start with one of these pairs)
  • produces a map of which letters can follow each pair of letters, based on the input data
  • generates words/names by randomly selecting a starting pair and then repeatedly appending a letter chosen at random from the map, based on the last two letters of the new word
  • this continues until the word reaches the end of a name (a space) and is longer than a minimum length; the result is then cut off at a maximum length (both limits are hard-coded in the script)

There are a few more little twists that make this whole thing function but that is the essence of the algorithm.
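To make the steps above concrete, here is a minimal sketch of the same idea. It is not the script discussed in this post: the names build_tables and generate_name are made up for illustration, the loop is iterative rather than recursive, and a name that ends too early is simply thrown away and restarted, which is a simplification.

```ruby
# Build the two lookup tables from whitespace-separated sample names.
# (build_tables and generate_name are hypothetical names for this sketch.)
def build_tables(text)
  # Lowercase the names and join them with single spaces, padded with a
  # space on both ends so every name is delimited on both sides.
  chars = " #{text.downcase.split.join(' ')} ".chars
  start_pairs = []
  followers = {}

  (chars.length - 2).times do |i|
    pair_after = chars[i + 1, 2].join
    # The two letters right after a space begin a name.
    start_pairs << pair_after if chars[i] == ' ' && !pair_after.include?(' ')
    # Record the character that follows the pair starting at position i.
    (followers[chars[i, 2].join] ||= []) << chars[i + 2]
  end
  [start_pairs, followers]
end

# Grow a name from a random start pair until a space (end of name) is
# drawn or max_length is reached; retry if the result is too short.
def generate_name(start_pairs, followers, min_length: 3, max_length: 9, rng: Random.new)
  raise ArgumentError, 'no start pairs in the sample data' if start_pairs.empty?

  1000.times do
    word = start_pairs.sample(random: rng).dup
    while word.length < max_length
      letter = followers.fetch(word[-2, 2], []).sample(random: rng)
      break if letter.nil? || letter == ' '
      word << letter
    end
    return word.capitalize if word.length >= min_length
  end
  raise ArgumentError, 'sample data never produced a long enough name'
end

start_pairs, followers = build_tables('anna anne')
p followers['nn']                          # => ["a", "e"]
puts generate_name(start_pairs, followers) # prints Anna or Anne
```

Followers are stored with duplicates on purpose, so a letter that follows a pair often in the input is also picked more often.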

Faithful Perl-to-Ruby Conversion

So the first thing I did was to take the Perl script and do a direct conversion into Ruby. Here is what I got:

require 'getoptlong'
 
data_file = 'data.txt'
words_to_generate = 10
 
min_length = 3
max_length = 9
 
opts = GetoptLong.new(
  ["--datafile", "-d", GetoptLong::OPTIONAL_ARGUMENT],
  ["--number-of-words", "-n", GetoptLong::OPTIONAL_ARGUMENT]
)
 
opts.each do |opt, arg|
  case opt
  when '--datafile'
    data_file = arg
  when '--number-of-words'
    # GetoptLong hands over arguments as strings; convert so that .times works
    words_to_generate = arg.to_i
  end
end
 
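# Letter pairs that begin a name, and a map from each letter pair to the
# letters seen directly after it (kept as a string, duplicates included).
start_pairs = []
follower_letters = Hash.new('')

File.open(data_file, 'r') do |file|
  # Replace every whitespace character (newlines included) with a plain space.
  chars = file.read.chomp.downcase.gsub(/\s/, ' ').chars.to_a
  # Wrap around: append the first two characters so the final pairs also have a follower.
  chars.push(chars[0], chars[1])
  (chars.length-2).times do |i|
    # The two characters after a space start a new name.
    if chars[i] =~ /\s/
      start_pairs.push(chars[i+1, 2].join)
    end
    follower_letters[chars[i, 2].join]=follower_letters[chars[i,2].join]+chars[i+2,1].join
  end
end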
 
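# Recursively extends word one letter at a time until it ends in a space
# and is longer than min_length.
def generate_word(word, follower_letters, min_length)
  last_pair = word[-2, 2]
  # Pick a random follower of the last two characters.
  letter = follower_letters[last_pair].slice(rand(follower_letters[last_pair].length), 1)
  if word =~ /\s$/
    # A trailing space marks the end of a name: stop if it is long enough,
    # otherwise start over from the space plus the letter just chosen.
    return word unless word.length <= min_length
    return generate_word(word[-1, 1]+letter, follower_letters, min_length)
  else
    word = word.gsub(/^\s/, '')
    return generate_word(word+letter, follower_letters, min_length)
  end
end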
 
words_to_generate.times do |i|
  puts generate_word(start_pairs[rand start_pairs.length], follower_letters, min_length)[0, max_length].capitalize
end

At this point I had a bit of a shock at how eerily similar the Ruby version of the script looks compared to the Perl version (*shudders*). Anyways, you can just take the above script, put it into a file and run it; you’ll need to give it a data file (here is the one I used).

Cleaning Up The Basic Name Generator

The problems with the above script are:

  • it is not self-documenting
  • it is hard to test
  • it is hard to extend

Anyways, I decided to make it a little bit nicer and easier to play around with by breaking it up into a couple of classes (in the interest of object orientation and stuff):

  • name_generator_main.rb – the script entry point
  • NameGenerator – concerned with name generation (as you might expect)
  • DataHandler – concerned with reading the input data and producing the maps and arrays on which the NameGenerator relies
  • ArgumentParser – concerned with dealing with the command line arguments

You can download all of it here.
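If you don’t want to download the files, here is a rough guess at the shape two of those classes could take. The class and method names mirror the calls made by the main script, but the bodies are assumptions and not the downloadable code. The read_data helper and the rng parameter are also made up here, to show how the split makes testing easier.

```ruby
# Hypothetical sketch: only the class and method names come from the main
# script; everything inside them is an assumption.
class DataHandler
  attr_reader :start_pairs, :follower_letters

  def initialize
    @start_pairs = []
    @follower_letters = {}
  end

  def read_data_file(path)
    read_data(File.read(path))
  end

  # Assumed helper: parsing a string directly lets tests skip the file system.
  def read_data(text)
    chars = " #{text.downcase.split.join(' ')} ".chars
    (chars.length - 2).times do |i|
      pair_after = chars[i + 1, 2].join
      @start_pairs << pair_after if chars[i] == ' ' && !pair_after.include?(' ')
      (@follower_letters[chars[i, 2].join] ||= []) << chars[i + 2]
    end
    self
  end
end

class NameGenerator
  # rng is an assumed parameter: a seeded Random gives repeatable output.
  def initialize(follower_letters, min_length: 3, max_length: 9, rng: Random.new)
    @follower_letters = follower_letters
    @min_length = min_length
    @max_length = max_length
    @rng = rng
  end

  # count may arrive as a string from the command line, hence to_i.
  def generate_names(count, start_pairs)
    Array.new(count.to_i) { generate_name(start_pairs) }
  end

  private

  def generate_name(start_pairs)
    raise ArgumentError, 'no start pairs in the sample data' if start_pairs.empty?

    1000.times do
      word = start_pairs.sample(random: @rng).dup
      while word.length < @max_length
        letter = @follower_letters.fetch(word[-2, 2], []).sample(random: @rng)
        break if letter.nil? || letter == ' '
        word << letter
      end
      return word.capitalize if word.length >= @min_length
    end
    raise ArgumentError, 'sample data never produced a long enough name'
  end
end

handler = DataHandler.new.read_data('anna anne')
generator = NameGenerator.new(handler.follower_letters, rng: Random.new(42))
puts generator.generate_names('3', handler.start_pairs)
```

With the random source passed in, a test can seed it and compare two runs, which the single-file script could not do.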

Now the main script looks much cleaner and you know exactly what’s happening just by reading it:

require 'argument_parser'
require 'data_handler'
require 'name_generator'
 
argument_parser = ArgumentParser.new
argument_parser.parse_arguments
data_handler = DataHandler.new
data_handler.read_data_file(argument_parser.data_file)
name_generator = NameGenerator.new(data_handler.follower_letters)
names = name_generator.generate_names(argument_parser.words_to_generate, data_handler.start_pairs)
names.each {|name| puts name}

This produces output similar to the following:

Jamin
Luce
Jevon
Fredy
Hamilinis
Emmano
Shamarcul
Gagaedric
Jary
Raelis

That’s pretty damn good for a random name generator. The best part is that, since it is completely data driven, changing the input data completely alters the output. So if you pass in a file with a bunch of French names, you will get French-sounding random names, etc. Try it yourself!

  • http://blog.barrkel.com/ Barry Kelly

    In case any of your other readers are interested, this technique is based on Markov chains. It can be extended beyond the pairs you use to triples or larger tuples, and can also be applied on a word basis for generating text, e.g. nonsense post-modern, Biblical, Shakespearean or Dickensian style prose.

    • http://www.skorks.com Alan Skorkin

      Thanks, I might look into that myself.

  • Ryan Stout

    Looks good, though I can’t get the sample code to work in ruby 1.8. You have a few spots with default arguments, then no default arguments for the last argument. If you can, let me know what I need to do to get it running.

    Thanks.

    • http://www.skorks.com Alan Skorkin

      You’re right, I tried it out on ruby 1.8 and it died. Apparently with ruby 1.8 arguments without default values need to come before arguments with default values (as far as method parameters are concerned). With ruby 1.9 this doesn’t seem to matter. So there we go, live and learn :).

      I’ve now updated the code, if you download and try to run it now it should all work. Thanks for pointing out the issue.

  • http://twitter.com/ehoque ehsanul

    Totally awesome, that output looks great! Seeing a lot about Markov chains these days, perhaps a sign that I should be looking into them more closely..

  • Seyed Razavi

    Love the code and it works well on linux but it seems to be broken on Windows (Vista) throwing a “SystemStackError: stack level too deep” error in the generate_name method.

    • http://www.skorks.com Alan Skorkin

      It’s never been tried on Vista, I don’t really have a copy :), so I can’t run it to see what happens. However from what you say it is overflowing the stack and the only way that would happen is if the exit condition for the recursive function is not working correctly.

      If you would like to debug the code on Vista I would really appreciate it.

      Otherwise we’ll have to wait until Win7 comes out (since I am probably gonna get a copy of that one) and try it on that (which means we’ll be writing Vista off as a lost cause).

      • Seyed Razavi

        OK, a bit of poking about and I think the problem is with different versions of ruby.

        In your data handler class (or File.open block in the original) I replaced the line:
        chars = file.read.chomp.downcase.gsub(/\s/, ' ').chars.to_a

        …with:
        chars = file.read.chomp.downcase.gsub(/\s/, ' ').split(//)

        to get it to work on both machines.

        • http://www.skorks.com Alan Skorkin

          Thanks very much for investigating I really appreciate it. Don’t even get me started on different versions of ruby :).

          I will update the code with your fix as soon as I get the chance.

        • Kiiro Roshi

          I’ve tried this under JRuby 1.3.1 and, besides the provided patch by Seyed, I found that I also need to change the following line in name_generator.rb (line 27 currently):

          count.times do |i|

          … to:

          count.to_i.times do |i|

          The error I got before this change was:
          name_generator_main.rb:10: undefined method `times' for "10":String (NoMethodError)

          Anyway, it’s a great script!

          • http://www.skorks.com Alan Skorkin

            Cheers, for letting me know
