I love reading fantasy; I’ve even written about some of my favourite fantasy series on this blog. One of the things that I have always found interesting about fantasy literature (besides unworkable economies and unsustainable population densities – I tend to over-analyse when I read :)) is how authors come up with the names for all the characters. Large fantasy series often contain hundreds of characters – that’s a lot of names. This line of thought naturally led me to think about what I would do if I ever needed to make up a bunch of names, and being the software developer that I am, the answer was naturally: get my computer to make up the names for me.

If you search the web for name generators you get quite a few results. Unfortunately most of them don’t tell you how they do what they do, and even that is beside the point since I wasn’t really happy with the results that most of these name generators produce. Either the results are way too random (how about 6 consonants in a row) or they are not random enough, with clear traces of human intervention (i.e. choosing from a list of pre-made names). Then I found Chris Pound’s excellent name generator page. One of the things he has on this page is his language confluxer (lc) script, so for my first attempt at writing a name generator I decided to basically take his script and clean it up a little bit. There were two reasons for this:

  • he uses a pretty clever algorithm for his name generator: it is completely data driven and is therefore able to avoid the 6 consonants/vowels in a row issue while producing output that sounds similar to the data it is based on
  • it was a yucky Perl script and nobody wants to work with that (except Perl programmers), so I felt it was my duty to make it a little bit nicer and since I’ve been playing around with Ruby lately, well you get the picture :)

The Name Generator Algorithm

As I said the script is completely data driven in that it takes a list of words (names in our case) as input and uses these to produce a bunch of randomised names that hopefully sound similar to the original input. It does the following:

  • produces a list of starting letter pairs from the input data (all our names will start with one of these pairs)
  • produces a map of which letters can follow each pair of letters, based on the input data
  • generates words/names by randomly selecting a starting pair and then repeatedly appending a letter chosen at random from the map, based on what the last two letters of our new word currently are
  • this continues until the word ends (word breaks in the input are recorded in the map too); a word that comes out shorter than the minimum length is thrown away and restarted, and a long one is cut off at the maximum (both limits are hard-coded in the script)

There are a few more little twists that make this whole thing function but that is the essence of the algorithm.
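
To make the two tables a bit more concrete, here is a minimal sketch that builds them for a tiny made-up input. The build_tables helper and the sample names are just for illustration (they are not part of Chris Pound’s script), and to keep it short it handles each name on its own rather than treating the input as one long string:

```ruby
# Build the two lookup tables from a whitespace-separated list of names.
# build_tables is a hypothetical helper, used here for illustration only.
def build_tables(text)
  start_pairs = []
  # Each pair of letters maps to a string of every letter seen after it.
  followers = Hash.new { |hash, key| hash[key] = String.new }

  text.downcase.split.each do |name|
    next if name.length < 2

    start_pairs << name[0, 2]
    # Pad with a trailing space so the table also records where names end.
    padded = "#{name} "
    (0..padded.length - 3).each do |i|
      followers[padded[i, 2]] << padded[i + 2]
    end
  end

  [start_pairs, followers]
end

start_pairs, followers = build_tables('anna anne')
p start_pairs # => ["an", "an"]
followers.each { |pair, letters| puts "#{pair.inspect} -> #{letters.inspect}" }
# "an" -> "nn"
# "nn" -> "ae"
# "na" -> " "
# "ne" -> " "
```

Both names start with "an", so that pair appears twice in the start list and is twice as likely to be picked. The same weighting applies to the followers: a letter that often follows a pair in the input shows up often in that pair’s string, so it gets chosen more often.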

Faithful Perl-to-Ruby Conversion

So first thing I did was to take the Perl script and do a direct conversion into Ruby, here is what I got:

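require 'getoptlong'

data_file = 'data.txt'
words_to_generate = 10

min_length = 3
max_length = 9

opts = GetoptLong.new(
  ["--datafile", "-d", GetoptLong::OPTIONAL_ARGUMENT],
  ["--number-of-words", "-n", GetoptLong::OPTIONAL_ARGUMENT]
)

opts.each do |opt, arg|
  case opt
  when '--datafile'
    data_file = arg
  when '--number-of-words'
    # arg arrives as a String; convert it so that .times works further down
    words_to_generate = arg.to_i
  end
end

# two-letter pairs that begin a word in the input
start_pairs = []
# maps each two-letter pair to a string of every letter seen right after it
follower_letters = Hash.new('')

File.open(data_file, 'r') do |file|
  # turn every whitespace character into a plain space (the word separator)
  chars = file.read.chomp.downcase.gsub(/\s/, ' ').chars.to_a
  # wrap around to the start so the last pairs in the file get followers too
  chars.push(chars[0], chars[1])
  (chars.length-2).times do |i|
    # the two characters after a space are the start of a word
    if chars[i] =~ /\s/
      start_pairs.push(chars[i+1, 2].join)
    end
    follower_letters[chars[i, 2].join]=follower_letters[chars[i,2].join]+chars[i+2,1].join
  end
end

def generate_word(word, follower_letters, min_length)
  # pick a random letter that followed the last two characters in the input
  last_pair = word[-2, 2]
  letter = follower_letters[last_pair].slice(rand(follower_letters[last_pair].length), 1)
  if word =~ /\s$/
    # a trailing space means the word has ended: keep it if it is long enough,
    # otherwise start over from the space plus the next letter
    return word unless word.length <= min_length
    return generate_word(word[-1, 1]+letter, follower_letters, min_length)
  else
    # drop the leading space left over from a restart, then keep growing
    word = word.gsub(/^\s/, '')
    return generate_word(word+letter, follower_letters, min_length)
  end
end

words_to_generate.times do |i|
  puts generate_word(start_pairs[rand start_pairs.length], follower_letters, min_length)[0, max_length].capitalize
end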

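At this point I had a bit of a shock at how eerily similar the Ruby version of the script looks compared to the Perl version (*shudders*). Anyways, you can just take the above script, put it into a file and run it. You’ll need to give it a data file, which is simply a text file of names separated by whitespace (here is the one I used). Use --datafile (or -d) to point the script at your file and --number-of-words (or -n) to say how many names you want.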

Cleaning Up The Basic Name Generator

The problems with the above script are:

  • it is not self-documenting
  • it is hard to test
  • it is hard to extend

Anyways, I decided to make it a little bit nicer and easier to play around with by breaking it up into a couple of classes (in the interest of object orientation and stuff):

  • name_generator_main.rb – the script entry point
  • NameGenerator – concerned with name generation (as you might expect)
  • DataHandler – concerned with reading the input data and producing the maps and arrays on which the NameGenerator relies
  • ArgumentParser – concerned with dealing with the command line arguments

You can download all of it here.
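
To give a feel for how small these classes can be, here is a minimal sketch of what an ArgumentParser could look like. Only the names parse_arguments, data_file and words_to_generate come from the main script; the body is my assumption, and it uses OptionParser from the standard library in place of the GetoptLong used in the original script, mostly because OptionParser can convert the number for you:

```ruby
require 'optparse'

# Hypothetical sketch: the method and attribute names match what the main
# script calls, everything else is assumed for illustration.
class ArgumentParser
  attr_reader :data_file, :words_to_generate

  def initialize
    # Same defaults as the one-file script.
    @data_file = 'data.txt'
    @words_to_generate = 10
  end

  # Takes the argument list as a parameter (defaulting to ARGV) so that it
  # can be exercised without touching the real command line.
  def parse_arguments(argv = ARGV)
    OptionParser.new do |opts|
      opts.on('-d', '--datafile FILE', 'File of sample names') do |file|
        @data_file = file
      end
      # Integer makes OptionParser convert (and validate) the value.
      opts.on('-n', '--number-of-words N', Integer, 'Names to generate') do |n|
        @words_to_generate = n
      end
    end.parse(argv)
    self
  end
end

parser = ArgumentParser.new.parse_arguments(%w[--number-of-words 5])
p parser.words_to_generate # => 5
p parser.data_file         # => "data.txt"
```

Passing the arguments in, rather than always reading ARGV, is what makes a class like this easy to test on its own.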

Now the main script looks much cleaner and you know exactly what’s happening just by reading it:

# require_relative loads the files sitting next to this script
# (a plain require no longer searches the current directory on Ruby 1.9.2+)
require_relative 'argument_parser'
require_relative 'data_handler'
require_relative 'name_generator'

argument_parser = ArgumentParser.new
argument_parser.parse_arguments
data_handler = DataHandler.new
data_handler.read_data_file(argument_parser.data_file)
name_generator = NameGenerator.new(data_handler.follower_letters)
names = name_generator.generate_names(argument_parser.words_to_generate, data_handler.start_pairs)
names.each {|name| puts name}
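
The classes themselves are in the download, so I won’t reproduce them here. As a rough idea of their shape, here is a minimal sketch of what DataHandler and NameGenerator could look like, inferred only from the calls the main script makes. The method bodies, the parse method and the min_length/max_length keyword arguments are assumptions made for illustration, not the downloadable code, and the generator uses a simple loop instead of the recursion in the one-file script:

```ruby
# Hypothetical sketch: only read_data_file, start_pairs, follower_letters
# and generate_names are names taken from the main script.
class DataHandler
  attr_reader :start_pairs, :follower_letters

  def initialize
    @start_pairs = []
    @follower_letters = Hash.new { |hash, key| hash[key] = String.new }
  end

  def read_data_file(path)
    parse(File.read(path))
  end

  # Assumed helper: takes the text directly, so tests need no file on disk.
  def parse(text)
    text.downcase.split.each do |name|
      next if name.length < 2

      @start_pairs << name[0, 2]
      # The trailing space records where each name ends.
      padded = "#{name} "
      (0..padded.length - 3).each do |i|
        @follower_letters[padded[i, 2]] << padded[i + 2]
      end
    end
    self
  end
end

class NameGenerator
  def initialize(follower_letters, min_length: 3, max_length: 9)
    @follower_letters = follower_letters
    @min_length = min_length
    @max_length = max_length
  end

  def generate_names(count, start_pairs)
    Array.new(count) { generate_name(start_pairs) }
  end

  private

  # Retries until a name is long enough; assumes start_pairs is not empty
  # and that the data can produce a name of at least min_length letters.
  def generate_name(start_pairs)
    loop do
      word = start_pairs.sample.dup
      while word.length < @max_length
        letter = @follower_letters.fetch(word[-2, 2], '').chars.sample
        # A space (or no known follower) means the name has ended.
        break if letter.nil? || letter == ' '

        word << letter
      end
      return word.capitalize if word.length >= @min_length
    end
  end
end

handler = DataHandler.new.parse('james jason jamal jared')
generator = NameGenerator.new(handler.follower_letters)
puts generator.generate_names(5, handler.start_pairs)
```

With the data handling and the generation split apart like this, each piece can be fed a small hand-written input and checked separately, which is exactly what was awkward with the one-file script.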

This produces output similar to the following:

Jamin
Luce
Jevon
Fredy
Hamilinis
Emmano
Shamarcul
Gagaedric
Jary
Raelis

That’s pretty damn good for a random name generator. The best part is that, since it is completely data driven, changing the input data completely alters the output. So if you pass in a file with a bunch of French names, you will get French-sounding random names, and so on. Try it yourself!