Even Boring Form Data Can Be Interesting (For A Developer)

Bored

What could be more boring than capturing credit card data on a form? Well, it's actually not that boring since you may want to encrypt this particular data, which presents it's own set of challenges. Nevertheless, it's still a textbox which takes digits that you store in a database – whoopty doo – not exactly rocket surgery. Well, I've got a piece of data that's got the credit card beat for sheer mundanity – the ABN. If you're an Australian you know all about this. For everybody else, it stands for Australian Business Number which is an 11 digit number, provided by the government to every company. It's not secret (you can look them up online), so you don't even need to encrypt it – difficult to get excited about that. Of course if that was the end of the story, this wouldn't be much of a blog post, so – as you might imagine – things are not as bland as they appear.

The Interesting Thing About A Credit Card Number…

At CrowdHired, we don't tend to deal much with credit card numbers, but ABNs are another matter entirely – since companies are one of the two types of users we have in the system (by the way – as you may have deduced – I've been working for a startup for the last few months, I should really talk about how that happened, it's an interesting story). Just like any piece of data, you want to validate the user input if you possibly can. When I started looking into this for ABNs I discovered that they had an interesting trait, it is a trait which credit card numbers share. You see, both credit cards and ABNs are self-verifying numbers.

I've been doing web development for many years now, but had no idea this was the case. So naturally – being the curious developer that I am – I had to dig a little further. It turns out that these kinds of numbers are quite common, with other well-known examples being ISBNs, UPCs and VINs. Most of these use a variation of a check digit-based algorithm for both validation and generation. Probably the most well-known of these algorithms is the Luhn algorithm which is what credit cards use. So, we'll use a credit card as an example.

Validating And Generating Credit Cards (And Every Other Check-Digit Based Number)

Let us say we have the following credit card number:

4870696871788604

It is 16 digits (Visa and MasterCard are usually 16, but Amex is 15). This number is broken down in the following way:

Issuer Number | Account Number | Check Digit
487069        | 687178860      | 4

You can read lots more about the anatomy of a credit card, but all we want to do is apply the Luhn algorithm to check if this credit card is valid. It goes something like this:

1. Starting from the back, double every second digit

  
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4

2. If the doubled numbers form a double digit number, add the two digits

 
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4
8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 0 | 4

3. Sum up all the digits of this new number

  8+8+5+0+3+9+3+8+5+1+5+8+7+6+0+4 = 80

4. If the number is perfectly divisible by 10 it is a valid credit card number. Which in our case it is.

You can see how we can use the same algorithm to generate a valid credit card number. All we have to do is set the check digit value to X and then perform all the same steps. During the final step we simply pick our check digit in such a way as to make sure the sum of all the digits is divisible by 10. Let's do this for a slightly altered version of our previous credit card number (we simply set the digit before the check digit to 1 making the credit card number invalid).

  
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 1 | X
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 2 | X
8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 2 | X
 
8+8+5+0+3+9+3+8+5+1+5+8+7+6+2+X = 78+X
X = (78%10 == 0) ? 0 : 10 - 78%10
X=2

As you can see no matter what the other 15 digits are, we'll always be able to pick a check digit between 0 and 9 that will make the credit card number valid.

Of course not every self-verifying number uses the Luhn algorithm, most don't use mod(10) to work out what the check digit should be, and for some numbers like the IBAN, the check digit actually consists of 2 digits. And yet, the most curious self-verifying number of the lot is the first one I learned about – the ABN. This is because, for the life of me, I couldn't work out what the check digit of the ABN could be.

The Curious Case Of The ABN

Curious

Australia is certainly is not averse to using check digit-based algorithms. The Australian Tax File Number (TFN) and the Australian Company Number (ACN) are just two examples, but the ABN seems to be different. At first glance the ABN validation algorithm is just more of the same, it just has a larger than normal "mod" step at the end (mod(89)).

  • Subtract 1 from the first (left) digit to give a new eleven digit number
  • Multiply each of the digits in this new number by its weighting factor
  • Sum the resulting 11 products
  • Divide the total by 89, noting the remainder
  • If the remainder is zero the number is valid

In-fact, here is some ruby code to validate an ABN which I appropriated from the Ruby ABN gem (and then rolled it into a nice Rails 3, ActiveRecord validator so we could do validates_abn_format_of in all out models :)) :

def is_integer?(number)
  Integer(number) 
  true 
rescue 
  false
end
  
def abn_valid?(number)
  raw_number = number
  number = number.to_s.tr ' ', ''
  return false unless is_integer?(number) && number.length == 11

  weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
  sum = 0
  (0..10).each do |i|
    c = number[i,1]
    digit = c.to_i - (i.zero? ? 1 : 0)
    sum += weights[i] * digit
  end

  sum % 89 == 0 ? true : false
end

But, while validating ABNs is easy, generating them is a whole other matter. As we've seen, with a check digit-based algorithm, generating the number is the same as validating the number, except we pick the digit in such a way as to make our 'mod' step evaluate to zero. But with a number such as the ABN, where there is no apparent check digit (perhaps I am just having a bout of stupid, so if you can see an obvious check digit with ABNs do let me know), how do you easily generate a valid number? In-fact, why would you want to generate these numbers in the first place, isn't being able to validate them enough?

Well, in the case of CrowdHired, we tend to create object trees that are quite deep, so we build an maintain some infrastructure code to allow us to create fake data for use during development (another interesting thing to talk about at a later date). Before we started using the self-validating properties of ABNs we simply generated any old 11 digit number as fake data for ABN fields, but once the validations started kicking in this was no longer an option. Being the pragmatic developers that we are (even if we do say so ourselves), we took some real ABNs (like our own) chucked them into an array and randomly picked from there. But, this offended the developer gods, or my developer pride – whichever, so one Saturday I decided to take a couple of hours to generate some truly random ABNs that were still valid. Here is the code I came up with (it is now a proud part of our fake data generation script):

def random_abn
  weights = [10,1,3,5,7,9,11,13,15,17,19]
  reversed_weights = weights.reverse
  initial_numbers = []
  final_numbers = []
  9.times {initial_numbers << rand(9)+1}
  initial_numbers = [rand(8)+1, rand(7)+2] + initial_numbers
  products = []
  weights.each_with_index do |weight, index|
    products << weight * initial_numbers[index]
  end
  product_sum = products.inject(0){|sum, value| sum + value}
  remainder = product_sum % 89
  if remainder == 0
    final_numbers = initial_numbers
  else
    current_remainder = remainder
    reversed_numbers = initial_numbers.reverse
    reversed_weights.each_with_index do |weight, index|
      next if weight > current_remainder
      if reversed_numbers[index] > 0
        reversed_numbers[index] -= 1
        current_remainder -= weight
        if current_remainder < reversed_weights[index+1]
          redo
        end
      end
    end
    final_numbers = reversed_numbers.reverse
  end
  final_numbers[0] += 1
  final_numbers.join
end

The idea is pretty simple. Let's go through an example to demonstrate:

1. Firstly we randomly generate 11 digits between 0 and 9 to make up our probably ABN (they are actually not all between 0 and 9 but more on that shortly)

7 5 8 9 8 7 3 4 1 5 3

2. We then perform the validation steps on that number

  • multiply the digits by their weights to get weight-digit products 

      
          7x10=70
          5x1=5
          8x3=24
          9x5=45
          8x7=56
          7x9=63
          3x11=33
          4x13=52
          1x15=15
          5x17=85
          3x19=57
        
  • sum the products 70+5+24+45+56+63+33+52+15+85+57 = 505
  • divide by 89 to get the remainder 505 mod 89 = 60

3. Since we do mod(89) at worst we'll be off by 88 (although if we get 0 as the remainder we lucked out with a valid ABN straight away), we now use the weight-digit products to "give change", subtracting from the remainder as we go until we hit zero.

We start with the last digit where the weight is 19. We subtract 1 from this digit, which means we can subtract 19 from our remainder. We then move on to the next digit until the remainder hits zero

  
Initial | Change  | Remainder
-------------------------------
7x10=70 | 7x10=70 | 0
5x1=5   | 5x1=5   | 0
8x3=24  | 8x3=24  | 0
9x5=45  | 9x5=45  | 0
8x7=56  | 8x7=56  | 0
7x9=63  | <strong>6</strong>x9=63  | 0
3x11=33 | 3x11=33 | 9
4x13=52 | 4x13=52 | 9
1x15=15 | 0x15=0  | 9
5x17=85 | <strong>4</strong>x17=68 | 24
3x19=57 | <strong>2</strong>x19=38 | 41

4. This gives us our new number

7 5 8 9 8 6 3 4 0 4 2

5. Now we just need to add 1 to the very first number (as per the ABN validation steps) and we have our valid ABN

85898634042

There are a couple of nuances to those steps.

  • None of our initial generated digits can be zero. Since we "give change" by subtracting 1 from each digit we need to ensure we can actually subtract 1 (otherwise things become much more complex). So we ensure our digits are randomly generated between 1 and 9 rather tahn between 0 and 9.
  • Even when all our initial digits are at least 1 we may still fail to "give change" for some of the remainders, the easiest example of this is when our remainder is 2. The only digit when can use to give change is the one with the weight of 1 (i.e. the second digit of the ABN). If this digit is initially generated to be 1, we can only give change once at which point we end up with a remainder of 1 and there is nothing more we can do about it. In-fact this exact scenario can happen for a few other remainders namely 86, 77, 66, 53, 38, 21. The easies way to overcome this problem is to ensure that the digit that has a weight of 1, is always randomly generated to have a value of at least 2. This way we can use it to give change twice and our problematic remainders are covered.
  • Lastly, since we have to add 1 to the very first digit as our final step, we need to make sure that this digit is not already equal to 9, so we generate that one to be between 1 and 8.

Given these nuances, this algorithm won't generate every possible ABN, but it will give you a large percentage of possible ABNs which is good enough for our needs. It took about an hour to get that working (we won't mention the little bug where I forgot the remainder could be zero from the start, which caused much grief to our random data generator :)), but it was a fun little exercise – time well spent as far as I am concerned. And to think, all this learning about self-validating numbers and algorithmic coding fun was triggered by trying to capture the most mundane piece of data on a form. It just goes to show that you can learn and grow no matter where you are and what you're doing, you just need to see the opportunities for what they are.

Images by Sn.Ho and Sangudo

  • Samuel

    But you do not bother to prevent duplicate entries. Therefore you could have just as easily gone with something like 12345678901, and increased by 1 until you get a valid number. Seems wasteful to write something complex when you could just do trial and error with a guarantee that it will eventually work. That way, you can actually guarantee that they are all unique.

    Slow? Yeah, theoretically. Integer math is fast enough that you’ll likely never see it take longer than microseconds. And it’s a test data generator. Speed is irrelevant anyway.

    • http://www.skorks.com Alan Skorkin

      Firstly, duplicate entries are not an issue, they are allowed in the system, if they weren’t we could leave them to be picked up by validations at which point we would act, but as I said not an issue.

      Secondly, you think speed doesn’t matter? Even a small amount of data takes several minutes to populate for our system. A slightly larger one can take significantly longer, so any amount of time you save on a frequently generated piece of data is well worth it. In-fact I have found faster ways to generate other pieces of data, such as names etc. and it made a noticeable impact.

      Thirdly even if you increased by one, you still couldn’t guarantee uniqueness unless you store state somewhere.

      Lastly, the idea wasn’t just to generate ABNs as quickly as possible, there was also the desirable side-effect of learning something, and solving a bite-sized problem (and getting some satisfaction from that) and that objective was achieved admirably.

  • Gavin S

    With the weights of 10 and 1, doesn’t that make the first two digits the check? Aside of course from adding 1 to the first digit.
    4 9 4 2 6 3 2 4 8 5 6
    2 7 8 2 8 1 5 8 6 3 8
    8 5 9 8 7 1 0 0 4 7 2
    7 0 7 5 2 1 4 4 4 9 2

    I generated these in excel real quick. My method was to only generate a 9 digit number and use the 2 digit remainder of the summed weights for the first two digits of the ABN and then add the 1. I think this worked and it allows for zeros in the number as well.

    • http://www.skorks.com Alan Skorkin

      Hi Gavin,

      You’re of course quite right, I can’t believe I didn’t see it, it is obvious now that you point it out. We subtract 1 from the very first number just so the maximum number the first two digits could make would be 89. It seems so obvious, but I totally missed it, subtracting the 1 was enough to confuse me totally :).

      I guess my complex algorithm is no longer necessary, sure was fun to come up with it though.

      • http://twitter.com/Oblongmana James Hill

        Hi Alan,

        Quality post – while your solution wasn’t the one I wound up using, it did set me on the right path!

        Quick clarification on this (and I know I’m late to the party, but in case anyone else comes across this!): the numbers given by Gavin are not quite valid, but Gavin’s method is super-close, just missing one step. So the whole process is:
        - Generate a 9 digit number (being the last 9 digits of the ABN)
        - Multiply them by the (0-indexed) 2nd through 10th elements of the weights array
        - Sum the products you just calculated
        - Get the remainder against 89 of the Sum you just calculated [MOD(sumValue,89) in Excel, x % 89 in most programming languages]
        - Subtract the remainder you just calculated from 89, and add 10 (that is, add 1 to the tens column of the check digit!)
        - Join (as a string) the amount you just calculated with your original 9 digits, and you have a valid ABN.

        Bear in mind if you’re working on this problem, that you cannot make a valid ACN by chopping the two leading digits from an ABN, nor can you make a valid ABN by adding 2 leading digits to an ACN.

        Check out the Gist here https://gist.github.com/Oblongmana/946b14a6cedd93d90138 for some code for generating random ACNs and ABNs in Apex (a very Java-like language used in Salesforce.com development) – should be reasonably easy to adapt to your preferred language.

  • http://andypalmer.com Andy Palmer

    It sounds like you might enjoy the killmath series, particularly the demonstration of the scrubbing calculator, because it demonstrates a similar (albeit simpler) equation solving problem.

    • http://www.skorks.com Alan Skorkin

      Hey Andy,

      Thanks very much for the link, looks quite interesting I’ll definitely give it a read.

  • Stephen P.

    Alan!
    We’ve (ok, I’VE) missed you buddy!

    Gonna have to read the article later tho.

    Thanks,

    • http://www.skorks.com Alan Skorkin

      Hey Stephen,

      Yeah I’ve been really slack for a few months, but if anyone asks I’ll blame working for a startup :). Hope you enjoy the article.

  • Steven Ringo

    Skorks,

    I think the check digit is actually two digits, because the modulus itself is two digits.

    In many cases also the ABN is generated from it corresponding ACN (if the entity is a company), like so:

    ABN = 2 digits+ ACN

    I have a (strong) feeling those 2 digits are the check digits. I haven’t had a chance to run it through yet…

    Steve

    • http://www.skorks.com Alan Skorkin

      Hi Steven,

      You’re totally right, Gavin pointed our the same thing in one of the comments above. Now that it is staring me in the face I am flabbergasted that I didn’t see it – not my finest moment :).

      I also interesting that you point out that ABN = 2digits + ACN cause ACN has a check digit all of its own, but prepending two numbers produces a different number for which those two numbers are a check digit – kinda cool.

  • UJ

    Hi Alan,

    I really find your blog interesting, primarily because of its simple yet effective style of writing. I just came across it last week and I’ve already been thru several of the articles. Keep up the great work! :)

  • Pingback: Is it divisible by 10? If not, it isn’t a valid credit card | Bloggo Schloggo

  • http://kevinelliott.net/ Kevin Elliott

    This is a really great article. It’s a nice combination between educational, fun, and generally interesting. It’s the kind that belongs in an ezine!

  • Pingback: 程序员:枯燥的表单数据也可以变得有趣 - 博客 - 伯乐在线