Even Boring Form Data Can Be Interesting (For A Developer)

by Alan Skorkin on August 17, 2011

BoredWhat could be more boring than capturing credit card data on a form? Well, it's actually not that boring since you may want to encrypt this particular data, which presents it's own set of challenges. Nevertheless, it's still a textbox which takes digits that you store in a database – whoopty doo – not exactly rocket surgery. Well, I've got a piece of data that's got the credit card beat for sheer mundanity – the ABN. If you're an Australian you know all about this. For everybody else, it stands for Australian Business Number which is an 11 digit number, provided by the government to every company. It's not secret (you can look them up online), so you don't even need to encrypt it – difficult to get excited about that. Of course if that was the end of the story, this wouldn't be much of a blog post, so – as you might imagine – things are not as bland as they appear.

The Interesting Thing About A Credit Card Number…

At CrowdHired, we don't tend to deal much with credit card numbers, but ABNs are another matter entirely – since companies are one of the two types of users we have in the system (by the way – as you may have deduced – I've been working for a startup for the last few months, I should really talk about how that happened, it's an interesting story). Just like any piece of data, you want to validate the user input if you possibly can. When I started looking into this for ABNs I discovered that they had an interesting trait, it is a trait which credit card numbers share. You see, both credit cards and ABNs are self-verifying numbers.

I've been doing web development for many years now, but had no idea this was the case. So naturally – being the curious developer that I am – I had to dig a little further. It turns out that these kinds of numbers are quite common, with other well-known examples being ISBNs, UPCs and VINs. Most of these use a variation of a check digit-based algorithm for both validation and generation. Probably the most well-known of these algorithms is the Luhn algorithm which is what credit cards use. So, we'll use a credit card as an example.

Validating And Generating Credit Cards (And Every Other Check-Digit Based Number)

Let us say we have the following credit card number:

4870696871788604

It is 16 digits (Visa and MasterCard are usually 16, but Amex is 15). This number is broken down in the following way:

Issuer Number | Account Number | Check Digit
487069        | 687178860      | 4

You can read lots more about the anatomy of a credit card, but all we want to do is apply the Luhn algorithm to check if this credit card is valid. It goes something like this:

1. Starting from the back, double every second digit

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4

2. If the doubled numbers form a double digit number, add the two digits

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4
  8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 0 | 4

3. Sum up all the digits of this new number

  8+8+5+0+3+9+3+8+5+1+5+8+7+6+0+4 = 80

4. If the number is perfectly divisible by 10 it is a valid credit card number. Which in our case it is.

 

You can see how we can use the same algorithm to generate a valid credit card number. All we have to do is set the check digit value to X and then perform all the same steps. During the final step we simply pick our check digit in such a way as to make sure the sum of all the digits is divisible by 10. Let's do this for a slightly altered version of our previous credit card number (we simply set the digit before the check digit to 1 making the credit card number invalid).

  4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 1 | X
  8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 2 | X
  8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 2 | X
  8+8+5+0+3+9+3+8+5+1+5+8+7+6+2+X = 78+X
  X = (78%10 == 0) ? 0 : 10 - 78%10
  X=2

As you can see no matter what the other 15 digits are, we'll always be able to pick a check digit between 0 and 9 that will make the credit card number valid.

Of course not every self-verifying number uses the Luhn algorithm, most don't use mod(10) to work out what the check digit should be, and for some numbers like the IBAN, the check digit actually consists of 2 digits. And yet, the most curious self-verifying number of the lot is the first one I learned about – the ABN. This is because, for the life of me, I couldn't work out what the check digit of the ABN could be.

The Curious Case Of The ABN

Curious

Australia is certainly is not averse to using check digit-based algorithms. The Australian Tax File Number (TFN) and the Australian Company Number (ACN) are just two examples, but the ABN seems to be different. At first glance the ABN validation algorithm is just more of the same, it just has a larger than normal "mod" step at the end (mod(89)).

  • Subtract 1 from the first (left) digit to give a new eleven digit number
  • Multiply each of the digits in this new number by its weighting factor
  • Sum the resulting 11 products
  • Divide the total by 89, noting the remainder
  • If the remainder is zero the number is valid

In-fact, here is some ruby code to validate an ABN which I appropriated from the Ruby ABN gem (and then rolled it into a nice Rails 3, ActiveRecord validator so we could do validates_abn_format_of in all out models :)) :

  def is_integer?(number)
    Integer(number)
    true
  rescue
    false
  end
 
  def abn_valid?(number)
    raw_number = number
    number = number.to_s.tr ' ',''
    return false unless is_integer?(number) && number.length == 11
 
    weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    sum = 0
    (0..10).each do |i|
      c = number[i,1]
      digit = c.to_i - (i.zero? ? 1 : 0)
      sum += weights[i] * digit
    end
 
    sum % 89 == 0 ? true : false
  end

But, while validating ABNs is easy, generating them is a whole other matter. As we've seen, with a check digit-based algorithm, generating the number is the same as validating the number, except we pick the digit in such a way as to make our 'mod' step evaluate to zero. But with a number such as the ABN, where there is no apparent check digit (perhaps I am just having a bout of stupid, so if you can see an obvious check digit with ABNs do let me know), how do you easily generate a valid number? In-fact, why would you want to generate these numbers in the first place, isn't being able to validate them enough?

Well, in the case of CrowdHired, we tend to create object trees that are quite deep, so we build an maintain some infrastructure code to allow us to create fake data for use during development (another interesting thing to talk about at a later date). Before we started using the self-validating properties of ABNs we simply generated any old 11 digit number as fake data for ABN fields, but once the validations started kicking in this was no longer an option. Being the pragmatic developers that we are (even if we do say so ourselves), we took some real ABNs (like our own) chucked them into an array and randomly picked from there. But, this offended the developer gods, or my developer pride – whichever, so one Saturday I decided to take a couple of hours to generate some truly random ABNs that were still valid. Here is the code I came up with (it is now a proud part of our fake data generation script):

  def random_abn
    weights = [10,1,3,5,7,9,11,13,15,17,19]
    reversed_weights = weights.reverse
    initial_numbers = []
    final_numbers = []
    9.times {initial_numbers << rand(9)+1}
    initial_numbers = [rand(8)+1, rand(7)+2] + initial_numbers
    products = []
    weights.each_with_index do |weight, index|
      products << weight * initial_numbers[index]
    end
    product_sum = products.inject(0){|sum, value| sum + value}
    remainder = product_sum % 89
    if remainder == 0
      final_numbers = initial_numbers
    else
      current_remainder = remainder
      reversed_numbers = initial_numbers.reverse
      reversed_weights.each_with_index do |weight, index|
        next if weight > current_remainder
        if reversed_numbers[index] > 0
          reversed_numbers[index] -= 1
          current_remainder -= weight
          if current_remainder < reversed_weights[index+1]
            redo
          end
        end
      end
      final_numbers = reversed_numbers.reverse
    end
    final_numbers[0] += 1
    final_numbers.join
  end

The idea is pretty simple. Let's go through an example to demonstrate:

1. Firstly we randomly generate 11 digits between 0 and 9 to make up our probably ABN (they are actually not all between 0 and 9 but more on that shortly)

  7 5 8 9 8 7 3 4 1 5 3

2. We then perform the validation steps on that number

  • multiply the digits by their weights to get weight-digit products 

      7x10=70
      5x1=5
      8x3=24
      9x5=45
      8x7=56
      7x9=63
      3x11=33
      4x13=52
      1x15=15
      5x17=85
      3x19=57
    
  • sum the products
    70+5+24+45+56+63+33+52+15+85+57 = 505
  • divide by 89 to get the remainder
    505 mod 89 = 60

3. Since we do mod(89) at worst we'll be off by 88 (although if we get 0 as the remainder we lucked out with a valid ABN straight away), we now use the weight-digit products to "give change", subtracting from the remainder as we go until we hit zero.

We start with the last digit where the weight is 19. We subtract 1 from this digit, which means we can subtract 19 from our remainder. We then move on to the next digit until the remainder hits zero

  Initial | Change  | Remainder
  -------------------------------
  7x10=70 | 7x10=70 | 0
  5x1=5   | 5x1=5   | 0
  8x3=24  | 8x3=24  | 0
  9x5=45  | 9x5=45  | 0
  8x7=56  | 8x7=56  | 0
  7x9=63  | 6x9=63  | 0
  3x11=33 | 3x11=33 | 9
  4x13=52 | 4x13=52 | 9
  1x15=15 | 0x15=0  | 9
  5x17=85 | 4x17=68 | 24
  3x19=57 | 2x19=38 | 41

4. This gives us our new number

7 5 8 9 8 6 3 4 0 4 2

5. Now we just need to add 1 to the very first number (as per the ABN validation steps) and we have our valid ABN

85898634042

There are a couple of nuances to those steps.

  • None of our initial generated digits can be zero. Since we "give change" by subtracting 1 from each digit we need to ensure we can actually subtract 1 (otherwise things become much more complex). So we ensure our digits are randomly generated between 1 and 9 rather tahn between 0 and 9.
  • Even when all our initial digits are at least 1 we may still fail to "give change" for some of the remainders, the easiest example of this is when our remainder is 2. The only digit when can use to give change is the one with the weight of 1 (i.e. the second digit of the ABN). If this digit is initially generated to be 1, we can only give change once at which point we end up with a remainder of 1 and there is nothing more we can do about it. In-fact this exact scenario can happen for a few other remainders namely 86, 77, 66, 53, 38, 21. The easies way to overcome this problem is to ensure that the digit that has a weight of 1, is always randomly generated to have a value of at least 2. This way we can use it to give change twice and our problematic remainders are covered.
  • Lastly, since we have to add 1 to the very first digit as our final step, we need to make sure that this digit is not already equal to 9, so we generate that one to be between 1 and 8.

Given these nuances, this algorithm won't generate every possible ABN, but it will give you a large percentage of possible ABNs which is good enough for our needs. It took about an hour to get that working (we won't mention the little bug where I forgot the remainder could be zero from the start, which caused much grief to our random data generator :)), but it was a fun little exercise – time well spent as far as I am concerned. And to think, all this learning about self-validating numbers and algorithmic coding fun was triggered by trying to capture the most mundane piece of data on a form. It just goes to show that you can learn and grow no matter where you are and what you're doing, you just need to see the opportunities for what they are.

Images by Sn.Ho and Sangudo

{ 1 trackback }

Is it divisible by 10? If not, it isn’t a valid credit card | Bloggo Schloggo
October 30, 2011 at 11:16 am

{ 12 comments… read them below or add one }

Samuel August 18, 2011 at 12:01 am

But you do not bother to prevent duplicate entries. Therefore you could have just as easily gone with something like 12345678901, and increased by 1 until you get a valid number. Seems wasteful to write something complex when you could just do trial and error with a guarantee that it will eventually work. That way, you can actually guarantee that they are all unique.

Slow? Yeah, theoretically. Integer math is fast enough that you’ll likely never see it take longer than microseconds. And it’s a test data generator. Speed is irrelevant anyway.

Reply

Alan Skorkin August 18, 2011 at 2:38 pm

Firstly, duplicate entries are not an issue, they are allowed in the system, if they weren’t we could leave them to be picked up by validations at which point we would act, but as I said not an issue.

Secondly, you think speed doesn’t matter? Even a small amount of data takes several minutes to populate for our system. A slightly larger one can take significantly longer, so any amount of time you save on a frequently generated piece of data is well worth it. In-fact I have found faster ways to generate other pieces of data, such as names etc. and it made a noticeable impact.

Thirdly even if you increased by one, you still couldn’t guarantee uniqueness unless you store state somewhere.

Lastly, the idea wasn’t just to generate ABNs as quickly as possible, there was also the desirable side-effect of learning something, and solving a bite-sized problem (and getting some satisfaction from that) and that objective was achieved admirably.

Reply

Gavin S August 18, 2011 at 12:32 am

With the weights of 10 and 1, doesn’t that make the first two digits the check? Aside of course from adding 1 to the first digit.
4 9 4 2 6 3 2 4 8 5 6
2 7 8 2 8 1 5 8 6 3 8
8 5 9 8 7 1 0 0 4 7 2
7 0 7 5 2 1 4 4 4 9 2

I generated these in excel real quick. My method was to only generate a 9 digit number and use the 2 digit remainder of the summed weights for the first two digits of the ABN and then add the 1. I think this worked and it allows for zeros in the number as well.

Reply

Alan Skorkin August 18, 2011 at 2:40 pm

Hi Gavin,

You’re of course quite right, I can’t believe I didn’t see it, it is obvious now that you point it out. We subtract 1 from the very first number just so the maximum number the first two digits could make would be 89. It seems so obvious, but I totally missed it, subtracting the 1 was enough to confuse me totally :).

I guess my complex algorithm is no longer necessary, sure was fun to come up with it though.

Reply

Andy Palmer August 18, 2011 at 1:46 am

It sounds like you might enjoy the killmath series, particularly the demonstration of the scrubbing calculator, because it demonstrates a similar (albeit simpler) equation solving problem.

Reply

Alan Skorkin August 18, 2011 at 2:41 pm

Hey Andy,

Thanks very much for the link, looks quite interesting I’ll definitely give it a read.

Reply

Stephen P. August 18, 2011 at 9:17 am

Alan!
We’ve (ok, I’VE) missed you buddy!

Gonna have to read the article later tho.

Thanks,

Reply

Alan Skorkin August 18, 2011 at 2:42 pm

Hey Stephen,

Yeah I’ve been really slack for a few months, but if anyone asks I’ll blame working for a startup :). Hope you enjoy the article.

Reply

Steven Ringo August 18, 2011 at 12:15 pm

Skorks,

I think the check digit is actually two digits, because the modulus itself is two digits.

In many cases also the ABN is generated from it corresponding ACN (if the entity is a company), like so:

ABN = 2 digits+ ACN

I have a (strong) feeling those 2 digits are the check digits. I haven’t had a chance to run it through yet…

Steve

Reply

Alan Skorkin August 18, 2011 at 2:45 pm

Hi Steven,

You’re totally right, Gavin pointed our the same thing in one of the comments above. Now that it is staring me in the face I am flabbergasted that I didn’t see it – not my finest moment :).

I also interesting that you point out that ABN = 2digits + ACN cause ACN has a check digit all of its own, but prepending two numbers produces a different number for which those two numbers are a check digit – kinda cool.

Reply

UJ August 18, 2011 at 5:11 pm

Hi Alan,

I really find your blog interesting, primarily because of its simple yet effective style of writing. I just came across it last week and I’ve already been thru several of the articles. Keep up the great work! :)

Reply

Kevin Elliott November 18, 2011 at 7:00 pm

This is a really great article. It’s a nice combination between educational, fun, and generally interesting. It’s the kind that belongs in an ezine!

Reply

Leave a Comment

Previous post:

Next post: