What could be more boring than capturing credit card data on a form? Well, it's actually not that boring since you may want to encrypt this particular data, which presents its own set of challenges. Nevertheless, it's still a textbox which takes digits that you store in a database – whoopty doo – not exactly rocket surgery. Well, I've got a piece of data that's got the credit card beat for sheer mundanity – the ABN. If you're an Australian you know all about this. For everybody else, it stands for Australian Business Number, which is an 11-digit number provided by the government to every company. It's not secret (you can look them up online), so you don't even need to encrypt it – difficult to get excited about that. Of course, if that was the end of the story, this wouldn't be much of a blog post, so – as you might imagine – things are not as bland as they appear.
At CrowdHired, we don't tend to deal much with credit card numbers, but ABNs are another matter entirely, since companies are one of the two types of users we have in the system (by the way – as you may have deduced – I've been working for a startup for the last few months; I should really talk about how that happened, it's an interesting story). Just like any piece of data, you want to validate the user input if you possibly can. When I started looking into this for ABNs I discovered that they have an interesting trait, one which credit card numbers share. You see, both credit cards and ABNs are self-verifying numbers.
I've been doing web development for many years now, but had no idea this was the case. So naturally – being the curious developer that I am – I had to dig a little further. It turns out that these kinds of numbers are quite common, with other well-known examples being ISBNs, UPCs and VINs. Most of these use a variation of a check digit-based algorithm for both validation and generation. Probably the most well-known of these algorithms is the Luhn algorithm which is what credit cards use. So, we'll use a credit card as an example.
Let us say we have the following credit card number:
4870696871788604
It is 16 digits (Visa and MasterCard are usually 16, but Amex is 15). This number is broken down in the following way:
Issuer Number | Account Number | Check Digit
487069        | 687178860      | 4
You can read lots more about the anatomy of a credit card, but all we want to do is apply the Luhn algorithm to check if this credit card is valid. It goes something like this:
1. Starting from the second-to-last digit and moving left, double every second digit (the check digit itself is left alone)
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 0 | 4
2. If the doubled numbers form a double digit number, add the two digits
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 0 | 4
8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 0 | 4
3. Sum up all the digits of this new number
8+8+5+0+3+9+3+8+5+1+5+8+7+6+0+4 = 80
4. If the sum is divisible by 10, it is a valid credit card number, which in our case it is.
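The four steps above can be sketched in a few lines of Ruby. This is a minimal illustration rather than production code: the method name `luhn_valid?` is made up for this example, and it relies on the fact that adding the two digits of a doubled number (10 to 18) gives the same result as subtracting 9.

```ruby
# Hypothetical helper: true when a digit string passes the Luhn check.
def luhn_valid?(number)
  cleaned = number.to_s.delete(' ')
  return false unless cleaned.match?(/\A\d+\z/)

  digits = cleaned.chars.map(&:to_i)
  # Walk from the back: index 0 is the check digit, odd indexes get doubled.
  total = digits.reverse.each_with_index.sum do |digit, index|
    if index.odd?
      doubled = digit * 2
      doubled > 9 ? doubled - 9 : doubled # same as adding the two digits
    else
      digit
    end
  end
  (total % 10).zero?
end

luhn_valid?('4870696871788604') # => true  (sum is 80)
luhn_valid?('4870696871788614') # => false (sum is 82)
```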
You can see how we can use the same algorithm to generate a valid credit card number. All we have to do is set the check digit value to X and then perform all the same steps. During the final step we simply pick our check digit in such a way as to make sure the sum of all the digits is divisible by 10. Let's do this for a slightly altered version of our previous credit card number (we simply set the digit before the check digit to 1 making the credit card number invalid).
4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 1 | X
8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 2 | X
8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 2 | X
8+8+5+0+3+9+3+8+5+1+5+8+7+6+2+X = 78+X
X = (78%10 == 0) ? 0 : 10 - 78%10
X = 2
As you can see no matter what the other 15 digits are, we'll always be able to pick a check digit between 0 and 9 that will make the credit card number valid.
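The same idea works as code. Here is a small sketch that computes the check digit for the first 15 digits; the method name `luhn_check_digit` is an assumption for illustration, and it expects a string containing only digits.

```ruby
# Hypothetical helper: Luhn check digit for a number that does not have one yet.
def luhn_check_digit(payload)
  digits = payload.to_s.chars.map(&:to_i)
  # The check digit will sit at the far right, so the payload's last digit
  # (index 0 after reversing) is the first one to be doubled.
  total = digits.reverse.each_with_index.sum do |digit, index|
    if index.even?
      doubled = digit * 2
      doubled > 9 ? doubled - 9 : doubled
    else
      digit
    end
  end
  # Pick the digit that brings the total up to the next multiple of 10.
  (10 - total % 10) % 10
end

luhn_check_digit('487069687178861') # => 2 (total is 78)
luhn_check_digit('487069687178860') # => 4 (total is 76)
```

The final `% 10` covers the case where the total is already a multiple of 10, so the result always lands between 0 and 9.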
Of course, not every self-verifying number uses the Luhn algorithm. Most don't use mod(10) to work out what the check digit should be, and for some numbers, like the IBAN, the check digit actually consists of 2 digits. And yet, the most curious self-verifying number of the lot is the first one I learned about – the ABN. This is because, for the life of me, I couldn't work out what the check digit of the ABN could be.
Australia is certainly not averse to using check digit-based algorithms. The Australian Tax File Number (TFN) and the Australian Company Number (ACN) are just two examples, but the ABN seems to be different. At first glance the ABN validation algorithm is just more of the same; it just has a larger than normal "mod" step at the end (mod(89)).
In fact, here is some Ruby code to validate an ABN, which I appropriated from the Ruby ABN gem (and then rolled into a nice Rails 3 ActiveRecord validator so we could do validates_abn_format_of in all our models :)):
    # True when Integer() can parse the string.
    def is_integer?(number)
      Integer(number)
      true
    rescue
      false
    end

    def abn_valid?(number)
      # Strip spaces so input grouped in blocks of digits is accepted
      number = number.to_s.tr ' ', ''
      return false unless is_integer?(number) && number.length == 11

      weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
      sum = 0
      (0..10).each do |i|
        c = number[i,1]
        # Subtract 1 from the first digit before weighting
        digit = c.to_i - (i.zero? ? 1 : 0)
        sum += weights[i] * digit
      end
      # Valid when the weighted sum is a multiple of 89
      sum % 89 == 0
    end
But while validating ABNs is easy, generating them is a whole other matter. As we've seen, with a check digit-based algorithm, generating the number is the same as validating the number, except we pick the digit in such a way as to make our 'mod' step evaluate to zero. But with a number such as the ABN, where there is no apparent check digit (perhaps I am just having a bout of stupid, so if you can see an obvious check digit with ABNs do let me know), how do you easily generate a valid number? In fact, why would you want to generate these numbers in the first place? Isn't being able to validate them enough?
Well, in the case of CrowdHired, we tend to create object trees that are quite deep, so we build and maintain some infrastructure code to allow us to create fake data for use during development (another interesting thing to talk about at a later date). Before we started using the self-validating properties of ABNs we simply generated any old 11-digit number as fake data for ABN fields, but once the validations started kicking in this was no longer an option. Being the pragmatic developers that we are (even if we do say so ourselves), we took some real ABNs (like our own), chucked them into an array and randomly picked from there. But this offended the developer gods, or my developer pride, whichever, so one Saturday I decided to take a couple of hours to generate some truly random ABNs that were still valid. Here is the code I came up with (it is now a proud part of our fake data generation script):
    def random_abn
      weights = [10,1,3,5,7,9,11,13,15,17,19]
      reversed_weights = weights.reverse
      initial_numbers = []
      final_numbers = []
      # Digits 3 to 11 are drawn from 1..9
      9.times {initial_numbers << rand(9)+1}
      # First digit is drawn from 1..8, second digit from 2..8
      initial_numbers = [rand(8)+1, rand(7)+2] + initial_numbers
      products = []
      weights.each_with_index do |weight, index|
        products << weight * initial_numbers[index]
      end
      product_sum = products.inject(0){|sum, value| sum + value}
      remainder = product_sum % 89
      if remainder == 0
        # Lucked out: the digits already pass the mod(89) check
        final_numbers = initial_numbers
      else
        current_remainder = remainder
        reversed_numbers = initial_numbers.reverse
        # "Give change": walk from the last digit (weight 19) to the first
        reversed_weights.each_with_index do |weight, index|
          next if weight > current_remainder
          if reversed_numbers[index] > 0
            reversed_numbers[index] -= 1
            current_remainder -= weight
            # redo restarts this iteration with the same digit
            if current_remainder < reversed_weights[index+1]
              redo
            end
          end
        end
        final_numbers = reversed_numbers.reverse
      end
      # Undo the "subtract 1 from the first digit" validation step
      final_numbers[0] += 1
      final_numbers.join
    end
The idea is pretty simple. Let's go through an example to demonstrate:
1. Firstly we randomly generate 11 digits between 0 and 9 to make up our candidate ABN (they are actually not all between 0 and 9, but more on that shortly)
7 5 8 9 8 7 3 4 1 5 3
2. We then perform the validation steps on that number
multiply the digits by their weights to get weight-digit products
7x10=70 5x1=5 8x3=24 9x5=45 8x7=56 7x9=63 3x11=33 4x13=52 1x15=15 5x17=85 3x19=57
70+5+24+45+56+63+33+52+15+85+57 = 505
505 mod 89 = 60
3. Since we do mod(89), at worst we'll be off by 88 (and if we get 0 as the remainder, we lucked out with a valid ABN straight away). We now use the weight-digit products to "give change", subtracting from the remainder as we go until we hit zero.
We start with the last digit where the weight is 19. We subtract 1 from this digit, which means we can subtract 19 from our remainder. We then move on to the next digit until the remainder hits zero
Initial  | Change   | Remainder
-------------------------------
7x10=70  | 7x10=70  | 0
5x1=5    | 5x1=5    | 0
8x3=24   | 8x3=24   | 0
9x5=45   | 9x5=45   | 0
8x7=56   | 8x7=56   | 0
7x9=63   | 6x9=54   | 0
3x11=33  | 3x11=33  | 9
4x13=52  | 4x13=52  | 9
1x15=15  | 0x15=0   | 9
5x17=85  | 4x17=68  | 24
3x19=57  | 2x19=38  | 41
4. This gives us our new number
7 5 8 9 8 6 3 4 0 4 2
5. Now we just need to add 1 to the very first number (as per the ABN validation steps) and we have our valid ABN
85898634042
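To make the walkthrough concrete, here is a condensed sketch of steps 2 to 5 applied to the example digits. It is a simplification for illustration, not the generator's exact control flow: the names `ABN_WEIGHTS` and `settle_abn` are invented here, the weights 19 down to 3 are each used at most once, and whatever remains is taken off the second digit in one go.

```ruby
ABN_WEIGHTS = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19].freeze

# Hypothetical helper: takes 11 candidate digits (the first one still
# "minus 1", as in the walkthrough) and returns a valid ABN string.
def settle_abn(digits)
  adjusted = digits.dup
  # Step 2: weighted sum, then mod 89
  remainder = adjusted.zip(ABN_WEIGHTS).sum { |digit, weight| digit * weight } % 89

  # Step 3: last digit first, use each weight from 19 down to 3 at most once
  10.downto(2) do |i|
    weight = ABN_WEIGHTS[i]
    next if weight > remainder || adjusted[i].zero?
    adjusted[i] -= 1
    remainder -= weight
  end

  # Anything left comes off the second digit, whose weight is 1
  if remainder > adjusted[1]
    raise ArgumentError, 'second digit too small to absorb the remainder'
  end
  adjusted[1] -= remainder

  # Step 5: add 1 back to the first digit
  adjusted[0] += 1
  adjusted.join
end

settle_abn([7, 5, 8, 9, 8, 7, 3, 4, 1, 5, 3]) # => "85898634042"
```

As far as I can tell, when digits 3 to 11 are all at least 1, the amount left over for the second digit is never more than 2, which would explain why the generator draws that digit from 2 upward.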
There are a couple of nuances to those steps.
Given these nuances, this algorithm won't generate every possible ABN, but it will give you a large percentage of possible ABNs, which is good enough for our needs. It took about an hour to get that working (we won't mention the little bug where I forgot the remainder could be zero from the start, which caused much grief to our random data generator :)), but it was a fun little exercise – time well spent as far as I am concerned. And to think, all this learning about self-validating numbers and algorithmic coding fun was triggered by trying to capture the most mundane piece of data on a form. It just goes to show that you can learn and grow no matter where you are and what you're doing; you just need to see the opportunities for what they are.