I'm working on a project that will take user-generated content and convert it to a URL shortcode for sharing. There's no shortage of URL shorteners offering similar services, but I expect that most of the use here will be people reciting the URL, not emailing it or scanning a QR code (the primary function of most shorteners).
This gave me the idea of removing questionable or easily confused letters and numbers from the URL so it's less prone to human error when transcribed or spoken. I stumbled across this Apple report about scrambled iTunes codes, which lists the typical letter and number swaps that occur when users enter redeem codes. The list is repeated below:
- The letter A and the letter H
- The letter B and the number 8
- The letter D and the number 0
- The letter E and the number 3
- The letter G and the number 6
- The letter H and the letter W
- The letter J and the number 1
- The letter M and the letter N
- The letter O and the number 0
- The letter P and the letter F
- The letter Q and the letter O
- The letter Q and the number 0
- The letter S and the number 5
- The letter S and the number 8
- The letter V and the letter U
- The letter Z and the number 2
So the list above gives me the letters and numbers I should exclude for typical use (I've also dropped the letter I, which isn't on Apple's list but is easily mistaken for the number 1):
A, B, D, E, F, G, H, I, J, M, N, O, P, Q, S, U, V, W, Z, 0, 1, 2, 3, 5, 6, 8
Which leaves the following available:
C, K, L, R, T, X, Y, 4, 7, 9
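The exclusion step is mechanical enough to script, which makes it easy to rerun if you decide to add more pairs later. Here's a minimal Python sketch; the names CONFUSABLE_PAIRS, EXTRA_EXCLUSIONS and safe_alphabet are hypothetical, the pairs are copied from the Apple list above, and the extra exclusion of I is my own assumption rather than part of that list.

```python
import string

# Confusable pairs, copied from the Apple list quoted above.
CONFUSABLE_PAIRS = [
    ("A", "H"), ("B", "8"), ("D", "0"), ("E", "3"), ("G", "6"),
    ("H", "W"), ("J", "1"), ("M", "N"), ("O", "0"), ("P", "F"),
    ("Q", "O"), ("Q", "0"), ("S", "5"), ("S", "8"), ("V", "U"),
    ("Z", "2"),
]

# Assumption: "I" is excluded too, even though it isn't on Apple's list.
EXTRA_EXCLUSIONS = {"I"}


def safe_alphabet(pairs, extra=frozenset()):
    """Return the uppercase letters and digits that appear in none of the pairs."""
    excluded = {ch for pair in pairs for ch in pair} | set(extra)
    candidates = string.ascii_uppercase + string.digits
    return [ch for ch in candidates if ch not in excluded]


print(safe_alphabet(CONFUSABLE_PAIRS, EXTRA_EXCLUSIONS))
```

Run as-is, it should print the same ten characters listed above: letters first, then the digits 4, 7 and 9.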
Thankfully, there are exactly 10 available, which makes a simple digit-replacement table for database-keyed numerical IDs. I've rearranged the order so that each letter is roughly shaped like the digit it stands in for, for easier translating, and the digits that survived (4, 7 and 9) represent their own values. A small bit of data obfuscation, and we have a new list!
C, L, R, K, 4, T, Y, 7, X, 9
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
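Here's a minimal Python sketch of that substitution, assuming IDs are non-negative integers coming straight from the database. encode_id, decode_code and the optional width parameter are hypothetical names for illustration. Because it's a straight digit swap, anyone holding the table can reverse it, so it obscures IDs rather than securing them.

```python
# Digit-to-symbol table from the list above: position = decimal digit.
SYMBOLS = "CLRK4TY7X9"
DIGIT_FOR = {sym: str(d) for d, sym in enumerate(SYMBOLS)}


def encode_id(n: int, width: int = 0) -> str:
    """Replace each decimal digit of a non-negative ID with its symbol.

    `width` (hypothetical) zero-pads the ID first, e.g. width=10 for
    fixed-length codes.
    """
    if n < 0:
        raise ValueError("IDs must be non-negative")
    return "".join(SYMBOLS[int(d)] for d in str(n).zfill(width))


def decode_code(code: str) -> int:
    """Map symbols back to digits; upper-cases first so 'rcr4' == 'RCR4'."""
    try:
        return int("".join(DIGIT_FOR[ch] for ch in code.upper()))
    except KeyError as exc:
        raise ValueError(f"invalid character in code: {exc.args[0]!r}") from None
```

For example, ID 2024 becomes RCR4, and decode_code("rcr4") turns it back into 2024. Accepting lowercase input is a design choice aimed at people typing the code by hand.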
So with a few small database changes, we can keep our URLs short and still have billions of unique ones to give out. With a 10-digit number, I can have 10 billion different unique URLs: 10 x 10 x 10 x 10 x 10 x 10 x 10 x 10 x 10 x 10 = 10,000,000,000
If I went so far as to never repeat a character in the 10-character sequence (to make it even harder to capture or use the wrong one), I'm limited to 3,628,800 uniques, which is still plenty: 10! = 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 3,628,800
Of course, shortening the URL also shrinks the pool of unique URLs. Without repeats, a 7-character code gives 10 x 9 x 8 x 7 x 6 x 5 x 4 = 604,800 uniques, and an 8-character code gives 1,814,400. (7! = 5,040 and 8! = 40,320 would only apply if the alphabet itself had just 7 or 8 characters.)
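Turning a database ID into a no-repeat code needs some way to number the permutations. One standard option, which I'm swapping in here rather than something described above, is the factorial number system (a Lehmer code). A sketch in Python; id_to_permutation and permutation_to_id are hypothetical names.

```python
import math

SYMBOLS = "CLRK4TY7X9"


def id_to_permutation(n: int, symbols: str = SYMBOLS) -> str:
    """Map 0 <= n < len(symbols)! to a unique ordering of all symbols.

    Factorial number system: at each step, n // i! picks which of the
    remaining symbols comes next, and the remainder carries on.
    """
    k = len(symbols)
    if not 0 <= n < math.factorial(k):
        raise ValueError(f"n must be in [0, {math.factorial(k)})")
    remaining = list(symbols)
    out = []
    for i in range(k - 1, -1, -1):
        index, n = divmod(n, math.factorial(i))
        out.append(remaining.pop(index))
    return "".join(out)


def permutation_to_id(code: str, symbols: str = SYMBOLS) -> int:
    """Inverse of id_to_permutation; raises ValueError on bad input."""
    if len(code) != len(symbols):
        raise ValueError("code must use every symbol exactly once")
    remaining = list(symbols)
    n = 0
    for i, ch in enumerate(code.upper()):
        index = remaining.index(ch)  # ValueError if unknown or repeated
        n += index * math.factorial(len(symbols) - 1 - i)
        remaining.pop(index)
    return n
```

ID 0 maps to CLRK4TY7X9, and the largest ID, 3,628,799, maps to the reversed order 9X7YT4KRLC. Past that, this scheme simply runs out of codes.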
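To sanity-check the arithmetic for different code lengths, here's a small Python sketch (code_space is a hypothetical helper) comparing codes that allow repeated characters with codes that don't. It relies on math.perm, available from Python 3.8.

```python
import math

ALPHABET_SIZE = 10  # C, L, R, K, 4, T, Y, 7, X, 9


def code_space(length, alphabet_size=ALPHABET_SIZE):
    """Return (count with repeats allowed, count with no repeated characters)."""
    with_repeats = alphabet_size ** length
    without_repeats = math.perm(alphabet_size, length)  # n! / (n - length)!
    return with_repeats, without_repeats


for length in (7, 8, 10):
    with_r, without_r = code_space(length)
    print(f"{length} chars: {with_r:,} with repeats, {without_r:,} without")
```

This prints 10,000,000 vs 604,800 for 7 characters, 100,000,000 vs 1,814,400 for 8, and 10,000,000,000 vs 3,628,800 for 10, so banning repeats costs far more capacity at short lengths.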
What do you think? Smart developer or useless feature?
Leave your comments with your own ideas about this practice. Do you think it's a good idea? On closer inspection, I'm thinking T and 7 might be confused for each other as well, since they look quite similar, but I'm running out of characters for my replacement set!