I think there’s a general human tendency to try to control others. It’s something to be deplored and resisted, but it’s always there. We see it in the school prefect or the inept manager, misusing their power, but we also see it in programmers and their unstinting efforts to impose order on a chaotic world. It’s a tragic and ultimately futile errand, but one that seems to be strangely seductive. Join me on a voyage into the mind of such a programmer.
Americans believe, as a matter of faith, that telephone numbers have ten digits. Three for the area code, followed by seven (hyphenated three-four) for the local part. American forms on the internet, therefore, where they require a telephone number, require it to have ten digits. If you’re a really talented programmer, you might allow the user to enter parentheses and hyphens; a deluxe version will even insert those automatically, in case the idiot had some crazy scheme of their own in mind.
This is, you understand, for their own good, to prevent them from Doing It Wrong. We don’t like Wrong; we like Right, and computers are good at keeping things Right.
Perhaps you have the uppity type of user who doesn’t want to give a phone number. Perhaps they enter all ones. Aha! You’ve thought to check for that. So they enter one of those 555 numbers they use on TV. That’ll work. Well, let’s hope that you don’t reject it unduly, because ‘[n]ot all numbers that begin with 555 are fictional’. Better improve that rule. If you’re in the UK, you might like to encode a list of telephone numbers for drama use just to be on the safe side.
The trouble is that knowing that a telephone number has the ‘correct’ number of digits tells you nothing about whether you can dial it. It might not exist. It might be connected to a fax machine (for the benefit of younger readers). It might be someone else’s number. But it gets worse: phone numbers are international, so that ten-digit constraint will reject a valid E.123 international-format telephone number, even though you could dial it. (I encountered this problem on a visit to the US: I had an address, but was roaming on a UK mobile number. I resorted to entering ten zeros and hoping that no one tried to call me.)
If you’re a particularly obsessive programmer, you might use Google’s libphonenumber to check that the number corresponds to some numbering plan somewhere in the world. I hope you keep it up to date, though. What’s the international dialling code for Sevastopol right now? For Edinburgh?
If you try to enumerate all the possible valid phone numbers you’re going to accept, you’d better get it right. It’s embarrassing to reject all sign-ups from London because you thought that UK phone numbers have a maximum of ten digits, when they’re always given with a leading zero that makes it eleven. Not that that would ever happen, of course (cough) and even if it did, you wouldn’t care, because no data is better than Wrong data, isn’t it?
Requiring a specific number of digits doesn’t guarantee that you’ll get a valid phone number, but does guarantee that you’ll exclude some valid phone numbers.
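To make that concrete, here is a minimal sketch of the kind of ten-digit check described above. The function name and the sample numbers are my own illustration (the samples are taken from ranges reserved for fiction and drama), not anyone’s production code.

```python
import re


def is_ten_digit_us_number(raw: str) -> bool:
    """Hypothetical strict check: exactly ten digits once punctuation is stripped."""
    # Remove the spaces, parentheses, dots and hyphens a "talented" form allows.
    stripped = re.sub(r"[\s().-]", "", raw)
    return re.fullmatch(r"\d{10}", stripped) is not None


if __name__ == "__main__":
    samples = [
        "(212) 555-0123",    # US style, reserved for fiction
        "0000000000",        # ten zeros: useless, but it has ten digits
        "+44 20 7946 0958",  # international format, perfectly diallable in form
        "020 7946 0958",     # UK national format: eleven digits
    ]
    for number in samples:
        print(number, is_ten_digit_us_number(number))
```

The two inputs that pass are the fictional number and the row of zeros; the two that fail are the ones in formats a real person might actually be reachable on.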
So why do it? Perhaps you’re worried that someone might accidentally enter their email address into the phone number field. (Is your UI that bad?!) If so, you could check for the presence of a digit. If you really want to check that the number is valid and belongs to the person who entered it, however, you’ll have to send an SMS or call the number.
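The ‘presence of a digit’ check is about as small as validation gets. A sketch, assuming a hypothetical helper name and that all you want is to catch input that obviously belongs in a different field:

```python
def looks_like_phone_number(raw: str) -> bool:
    """Hypothetical lenient check: accept anything that contains a digit.

    This only catches obviously misplaced input. It says nothing about
    whether the number exists or who would answer it.
    """
    return any(ch.isdigit() for ch in raw)
```

It lets through `+44 20 7946 0958` and `(212) 555-0123` alike, and turns away `not telling you`. Whether anyone picks up is still a question only a call or an SMS can answer.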
What about email addresses? You could use a really complicated regular expression to check that an address matches RFC 822. That proves only that it is well-formed: a made-up address will match, but it’s not much good for sending your valued brand outreach emails/spam to. The only way to test that an email address is right is to send an email and to wait for a response.
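One way to sketch ‘send an email and wait for a response’ is a confirmation token. Everything here is an assumption for illustration: the function names, the in-memory dictionary standing in for a database, and the fact that actually sending the message is left out entirely.

```python
import secrets
from typing import Optional

# Stand-in for real storage: maps confirmation tokens to unconfirmed addresses.
pending: dict[str, str] = {}


def plausible_email(raw: str) -> bool:
    """Deliberately loose check: something, an @, then something else."""
    local, sep, domain = raw.strip().rpartition("@")
    return bool(sep and local and domain)


def start_confirmation(address: str) -> str:
    """Record the address and return the token you would email to it."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token


def confirm(token: str) -> Optional[str]:
    """Return the confirmed address, or None if the token is unknown or used."""
    return pending.pop(token, None)
```

The loose check only guards against input that could never be an address. The proof that the address works is that its owner comes back with the token.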
You decide to ensure that people enter ‘real names’ into the name form. You check that the name is at least three letters long and contains a vowel, eliminating the common Cantonese name of ‘Ng’. You require that the name contains only ‘standard’ letters, preventing Finns from subscribing. But at least you didn’t get any bogus information, right? Right? Besides, everyone knows that every person in the world has a first name and a surname. Well, except Hungarians and Japanese and Indonesians and Tamils and over a billion other people outside your narrow frame of reference.
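For the sceptical, here is roughly what those rules look like as code. This is a sketch of the validator being mocked, with a hypothetical name, and assuming ‘standard’ letters means unaccented A to Z:

```python
import re


def naive_real_name(name: str) -> bool:
    """Hypothetical 'real name' check using the rules described above."""
    return (
        len(name) >= 3                                      # at least three letters
        and re.fullmatch(r"[A-Za-z]+", name) is not None    # 'standard' letters only
        and re.search(r"[AEIOUaeiou]", name) is not None    # must contain a vowel
    )
```

It approves of `Smith` and rejects both `Ng` and `Mäkinen`, which are real names belonging to real people.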
You impose an arbitrary length limit on address fields because long street names are anathema. (Or maybe just because you suck at database design.) You check every address against a master database that you update a few times a year, rejecting anyone who recently moved into a new house. You require a State/Province even in countries that don’t have those, because yours does. You require a City and County field in England even though not everywhere is in a city (and did you want the nearest town or the Post Town in that case?) and some cities (e.g. London) are not in counties.
To summarise: you can’t guarantee that you’ve got a usable phone number without calling it. You can’t guarantee that you’ve got a usable email address without writing to it. You can’t guarantee that you’ve got a name without asking the person. You can’t tell whether an address works without sending something there.
So what is your form validation for? And is it actually solving a problem or just adding to the hassles of your users and support load when it rejects perfectly reasonable inputs?
I’ve seen a lot of effort go into dealing with false rejections and into improving validation rules to try to reduce those false rejections. In most cases, though, this has been solving a problem that didn’t exist, causing hassle, and taking away attention from what actually matters.
I’m not saying that you shouldn’t check anything, just that the world is not nearly as amenable to codification as some might think, and if you try, you’re going to spend a lot of time battling edge cases. You might need to validate some information some of the time, but it’s equally possible that you’re just making a rod for your own back. So why not save us all a bit of wasted effort?