What’s In A Name

A rose by any other name would require a schema migration.

Amber’s excellent lightning talk on identity yesterday made me feel many feels, and reminded me of this excellent post by Patrick McKenzie about false assumptions regarding names.

While that list is helpful, it’s very light on positively-framed advice, i.e. “you should” rather than “you shouldn’t”. So I feel like I want to give a little bit of specific, prescriptive advice to programmers who might need to deal with names.

First and foremost: stop asking for unnecessary information. If I’m just authenticating to your system to download a comic book, you do not need to know my name. Your payment provider might need a billing address, but you absolutely do not need to store my name.

Okay, okay. I understand that may make your system seem a little impersonal, and you want to be able to greet me, or maybe have a name to show to other users beyond my login ID or email address that has to be unique on the site. Fine. A good “name” field is a single free-text box, labeled something like “Name”, and nothing more.

You don’t need to break my name down into parts. If you just need a way to refer to me, then let me tell you whatever the heck I want. Honorific? Maybe I have more than one; maybe I don’t want you to use any.
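To make that concrete, here is a rough sketch in Python of what the storage side might look like if the name is kept as one optional, opaque string. The `users` table, the `create_schema`, `save_user` and `greeting` helpers, and the 200-character cap are all made up for illustration; treat it as one possible shape under those assumptions, not a prescribed schema.

```python
import sqlite3

# Hypothetical cap, chosen only to keep abuse in check; not a rule about names.
MAX_DISPLAY_NAME_LENGTH = 200


def clean_display_name(raw):
    """Return the name as typed, minus surrounding whitespace.

    An empty answer means "I'd rather not say" and becomes None.
    """
    if raw is None:
        return None
    name = raw.strip()
    if not name:
        return None
    if len(name) > MAX_DISPLAY_NAME_LENGTH:
        raise ValueError("display name is too long")
    return name


def create_schema(conn):
    # One optional text column: no first/last split, no honorific column.
    conn.execute(
        "CREATE TABLE users (login TEXT PRIMARY KEY, display_name TEXT)"
    )


def save_user(conn, login, display_name=None):
    # Bound parameters store whatever characters the user typed, untouched.
    conn.execute(
        "INSERT INTO users (login, display_name) VALUES (?, ?)",
        (login, clean_display_name(display_name)),
    )


def greeting(conn, login):
    row = conn.execute(
        "SELECT display_name FROM users WHERE login = ?", (login,)
    ).fetchone()
    if row is None:
        raise KeyError(login)
    # Fall back to the login when the user chose not to give a name.
    return f"Hello, {row[0] or login}!"
```

The only checks are "not absurdly long" and "trim the edges". Everything else, including honorifics, spaces and punctuation, is the user's call.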

And this brings me to “first name / last name”.

In most cases, you should not use these terms. They are oversimplifications of how names work, appropriate only for children in English-speaking countries who might not understand the subtleties involved and only need to know that one name comes before the other.

The terms you’re looking for are given name and surname, or perhaps family name. (“Middle name” might still be an appropriate term because that fills a more specific role.) But by using these more semantically useful terms, you include orders of magnitude more names in your normalization scheme. More importantly, by acknowledging the roles of the different parts of a name, you’ll come to realize that there are many other types of name.

If your application does have a legitimate need to normalize names, for example, to interoperate with third-party databases, or to fulfill some regulatory requirement:

  • When you refer to a user of the system, always allow them to customize how their name is presented. Give them the benefit of the doubt. If you’re concerned about users abusing this display-name system to insult other users, it’s understandable that you may need to moderate that a little. But there’s no reason to ever moderate or regulate how a user’s name is displayed to themselves. You can start to address offensive names by allowing other users to set nicknames for them. Only as a last resort, allow other users to report their name as not-actually-their-name, abusive or rude; if you do that, you have to investigate those reports. Let users affirm other users’ names, too, and verify reports: if someone attracts a million fake troll accounts, but all their friends affirm that their name is correct, you should be able to detect that. Don’t check government IDs in order to do this; they’re not relevant.
  • Allow the user to enter their normalized name as a series of names with a classifier attached to each one: for instance, one entry marked “given name” and another marked “surname”.

  • Keep in mind that spaces are valid in any of these names. Many people have multi-word given names, middle names, or surnames, and it can matter how you classify them. For one example that should resonate with readers of this blog, it’s “Guido” “van Rossum”, not “Guido” “Van” “Rossum”. It is definitely not “Guido” “Vanrossum”.

  • So is other punctuation. Even dashes. Even apostrophes. Especially apostrophes, you insensitive clod. Literally ten billion people whose surnames start with “O’” live in Ireland and they do not care about your broken database input security practices.
  • Allow for the user to have multiple names with classifiers attached to each one: “legal name in China”, “stage name”, “name on passport”, “maiden name”, etc. Keep in mind that more than one name for a given person may be simultaneously accurate for a certain audience and legally valid. They can even be legally valid in the same context: many people have social security cards, birth certificates, driver’s licenses and passports with different names on them; sometimes due to a clerical error, sometimes due to the way different systems work. If your goal is to match up with those systems, especially more than one of them, you need to account for that possibility.
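
If you do end up needing structure, the points above could be modeled roughly like this in Python. This is only a sketch: the `NamePart`, `ClassifiedName` and `Person` classes are hypothetical, the classifiers are free-form strings rather than a fixed list, and the person in the example is invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class NamePart:
    text: str  # stored as entered; spaces, dashes and apostrophes are fine
    kind: str  # classifier picked by the user, e.g. "given name"


@dataclass(frozen=True)
class ClassifiedName:
    label: str  # which name this is, e.g. "name on passport"
    parts: Tuple[NamePart, ...]

    def parts_of_kind(self, kind: str) -> List[str]:
        return [part.text for part in self.parts if part.kind == kind]


@dataclass
class Person:
    display_name: str  # what the user wants to be called; never derived
    names: List[ClassifiedName] = field(default_factory=list)

    def names_labelled(self, label: str) -> List[ClassifiedName]:
        # A list, because two documents of the same type may disagree.
        return [name for name in self.names if name.label == label]


# A made-up person with two names, each valid in its own context.
person = Person(
    display_name="Shiv",
    names=[
        ClassifiedName(
            label="name on passport",
            parts=(
                NamePart("Mary Siobhán", "given name"),
                NamePart("O'Connor-van der Berg", "surname"),
            ),
        ),
        ClassifiedName(
            label="stage name",
            parts=(NamePart("Shiv Vandal", "whole name"),),
        ),
    ],
)
```

Note that the display name lives apart from the classified names and is never computed from them, and that looking up a label returns a list rather than a single answer.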

If you’re a programmer and you’re balking at this complexity: good. Remember that for most systems, sticking with the first option - treating users’ names as totally opaque text - is probably your best bet. You probably don’t need to know the structure of the user’s name for most purposes.