61

Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into their own fields, or should it be more like free text?

Is there any sense in separating differently formatted address into different tables? For example, have a table for USAAddress, CanadianAddress, UKAddress...?

Juha Syrjälä
  • 30,987
  • 31
  • 122
  • 175
Bob
  • 90,304
  • 29
  • 114
  • 125

9 Answers9

105

I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).

On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:

  • Street Number - Numeric
  • House or Building Name - [VarChar - in the UK some houses/buildings are identified by name, not by number]
  • Street Number Suffix [VarChar, although in most cases, Char(1) would suffice]
    • A, B etc
  • Street Name [VarChar]
  • Street Type [VarChar or Int if you have a StreetTypes table]
    • So far, I've found 262 unique types in the English speaking world, there are likely more, and don't forget other languages i.e. Strasse, Rue etc.
  • Street Direction [VarChar(2)]
    • N, E, S, W, NE, SE, NW, SW
  • Address Type [VarChar or Int if you have an AddressTypes table]
    • PO Box
    • Apartment
    • Building
    • Floor
    • Office
    • Suite
    • etc...
  • Address Type Identifier [VarChar]
    • i.e. Box Number, Apartment Number, Floor Number remember apartment numbers and offices sometimes have alphanumeric info - like 1A
  • Local Municipality [VarChar or Int if you have a Municipalities table]
    • For instance, if your hamlet/village appears in the address before the town.
  • City/Town [VarChar or Int if you have a Cities table]
  • Governing District [VarChar or Int if you have a Districts table]
    • State (U.S.)
    • Province (Canada)
    • Federal District (Mexico)
    • County (U.K.)
    • etc...
  • Postal Area [VarChar]
    • Zip (U.S.)
    • Postal Code (Canada, Mexico)
    • Postcode (U.K.)
  • Country [VarChar or Int if you have a Countries table]

This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats

For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.

In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.

I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.

QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.

One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!

Community
  • 1
  • 1
BenAlabaster
  • 36,299
  • 19
  • 100
  • 147
  • 1
    262 types of streets? Might I ask how you obtained this information? – Thomas Owens Jul 21 '09 at 17:23
  • 2
    @Thomas - Lots and lots of searching and list keeping. Australia, UK, Ireland, Canada, USA, Channel Islands, France. It was an arduous task without getting each country's postal database. – BenAlabaster Jul 21 '09 at 17:33
  • 1
    @Thomas - Don't forget that in the English parts of the world, we often steal names from other countries - i.e. the US uses Spanish names in a lot of places and Canada uses French too. – BenAlabaster Jul 21 '09 at 17:44
  • 6
    How do you handle requesting this information from users? Do you really have this many fields for them to read and decipher? I understand the need for it, but there is also a need for keeping data entry extremely simple for users, especially when tricky forms can lose you a sale or a new member. – Syntax Error Jun 30 '10 at 19:21
  • @Syntax Error: You've got a few choices, provide separate fields requiring the user move between them, or you parse the data and store it in the correct fields on the way in, or you parse it on the way out when you're reporting on it. Depending on how the system is used determines the most appropriate method. If it's heavy on reporting, parse at entry time, if it's heavy on entry, and light on reporting, parse at report time. – BenAlabaster Jul 02 '10 at 16:06
  • 6
    @DeveloperChris The title of the question asks "How should *international* geographical addresses be stored in a relational database", so your comment is largely irrelevant in this context. – BenAlabaster Nov 29 '12 at 14:39
  • Sorry I should not have dropped the in the line about local addresses. This topic is viewed by people wanting direction on both local and international. Bottom Line. Too complex! especially for international addresses. most developers cannot spend the huge amount of time involved to get it wrong in the end as Ruben points out. I wish there was a one pill fixes all but there isn't. simple is better. with strong search functions and possibly operator intervention when a potential for errors crop up. – DeveloperChris Dec 10 '12 at 05:33
  • @BenAlabaster: is there any chance of being able to read your blogpost?. I have tried to access the link you gave, but access is denied. Thanks. – Homunculus Reticulli Jan 25 '14 at 15:47
  • 3
    @DeveloperChris `This topic is viewed by people wanting direction on both local and international. Bottom Line.` Then they are looking at the wrong question. Bottom line. – rybo111 Feb 21 '14 at 14:17
  • @HomunculusReticulli Apologies, this has been fixed – BenAlabaster Mar 10 '14 at 14:01
  • 1
    How long should each `varchar` field be? – Nate Jan 28 '15 at 16:30
  • I would disagree with breaking out all the tiny pieces of a street name. If someone lives on South Roberts Roads then there is no real benefit in breaking out "South", "Roberts", and "Road". The relationship between South Robert Roads, North Roberts Road, South Roberts Street, and North Roberts Street is inconsequential. For all practical purposes (in my experience) they are as good as four separately named roads. – David Baucum Apr 01 '20 at 17:40
26

Be careful not to over-analyze address formats. When you do, you're quite likely to end up with a specification most users will need to work around, effectively forcing them to use the wrong fields, or only filling the primary fields and ignoring the extra fields.

Keep things simple.

A StreetType like mentioned by BenAlabaster will cause problems when you start working with languages different from isolating languages like English or Spanish.

To show you how bad things can get in the wild: the "Henriette Roland Holststraat" in Amsterdam, built up from "Henriette" + "Roland Holst" + "straat", which can be abbreviated as the "Roland Holststraat", or "Roland Holststr.", or misspelled as "H.R.Holststr." or "Henriette Roland-Holst straat", depending on the weather. Unless you've got an up-to-date street register for each country on earth, you'll be going nowhere.

And finally, be careful that in some multilingual countries, names can be different from one language to another! For instance in Brussels where many streets have both a French and a Dutch name: "Avenu du Port" and "Havenlaan", depending on the addressee's preferred language. (Google Maps shows both names alternately, just to be on the safe side.)

You can try to devise all kinds of clever tricks here, but are the sales reps. going to understand this?

Ruben
  • 14,252
  • 2
  • 32
  • 45
  • 2
    You do raise a good point that I hadn't covered in my response. This is certainly something that should be considered when accounting for Dutch, German and other non-isolating languages. – BenAlabaster Jul 02 '10 at 16:29
8

That depends on what you want to do with it.

I've found it always easier to use addresses for other purposes (such as verification against USPS data or getting shipping rates from UPS/FEDEX) if they're separated.

Here's what I typically use for addresses:

  • Address Line 1
  • Address Line 2
  • Address Line 3
  • City
  • Region
  • Postal Code
  • County
  • Country

In Response to the edit: For most situations I don't see the use. The table I listed above has enough fields (and is generic enough) for most country's addresses.

Stephen Wrighton
  • 33,316
  • 6
  • 63
  • 84
  • 1
    Address Line 1, 2 and 3 are definitely generic enough but when it comes to parsing addresses programmatically, you're going to be hosed. Parsing addresses programmatically is not a trivial task when considering international address formats. – BenAlabaster Jul 21 '09 at 15:49
  • 2
    @Alix Axel - and for those countries, leave the field blank – Stephen Wrighton Oct 10 '11 at 13:47
7

Address

As a polar opposite to the excellent answer @BenAlabaster has provided, you could simply have:

address       TEXT(300)
postal_code   VARCHAR(15)
country_code  VARCHAR(2)

Your client-side form layouts can still be as complex as you see fit (or use a multi-line input where the user can manually type their address). You can then add the line breaks in the address where necessary.

Country

Your country table would look as follows:

country_code  VARCHAR(2)
country_name  VARCHAR(255)

Additionally, you could have one of the following:

postal_code_required  TINYINT(1)
postal_code_regex     VARCHAR(255) NULL DEFAULT NULL

Then use the following lists to design your country table:

Community
  • 1
  • 1
rybo111
  • 10,836
  • 3
  • 55
  • 62
7

Here's an anecdote for anyone who stumbles on this question:

I speak as a person who has lived and worked on a lot of continents (Europe, Asia, North America). In my experience, and the experience of the people I work with, it has been much easier for us to use systems that do the following:

  1. Provide three lines into which I will type one address. Pass these three lines on to your local postal service as I type them, verbatim. Let me use any character set I want; use UTF-8 or something better.
  2. If your system has business requirements that need me to specify particular information ( such as zip code, prefecture, state, etc. ), ask for that separately. By business requirements, I mean things like analytics; these bits of information should not be shared with your local postal service ( unless I also happened to write the same information into one of the three lines from Point 1, above ).
  3. Have a dropdown that asks me to specify the categorical location of address I provided in the lines of Point 1 above, perhaps Country.
  4. If you must parse the information I provide in the lines of Point 1, use my answer to Point 3 to select regex. Run that regex against the information in Point 1 to parse it. Try to fill the user interface elements of Point 2 using the output from your regex. If I correct that autofilled information--use the fact that I changed it to improve your regex. Similarly, as much as possible, give me an opportunity to review and correct the output of your regex: nobody knows better what I intended to communicate than me.

Systems built like this, I find, make my life easiest. Particularly when I'm sending mail to a postal system about which your firm has virtually no functional internal knowledge.

If your firm does have internal knowledge about particular postal systems, use my selection in Point 3 to inform which view you display to me. A lot of people know what the US postal system expects on packaging; if I select US in Point 3, feel free to make the view look appropriate for a US address. If I select a country about which your firm knows nothing--display a generic three lines and let me do the rest; don't force me to use ASCII.

And let's be real here--building a complete, encyclopedic database of all global postal systems ( public and private ) is a herculean task at best, if not an impossible one. There are, for example, postal systems in which only the local, last-mile carrier really knows where an address is located. Sometimes being able to pass notes to that carrier on the packaging is extremely useful. And mapping the local knowledge of every edge case carrier into your database is indeed an impossible task.

Just ask Gödel. ( And then ask yourself if you're attempting to use an axiomatic system to model a universe of discourse, give or take some sort of arithmetic like set theory or relational algebra. )

StudentsTea
  • 270
  • 4
  • 14
2

Comment of Ben Alabaster's Answer: To format addresses based on country, you could use a formatting table that has the ordering of the columns for each country as separate rows.

  • AddressFormat(CountryCode, FieldName, FieldOrder)

The field order can be coded to use complex grid layouts also.

There is no point in separating addresses by country. This will be chaotic as the number of countries increases and you will land in trouble if you want to find all the addresses of say, an international client. Having an Address Type suggested by Ben could also lead to ambiguities when you have an address that has both a building number and an apartment number. I could be in an apartment complex where each building has a different name. This is very common in India.

bkm
  • 911
  • 7
  • 13
1

I use https://github.com/commerceguys/addressing library to format international addresses and they use these elements:

Country
Administrative area
Locality (City)
Dependent Locality (in: BR, CN, IR, MY, MX, NZ, PH, KR, ZA, TH)
Postal code
Sorting code
Address line 1
Address line 2
Organization
Recipient

This doen't help if you want to parse the street (name, house number, ...).

Btw. if you are looking for a multilanguage country list: https://github.com/umpirsky/country-list

Harald Ernst
  • 346
  • 4
  • 9
0

The only way is to split them to:

Name varchar,
Title varchar,
StreetAddress varchar,
StreetAddressLine2 varchar,
zipCode varchar,
City varchar,
Province varchar,
Country lookup

since almost every country has it's own standard for having address data, and evey country has a different format of zipcodes.
You can have a small sample of problems in my post from a similiar question.

This should not make sense to separate addresses for every country, since there are countries where you have few address conventions. Some popular conventions include not having streets in small villages, only village name and number, while streets are in larger cities’ addresses. I have learned that in Hungary’s capital – Budapest, there are few streets having the same name (you distinct them by city’s district number), while other cities does not have such addresses (someone from Hungary may actually confirm if this is true). So the total number of address formats will be numer_of_countries multiplied by number of address formats in this country… Can be done with different tables, but it will be horrible work to do.

Community
  • 1
  • 1
smok1
  • 2,928
  • 24
  • 35
  • How come you've used Province but ZipCode? Also, the StreetAddress and StreetAddressLine2 are generic enough for display purposes but if you've gotta do EDI or parse the addresses programmatically for QA (or for any other reason) you're going to be up a gum tree. – BenAlabaster Jul 21 '09 at 15:50
  • It simply depends what you need those data for. For sending post to customers worldwide my solution will be ok. For worldwide EDI you will probably need something like your answer to this question. However, for navigation purposes you will need additional data structures containing GIS data and links between them (so you will know that adress1 is located in the same place as adress2, even though that have got different street name, etc). So it is hard to say which solution is ok (not too complicated and accurate enough) without knowing the context. – smok1 Jul 21 '09 at 17:39
0

I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could be easier for users to enter addresses as a result.

I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. Similar to Stephen's model, except Address1, Address2, and Address3 all consolidated into one column.

My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.

Shan Plourde
  • 8,128
  • 2
  • 25
  • 40