Geocoding Countries Using Different Naming Conventions

Automating the conversion of a list of countries into geographic locations can be troublesome when different data sources use different naming conventions. There are numerous static lists available on the web with well-defined country coordinates (e.g., the Google developers country list). The trouble arises when you need to look up the coordinates of countries based on names from another data set that uses different naming conventions. In that situation you could identify the name differences and modify them to match, but that becomes time-consuming for lots of countries (e.g., all countries) and doesn’t adapt automatically to new data sets.

I ran into this issue recently when attempting to automate the process of reading, analysing and visualising country macro trade data. I found that different datasets had different country naming conventions, which prevented me from automating the geo lookup against a static country list.

To work around that, I wrote a quick Python script, geocode.py, that converts a list of country names into geo coordinates without being sensitive to specific naming conventions. The script uses the Google Maps Geocoding API to do a best-effort lookup of the lat/lon coordinates for a country name.
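A minimal sketch of the request side, assuming the public Geocoding API JSON endpoint (`https://maps.googleapis.com/maps/api/geocode/json`) with its `address` and `key` query parameters. This is not the code from geocode.py; `build_geocode_url` and `fetch_geocode` are hypothetical names used for illustration. The point is that the raw country name is passed through untouched, leaving spelling, case and punctuation for Google to interpret.

```python
import json
import urllib.parse
import urllib.request

# Public JSON endpoint of the Google Maps Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(name: str, api_key: str) -> str:
    """Build a Geocoding API request URL for a free-text country name.

    Hypothetical helper: the name goes into the `address` parameter as-is,
    only URL-encoded (spaces become '+', commas become '%2C').
    """
    query = urllib.parse.urlencode({"address": name, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def fetch_geocode(name: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_geocode_url(name, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `build_geocode_url("New, Zealand", "YOUR_API_KEY_HERE")` produces a URL whose query string is `address=New%2C+Zealand&key=YOUR_API_KEY_HERE`.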

The script’s result processing checks that the results for a given country name contain exactly one country result. That single result is then used to determine a normalised country name and its geo coordinates.
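The single-country check could look something like the sketch below. It assumes the documented Geocoding API response shape (a `results` list whose entries carry `types`, `address_components` and `geometry.location`) and reads "exactly one country result" as "exactly one result whose `types` include `country`". `pick_country` is a hypothetical name, not taken from geocode.py.

```python
from typing import Optional, Tuple


def pick_country(response: dict) -> Optional[Tuple[str, float, float]]:
    """Return (normalised name, lat, lng) if the response has exactly one country result.

    Assumes the Geocoding API JSON shape: a "results" list whose entries
    carry "types", "address_components" and "geometry" -> "location".
    """
    countries = [r for r in response.get("results", []) if "country" in r.get("types", [])]
    if len(countries) != 1:
        return None  # zero or several country matches: treat as "no match"

    result = countries[0]
    # Prefer the long_name of the component typed "country"; fall back to formatted_address.
    name = next(
        (c["long_name"] for c in result.get("address_components", []) if "country" in c.get("types", [])),
        result.get("formatted_address", ""),
    )
    location = result["geometry"]["location"]
    return name, float(location["lat"]), float(location["lng"])
```

Requiring exactly one country-typed result trades recall for precision: an ambiguous or non-country answer is rejected rather than guessed at.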

Let’s take a look at ‘New Zealand’ as an example.

Here is a small list of some possible names used to identify New Zealand in a data set.

nz
NZ
new zealand
New Zealand
NewZealand
New, Zealand
New,Zealand
Zealand, New
Zealand,New
Zealand New

Using that list as the input to geocode.py:

python geocode.py -i newzealand.txt -k YOUR_API_KEY_HERE

The following output is generated.

WARNING:root:no country location from for: Zealand, New
WARNING:root:no country location from for: Zealand,New
WARNING:root:no country location from for: Zealand New
nz,New Zealand,-40.900557,174.885971
NZ,New Zealand,-40.900557,174.885971
new zealand,New Zealand,-40.900557,174.885971
New Zealand,New Zealand,-40.900557,174.885971
NewZealand,New Zealand,-40.900557,174.885971
New, Zealand,New Zealand,-40.900557,174.885971
New,Zealand,New Zealand,-40.900557,174.885971

The result shows that all of the names were correctly mapped to coordinates except for the formats with the words ‘New’ and ‘Zealand’ in reverse order. While this is not perfect, I have found it good enough for the data sets I have come across. There will be exceptions where data sets use very obscure naming, but because the script logs a warning for each unmatched name, they can at least be identified and corrected.
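A minimal sketch of how input lines could be turned into the `input,name,lat,lon` rows and `WARNING:root:` messages seen in the output above. It is not the actual geocode.py: `geocode_rows` and the `lookup` callable are hypothetical, and the warning text is paraphrased. `lookup` stands for any function that queries the Geocoding API and returns `(name, lat, lng)` or `None` when no single country is found.

```python
import logging
from typing import Callable, Iterable, List, Optional, Tuple

Match = Optional[Tuple[str, float, float]]


def geocode_rows(names: Iterable[str], lookup: Callable[[str], Match]) -> List[str]:
    """Map each input name to an "input,normalised name,lat,lon" line.

    `lookup` is a hypothetical callable returning None when no unique
    country is found; those names are logged and skipped.
    """
    rows = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the input file
        match = lookup(name)
        if match is None:
            # With no logging configuration, the root logger prints "WARNING:root:<message>".
            logging.warning("no country location found for: %s", name)
            continue
        country, lat, lng = match
        # Input names are written unquoted, as in the output above, so a name
        # containing a comma adds an extra field; csv.writer would quote it instead.
        rows.append(f"{name},{country},{lat},{lng}")
    return rows
```

Reading the input file is then just `geocode_rows(open("newzealand.txt"), lookup)`. If the output is consumed by another program, writing rows with `csv.writer` avoids the ambiguity that names like ‘New, Zealand’ introduce.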

Published by

oughton

A software / web applications developer, living in Hamilton and currently studying at Waikato University.
