Geocoding Countries Using Different Naming Conventions

Automating the conversion of a list of countries into geographic locations can be troublesome when different data sources use different naming conventions. There are numerous static lists on the web with well-defined country coordinates (e.g., the Google Developers country list). The trouble arises when you need to look up the coordinates of countries whose names come from another data set with a different naming convention. In that situation you could identify the name differences and modify them to match, but that becomes time consuming for many countries (e.g., all countries) and doesn't adapt automatically to new data sets.

I ran into this issue recently when trying to automate reading, analysing and visualising country-level macro trade data. Different data sets used different country naming conventions, which prevented me from automating the geo lookup against a static country list.

To work around that, I wrote a quick Python script, geocode.py, that converts a list of country names into geo coordinates without being sensitive to specific naming conventions. The script uses the Google Maps Geocoding API to do a best-effort lookup of a country name to get its lat/lon coordinates.
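A minimal sketch of the lookup request, assuming Python 3 and only the standard library. The helper names build_geocode_url and fetch_geocode are hypothetical and not taken from geocode.py; the endpoint and the address/key query parameters are those of the Geocoding API's JSON interface.

```python
import json
import urllib.parse
import urllib.request

# JSON endpoint of the Google Maps Geocoding API.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(name, api_key):
    """Hypothetical helper: build the request URL for a free-text lookup."""
    # urlencode escapes spaces and commas, so "New, Zealand" is sent safely.
    query = urllib.parse.urlencode({"address": name, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def fetch_geocode(name, api_key, timeout=10):
    """Hypothetical helper: call the API and return the decoded JSON as a dict."""
    url = build_geocode_url(name, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

The raw name is passed through unchanged; any normalisation is left to the geocoder, which is what makes the lookup tolerant of inputs such as "nz" or "NewZealand".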

The script's result processing checks that the lookup for a given country name returns exactly one country result. That single result is then used to determine a normalised country name and its geo coordinates.
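One way to implement that check, sketched under the assumption that a "country result" is a result whose types list contains "country". The function name pick_country is hypothetical; the field names (status, results, types, address_components, long_name, formatted_address, geometry.location) follow the Geocoding API's JSON response format.

```python
def pick_country(response):
    """Hypothetical helper: return (name, lat, lng) if the response holds
    exactly one country-typed result, otherwise None."""
    if response.get("status") != "OK":
        return None
    # Keep only results the geocoder has classified as a country.
    countries = [r for r in response.get("results", [])
                 if "country" in r.get("types", [])]
    if len(countries) != 1:
        return None  # no match, or ambiguous: let the caller warn
    result = countries[0]
    location = result["geometry"]["location"]
    # Use the country component's long_name as the normalised name,
    # falling back to the formatted address if it is missing.
    name = next(
        (c["long_name"] for c in result.get("address_components", [])
         if "country" in c.get("types", [])),
        result.get("formatted_address"),
    )
    return name, location["lat"], location["lng"]
```

Returning None for zero or several country matches turns an ambiguous name into a reported failure instead of a guess.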

Let's take a look at ‘New Zealand’ as an example.

Here is a small list of some possible names used to identify New Zealand in a data set.

nz
NZ
new zealand
New Zealand
NewZealand
New, Zealand
New,Zealand
Zealand, New
Zealand,New
Zealand New

Using that list as the input to geocode.py, i.e.,

python geocode.py -i newzealand.txt -k YOUR_API_KEY_HERE

The following output is generated.

WARNING:root:no country location from for: Zealand, New
WARNING:root:no country location from for: Zealand,New
WARNING:root:no country location from for: Zealand New
nz,New Zealand,-40.900557,174.885971
NZ,New Zealand,-40.900557,174.885971
new zealand,New Zealand,-40.900557,174.885971
New Zealand,New Zealand,-40.900557,174.885971
NewZealand,New Zealand,-40.900557,174.885971
New, Zealand,New Zealand,-40.900557,174.885971
New,Zealand,New Zealand,-40.900557,174.885971

The result shows that all of the names were correctly mapped to coordinates except the formats with the words ‘New’ and ‘Zealand’ in reverse order. While this is not perfect, I have found it good enough for the data sets I have come across. There will be exceptions where data sets use very obscure names, but the warnings at least make them easy to identify and correct.
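The overall flow (read one name per line, look it up, then either print an input,name,lat,lon row or log a warning) can be sketched as below. This is an illustration, not the actual geocode.py: geocode_lines, main and the lookup callable are hypothetical names, and only the -i and -k flags come from the command shown above.

```python
import argparse
import logging


def geocode_lines(names, lookup):
    """Yield 'input,country,lat,lng' rows for each name that lookup resolves.

    lookup(name) is a caller-supplied (hypothetical) function returning
    (country, lat, lng) or None.
    """
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        match = lookup(name)
        if match is None:
            # Module-level logging.warning uses the root logger's default
            # format, which produces lines like "WARNING:root:...".
            logging.warning("no country location found for: %s", name)
            continue
        country, lat, lng = match
        # Plain join with no CSV quoting, so "New, Zealand" stays unquoted.
        yield f"{name},{country},{lat},{lng}"


def main(lookup, argv=None):
    """Parse -i (input file) and -k (API key), then print one row per resolved name."""
    parser = argparse.ArgumentParser(description="Geocode a list of country names.")
    parser.add_argument("-i", dest="input", required=True,
                        help="text file with one country name per line")
    parser.add_argument("-k", dest="key", required=True, help="Google Maps API key")
    args = parser.parse_args(argv)
    with open(args.input, encoding="utf-8") as f:
        for row in geocode_lines(f, lambda name: lookup(name, args.key)):
            print(row)
```

Writing the rows with the csv module instead would quote names such as "New, Zealand", which is safer if the output has to be parsed again.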

.NET 4.5 Compression Library Path Handling

In .NET 4.5 Microsoft added built-in ZIP support to the compression library (System.IO.Compression).

The library does not check the paths of entries added to a ZipArchive for conformance with the PKWARE specification. This is worth mentioning since I didn't find much about it on the internet.

The PKWARE zip specification states:

4.4.17 file name: (Variable)

4.4.17.1 The name of the file, with optional relative path. The path stored MUST not contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes ‘/’ as opposed to backwards slashes ‘\’ for compatibility with Amiga and UNIX file systems etc. If input came from standard input, there is no file name field.

On Windows the default directory separator is the backslash ‘\’, so it is worth checking and, if necessary, converting paths before adding a zip entry. Unix zip tools (such as unzip) will show a warning and attempt to convert the paths, but other tools may fail.

So, to ensure cross-platform compatibility when building zip archives in C#, do something like:

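using System.IO;

string entryName = "C:\\Foo\\Bar.txt";
// Strip the root ("C:\\" here): the spec forbids drive letters and leading slashes.
entryName = entryName.Substring(Path.GetPathRoot(entryName).Length);
// Only convert where the native separator is a backslash (i.e. Windows).
if (Path.DirectorySeparatorChar == '\\')
    entryName = entryName.Replace('\\', '/');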

The if statement ensures that the separator replacement only happens on systems that use a backslash as the default directory separator character. Note that this assumes multiple consecutive separator characters have already been sanitised.

Installing Hubot on Mac OS X Lion 10.7

GitHub has just released Hubot to the public (Say Hello to Hubot).

What is Hubot?

I have installed it on my MacBook Pro running Mac OS X 10.7.2.

Instructions:

  • Set Redis to run on startup by running the following commands in Terminal.
    mkdir -p ~/Library/LaunchAgents
    cp /usr/local/Cellar/redis/2.2.12/io.redis.redis-server.plist ~/Library/LaunchAgents/
    launchctl load -w ~/Library/LaunchAgents/io.redis.redis-server.plist
