New Zealand Trade Visualisation

My latest university project was to create a visualisation of our choice with another class member.

Stephen and I came across a complete set of export data from New Zealand to all of its trading partners. I must point out here that NZ does a good job of making data available compared with the rest of the world. Some countries that will remain unnamed (*cough* Australia *cough*) provide their data in PDF form 🙁

StatsNZ Exports Datatable

This huge table was just screaming out to be visualised. We took inspiration from Minard’s effective French wine export map as well as from a global trade visual. Both of these visualisations use line width to indicate the quantity of exports between countries. Lines start off thick and then thin out as they branch off into countries. Our goal became to take this static idea and present it in a dynamic way using HTML5 technologies.

Minard - French Wine Exports Map

By combining StatsNZ data and country locations collected using Google’s geolocation API, we were able to produce a JSON data structure that could easily be adapted to support the visualisation of other countries. The following diagram breaks down the process:

532exports Application Flow
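As a rough illustration of the kind of JSON structure described above, the sketch below joins export rows to country coordinates. The function name `buildExportData`, every field name and the sample figures are made up for this example; they are not the StatsNZ table layout or the Google API response format.

```javascript
// Join export rows with geocoded country locations into one structure.
// All field names here are assumptions made for this sketch.
function buildExportData(origin, exportRows, locations) {
    var destinations = [];

    exportRows.forEach(function (row) {
        // skip countries without a known location
        if (!Object.prototype.hasOwnProperty.call(locations, row.country)) return;
        var location = locations[row.country];

        destinations.push({
            country: row.country,
            lat: location.lat,
            lng: location.lng,
            value: row.value // later mapped to a line width
        });
    });

    return { origin: origin, destinations: destinations };
}

// Placeholder inputs: the values are invented and the coordinates approximate.
var sampleOrigin = { country: 'New Zealand', lat: -41.3, lng: 174.8 };
var sampleRows = [
    { country: 'Australia', value: 100 },
    { country: 'Atlantis', value: 5 } // has no location, so it is dropped
];
var sampleLocations = { Australia: { lat: -25.3, lng: 133.8 } };

var exportData = buildExportData(sampleOrigin, sampleRows, sampleLocations);
console.log(JSON.stringify(exportData, null, 2));
```

Keeping the origin separate from the destination list is what would let another country's table be swapped in without touching the drawing code.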

And the final product!

It can also be viewed live in my testing folder here: Export Visualisation

Special thanks to my group member Stephen and to StatsNZ, and the Google Maps API for their awesome data.

Update: The code is now available on GitHub.

New Zealand Budget 2011 – Where’s My Taxes!

The 2011 New Zealand budget was released today. A brief breakdown of spending has been posted on news sites such as Stuff and NZ Herald.

Headlines read “Budget shows ‘lack of vision’” and “Budget: Thousands affected by cuts” respectively.

While such articles do attempt to give a general breakdown of Government spending across the sectors, they are far from complete and do not give the reader the opportunity to explore the data and pick out patterns themselves.

Thankfully, a friend of mine has produced a clean and concise visualisation for doing just that.

Mark’s Blog Post

Where’s My Taxes? Visualisation


Force Directed Emails Visualisation Using arbor.js

I have recently been playing around with ways of visualising my sent emails for an assignment at university. I have over 6 years’ worth of sent mail sitting in Gmail, collecting electronic dust (and occasionally being dusted off and looked at).

I downloaded all of my sent mail from Gmail (over 3000 conversations), making use of my university’s internet connection. I used Thunderbird for this, as it stores emails in flat files on disk. I wrote a simple Python script that takes this raw file and converts it into JSON.

import mailbox
import json

# path to your sent mail mbox
mb = mailbox.mbox('Sent Mail')

items = []
fields = ['date', 'subject', 'to']

for message in mb:
    obj = {}
    # keep only the headers we care about (header names are case-insensitive)
    for name, value in message.items():
        if name.lower() in fields:
            obj[name] = str(value)
    items.append(obj)

# write the collected messages out as a JSON array
with open('sent.json', 'w') as fout:
    json.dump(items, fout)

A nice JavaScript visualisation library called arbor.js makes it easy to take a graph and apply a force-directed algorithm to it. It lets you worry only about the visualisation side by separating the layout computation from the graphical display. Using this library and some basic JavaScript processing I was able to produce the following display.

Sent Emails Visualisation

I have obscured the recipient names for the sake of their privacy only (my own privacy has already been destroyed by the likes of such sites as Facebook – see: Facebook is an appalling spying machine).

The line widths between time periods and recipients relate to the number of emails sent to that person. Nodes can be dragged about, and more are added in by dragging the sliders to alter the time range.
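The counting behind those line widths can be sketched roughly as follows. This is not the code from the project: it assumes each record in the JSON file has `To` and `Date` header fields, groups by calendar year as the time period, and splits recipients naively on commas. The helper names are invented for the example.

```javascript
// Pull a four-digit year out of a mail Date header.
function yearOf(dateHeader) {
    var match = /\b(?:19|20)\d{2}\b/.exec(dateHeader);
    return match ? match[0] : 'unknown';
}

// Split a To header into lower-cased addresses.
// Naive: a quoted display name containing a comma would be split wrongly.
function recipientsOf(toHeader) {
    return toHeader.split(',').map(function (part) {
        var angle = /<([^>]+)>/.exec(part);
        return (angle ? angle[1] : part).trim().toLowerCase();
    }).filter(function (address) { return address.length > 0; });
}

// Count emails per (year, recipient) pair; each pair is one edge in the graph.
function countEdges(messages) {
    var counts = new Map();
    messages.forEach(function (message) {
        if (!message.To || !message.Date) return;
        var year = yearOf(message.Date);
        recipientsOf(message.To).forEach(function (recipient) {
            var key = year + '|' + recipient;
            counts.set(key, (counts.get(key) || 0) + 1);
        });
    });
    return counts;
}

// Scale a count linearly to a line width, never thinner than 1 pixel.
function edgeWidth(count, maxCount, maxWidth) {
    return Math.max(1, (count / maxCount) * maxWidth);
}

// Invented sample records standing in for the real sent mail.
var sampleMessages = [
    { To: 'Alice <alice@example.com>', Date: 'Mon, 14 Mar 2011 10:22:01 +1300' },
    { To: 'alice@example.com, Bob <bob@example.com>', Date: 'Tue, 15 Mar 2011 09:00:00 +1300' },
    { To: 'bob@example.com', Date: 'Fri, 02 Jul 2010 16:45:00 +1200' }
];
var edgeCounts = countEdges(sampleMessages);
console.log(Array.from(edgeCounts.entries()));
```

A linear scale is the simplest choice; a square-root or log scale might read better when one recipient dominates.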

The most interesting part of this process was probably making a wordle based on all of the recipient names for all of the sent emails. The way the wordles changed from year to year showed the people I was most in communication with, and who were perhaps most important at a given time in my life.

To leave you with an idea – someone needs to make a web app to anonymize wordles (i.e. replace words with random but unique common words of the same length). A Google search did not find anything meaningful.
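A minimal sketch of that idea, assuming you supply your own list of common words: each distinct input word gets one stand-in of the same length, and no stand-in is used twice. The function name `anonymiseWords`, the tiny dictionary and the optional `random` parameter are all inventions for this example.

```javascript
// Replace each distinct word with a unique stand-in of the same length.
// `dictionary` is a list of candidate stand-in words supplied by the caller.
// `random` is optional and defaults to Math.random; passing your own
// function makes the result repeatable.
function anonymiseWords(words, dictionary, random) {
    random = random || Math.random;
    var replacements = new Map(); // lower-cased original -> stand-in
    var used = new Set();         // stand-ins already handed out

    return words.map(function (word) {
        var key = word.toLowerCase();
        if (!replacements.has(key)) {
            var candidates = dictionary.filter(function (candidate) {
                return candidate.length === word.length && !used.has(candidate);
            });
            if (candidates.length === 0) {
                throw new Error('No unused stand-in of length ' + word.length);
            }
            var choice = candidates[Math.floor(random() * candidates.length)];
            used.add(choice);
            replacements.set(key, choice);
        }
        return replacements.get(key);
    });
}

// Invented names and a toy dictionary, with a fixed "random" source.
var anonymised = anonymiseWords(
    ['Alice', 'Bob', 'alice', 'Carol'],
    ['apple', 'grape', 'lemon', 'fig', 'oak'],
    function () { return 0; }
);
console.log(anonymised);
```

Because repeated names map to the same stand-in, the word frequencies (and so the wordle's relative sizes) are preserved.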

** EDIT **
Lots of people have been asking for the code. I will provide the link to my GitHub project but it will not include the data file.

Tufte Bar Chart Redesign in jQuery Flot

I have been reading (very slowly) through Edward R. Tufte’s ‘The Visual Display of Quantitative Information’ book. On page 126 Tufte proposes an alternative design for the Bar Chart / Histogram using all of his suggested principles.

The design basically attempts to emphasise the data as opposed to the graph-related lines. You will notice in the image below that there are no:

  1. Axis Ticks – Tufte states that the white grid marks remove the need for these
  2. Grid – The box around the bar graph does not aid understanding

I set out to implement this design using jQuery Flot. Here is the final result and the JavaScript source code to go with it.

var data, ticks, options, series, plot, ctx, lineWidth, offset;

// some data close to Tufte's example
data = [
    [0, 9], [1, 12], [2, 7], [3, 8], [4, 3],
    [5, 18], [6, 14], [7, 9], [8, 6], [9, 11],
    [10, 5], [11, 10]
];

ticks = [ 5, 10, 15 ];
options = {
    series: { bars: { show: true, fillColor: "rgb(128,128,128)" } },
    grid: { show: true, borderWidth: 0, color: "#fff" },
    xaxis: { show: false, min: -0.5 },
    yaxis: {
        tickFormatter: function(number) { return number + "%"; },
        ticks: ticks
    }
};

series = [{ data: data, bars: { barWidth: 0.5 }, color: "#fff" }];
plot = $.plot($("#plot"), series, options);
ctx = plot.getCanvas().getContext("2d");

// need to set the label colour to not be white
$(".tickLabel").css("color", "#000");

// draw line at baseline because it "looks good"
lineWidth = 2;
offset = plot.offset();
ctx.fillStyle = "rgb(128,128,128)";
ctx.lineWidth = lineWidth;
ctx.fillRect(offset.left + lineWidth * 3 , plot.height() + lineWidth,
    plot.width() - lineWidth * 8, lineWidth);

// draw horizontal white lines across the bars to remove need for ticks
$.each(ticks, function(index, tick) {
    var yaxis = plot.getYAxes()[0];

    ctx.fillStyle = "#fff";
    ctx.lineWidth = lineWidth;
    ctx.fillRect(offset.left, yaxis.p2c(tick) + lineWidth / 2,
        plot.width(), lineWidth);
});

Kd-tree in Javascript

I needed to get my head around how kd-trees work, so I coded up a simple implementation that just builds a tree. Naturally, add, delete, balancing and similar methods would be required for a complete implementation.

See wikipedia: kd-tree

    /**
     * Builds a kd-tree given an array of points
     */
    var kdtree = function(points, depth) {
        var axis, median, node = {};

        if (!points || points.length == 0) return;

        // alternate between the axis
        axis = depth % points[0].length;

        // sort point array (note: this sorts the caller's array in place)
        points.sort((a, b) => a[axis] - b[axis]);

        median = Math.floor(points.length / 2);

        // build and return node
        node.location = points[median];
        node.left = kdtree(
            points.slice(0, median), depth + 1);
        node.right = kdtree(
            points.slice(median + 1), depth + 1);
        return node;
    };

Example usage would be:

var points = [ [2,3], [5,4], [4,7], [8,1], [7,2], [9,6] ];
kdtree(points, 0);
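To show what such a tree is good for, here is a rough sketch of a nearest-neighbour lookup. It is not part of the implementation above: the build step is repeated (as `buildKdTree`, storing the split axis on each node) so that the example runs on its own, and the names `squaredDistance` and `nearest` are made up for it.

```javascript
// Build step, repeated so this sketch is self-contained.
// Unlike the version above it copies the array and records the split axis.
function buildKdTree(points, depth) {
    if (!points || points.length === 0) return null;
    var axis = depth % points[0].length;
    var sorted = points.slice().sort(function (a, b) { return a[axis] - b[axis]; });
    var median = Math.floor(sorted.length / 2);
    return {
        location: sorted[median],
        axis: axis,
        left: buildKdTree(sorted.slice(0, median), depth + 1),
        right: buildKdTree(sorted.slice(median + 1), depth + 1)
    };
}

function squaredDistance(a, b) {
    var sum = 0;
    for (var i = 0; i < a.length; i++) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return sum;
}

// Returns { point, distSq } for the stored point closest to target.
function nearest(node, target, best) {
    if (!node) return best;

    var d = squaredDistance(node.location, target);
    if (!best || d < best.distSq) {
        best = { point: node.location, distSq: d };
    }

    // Search the side of the splitting plane that contains the target first.
    var diff = target[node.axis] - node.location[node.axis];
    var near = diff < 0 ? node.left : node.right;
    var far = diff < 0 ? node.right : node.left;

    best = nearest(near, target, best);

    // Only cross the plane if a closer point could lie on the other side.
    if (diff * diff < best.distSq) {
        best = nearest(far, target, best);
    }
    return best;
}

var tree = buildKdTree([[2, 3], [5, 4], [4, 7], [8, 1], [7, 2], [9, 6]], 0);
var closest = nearest(tree, [9, 2], null);
console.log(closest); // point [8, 1], squared distance 2
```

The pruning test is what makes the tree worthwhile: whole subtrees are skipped whenever the splitting plane is further away than the best point found so far.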

SVG vs Canvas Performance

My honours project this year requires the use of graphical browser technologies to produce a modern network weather map.

I have used SVG and HTML5 canvas in the past and have had mixed performance experiences with them. Over summer I had to use SVG to draw a traceroute tree-map, which in practice ran a lot quicker on Google Chrome than on Firefox.

I wanted to get a clear picture of where both of these technologies currently stand in terms of performance for drawing and translating large numbers of nodes.

A quick search around the internet showed a handful of existing tests, but I thought I would give it a go too, for the practice.

My testing platforms were:

Windows 7
Intel Core 2 Quad 2.6 GHz
4 GB RAM

MacBook Pro
Mac OS X Snow Leopard 10.6.7
Intel i5 2.3 GHz dual core
8 GB RAM

I tested the following browsers:

  • Chrome 10.0.648.204
  • Firefox 3.6.16 (I didn’t test FF4.0 but I hear performance is not a lot better)
  • Safari 5.0.4

Test Setup / Method

I wrote a simple script that creates/draws a given number of nodes using either ‘svg’ or ‘canvas’. The nodes are then translated as many times as possible per second.

The framerate is calculated by incrementing a counter every time all of the nodes have been redrawn (canvas) or translated (SVG), and then outputting its value each second via a setInterval function.

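var frameCount = 0, fps = 0, lastTime = new Date().getTime();

function draw() {
    var nowTime = new Date().getTime();
    var diffTime = nowTime - lastTime;

    // once a second, publish the frame count and start counting again
    if (diffTime >= 1000) {
        fps = frameCount;
        frameCount = 0;
        lastTime = nowTime;

        // draw the 'fps' to the window
    }

    // draw some magical particles

    frameCount++;
}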


See the full code here:

This method has its disadvantages and is somewhat naive. I am only testing translation performance for SVG and redrawing performance for canvas. Event support in SVG is a given, and is quick because a DOM node exists for each node that is drawn. Canvas requires an event system to be built around the application to get the same event support.

Nonetheless, it gives a good overview of where browsers are at.
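The core of the event system that canvas needs is a hit test: given the mouse position, work out which drawn node (if any) is underneath it. The sketch below assumes nodes are circles described by `x`, `y` and `radius`; the function name `hitTest` and the node shape are assumptions for the example, not code from my test.

```javascript
// Return the topmost circular node containing the point (x, y), or null.
// Nodes later in the array are assumed to be drawn later, i.e. on top.
function hitTest(nodes, x, y) {
    for (var i = nodes.length - 1; i >= 0; i--) {
        var node = nodes[i];
        var dx = x - node.x;
        var dy = y - node.y;
        if (dx * dx + dy * dy <= node.radius * node.radius) {
            return node;
        }
    }
    return null;
}

// In a browser this could be wired up along these lines:
//
//   canvas.addEventListener('click', function (event) {
//       var rect = canvas.getBoundingClientRect();
//       var node = hitTest(nodes, event.clientX - rect.left,
//                          event.clientY - rect.top);
//       if (node) { /* handle the click on this node */ }
//   });

var sampleNodes = [
    { id: 'a', x: 10, y: 10, radius: 5 },
    { id: 'b', x: 12, y: 10, radius: 5 }
];
console.log(hitTest(sampleNodes, 11, 10)); // inside both, 'b' is on top
```

This linear scan is fine for a few hundred nodes; with thousands, a spatial index would be worth considering.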


I took a series of measurements at various numbers of particles and recorded the framerate at each. This produced a big table of numbers, but I will just summarise those in 2 pretty graphs.

These results show Chrome as a clear winner for both Canvas and SVG performance on my Windows 7 PC. The SVG performance on both PC and Mac degrades significantly as the number of nodes increases across all of the browsers I tested. Chrome and Safari maintained somewhat usable performance for SVG around the 1000 node mark, whereas Firefox began locking up.

Another interesting point is Safari’s apparent 90 frames/second limit for canvas. While it did not go over 90 frames/second on my Mac, it remained mostly constant over the 2000 particles tested.

From these results it appears that reasonable SVG performance is currently possible, but if you need thousands of nodes in your visualization, I would stick with Canvas and implement a basic event system.

** UPDATE **

It was requested that I show results for IE 9 due to the addition of hardware rendering support. I actually did these tests before handing in my final project so it is easy to include the new graph here.

The graph shows that IE 9 canvas rendering speed outperforms all other browsers. IE 9 SVG performance still shows the same rapid decline as the number of particles increases.

Trademe & Christchurch Wordle

I wanted to give the Trademe API a go. With the recent Christchurch earthquake happening, I thought – why not make a wordle of the Christchurch-related title words?

I was rather impressed with the simplicity of using the API as a guest. You are limited to 50 requests per IP address, per hour. This was enough for me as I only searched through 40 pages to extract title keywords.

Setting up an application looked a lot less straightforward, but would be required to dramatically increase this limit.

Here is the code I used to print out all of the titles to the browser (PHP happened to be most available at the time):

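// $results: the parsed search response (name assumed; the loading code was lost)
foreach ($results->List as $auction) {
    echo $auction->Title . " ";
}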

Then it was simply a matter of copying and pasting the output into wordle’s create page. I removed christchurch from the word list so that it did not blow the word cloud out of proportion.

Here it is.
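Wordle did the counting for me, but the same step can be sketched by hand. The example below tallies words across a list of titles and leaves out an excluded word, as I did with christchurch. The function name `countTitleWords` and the sample titles are invented; they are not real listings.

```javascript
// Count how often each word appears across the titles, ignoring case
// and skipping any excluded words. Returns [word, count] pairs,
// most frequent first.
function countTitleWords(titles, excluded) {
    var skip = new Set(excluded.map(function (word) {
        return word.toLowerCase();
    }));
    var counts = new Map();

    titles.forEach(function (title) {
        // split on anything that is not a letter or digit
        title.toLowerCase().split(/[^a-z0-9]+/).forEach(function (word) {
            if (word.length === 0 || skip.has(word)) return;
            counts.set(word, (counts.get(word) || 0) + 1);
        });
    });

    return Array.from(counts.entries()).sort(function (a, b) {
        return b[1] - a[1];
    });
}

// Made-up titles for illustration only.
var sampleTitles = [
    'Christchurch quake relief shirt',
    'Quake survival kit',
    'Relief fund: Christchurch'
];
var wordCounts = countTitleWords(sampleTitles, ['christchurch']);
console.log(wordCounts);
```

The simple split only handles plain ASCII words, which was enough for a quick experiment like this one.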

Summer sun – PHP gripes

Summer is almost here. Sunny skies, warm beaches, walks to the bakery for lunch.

Nothing to complain about there.

However, there is something to complain about in PHP. I was trying to track down a bug in my PHP scripts at work today and I came across another PHP pitfall for the weary programmer.

I think my svn commit change log will best explain what tripped me up.

The red line shows the bug in my old code, where I forgot a dollar sign and misspelled the variable ‘$direction’.

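This did not produce a syntax error. Instead, PHP took the bare word ‘direcion’ to be an undefined constant, treated it as the string 'direcion' (raising only a notice), and then multiplied it by -1, which quietly evaluates to 0 because the string is not numeric.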

A small amount of googling came up with someone who also experienced a similar problem:

Network Map Display Project

To be awarded my degree with honours I have to complete an honours project, which is worth the majority of my year’s work. The way it works is that lecturers write blurbs about projects they would like to supervise, and the students then browse through and pick the one they like.

I have settled upon a project titled “Improved Network Map Display”. It is going to involve research around the area of network topology data consolidation and visualisation.

I have had a (short) look around the web and, as far as I can tell, nothing like this has been done in its entirety. Visualisations exist (most of which are static) that allow certain individual levels of network topology to be visualised. The purpose of this project is to improve upon these levels and to allow for seamless transitions between them as you zoom in and out. This will require interviews with various network engineers and users, and also a well thought out topology schema that will fit the purpose of complete consolidation.

Here is the ‘Karen’ weathermap that shows the data traffic flows over the network. See the live version:

This is a good example of how an autonomous system view might look.

The ability to zoom in here assumes that the autonomous systems are willing to share network topology information, which I don’t think is a realistic assumption to make.

ADSL Cap Delay

Up until a few months ago, we only had a 20 GB monthly data allowance at my flat. This meant that we could not make heavy use of the internet over the course of the month.

We are on a plan that caps the internet speed down to 64 kbit/s when you exceed the monthly limit. However, there seems to be a delay of approximately 24 hours between going over the limit and getting capped.

Let’s do a little math.

  • time = 24 hours = 24 x 60 x 60 seconds
  • internet speed = 800 KB/s

24 x 60 x 60 x 800 = 69,120,000 KB ≈ 65.9 GB

That’s right, I can potentially more than triple my bandwidth allowance by saving up all the large downloads until the last day of the billing month.
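For anyone who wants to plug in their own numbers, the sum above can be written as a small function. It takes the 800 figure to be kilobytes per second and uses 1024 KB per MB and 1024 MB per GB, which is what gives 65.9; the function name is made up.

```javascript
// How many GB can be downloaded at a steady rate before the cap kicks in?
// rateKBps: download speed in kilobytes per second
// delayHours: hours between exceeding the limit and being capped
function uncappedWindowGB(rateKBps, delayHours) {
    var totalKB = rateKBps * delayHours * 60 * 60;
    return totalKB / (1024 * 1024);
}

var windowGB = uncappedWindowGB(800, 24);
console.log(windowGB.toFixed(1) + ' GB');                       // 65.9 GB
console.log((windowGB / 20).toFixed(1) + 'x the 20 GB allowance'); // 3.3x
```

This assumes the line runs flat out for the whole 24 hours, so it is an upper bound rather than something you would see in practice.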

I tested out this theory a couple of months ago and managed to download 36 GB in one day. The only reason this stopped was that I ran out of things to download and I was not at home at the time 🙂

ADSL usage screenshot