Force Directed Emails Visualisation Using arbor.js

I have recently been playing around with ways of visualising my sent emails for an assignment at university. I have over 6 years worth of sent mail sitting in Gmail, collecting electronic dust (and occasionally being dusted off and looked at).

I downloaded all of my sent mail from Gmail (over 3000 conversations), making  use of my university’s internet connection. I made use of thunderbird for this, as it stores emails in flat files on disk. I made a simple python script that takes this raw file and converts it into JSON.

import mailbox
import json

# path to you sent mail mbox
mb = mailbox.mbox('Sent Mail')

fout = file('sent.json', 'w')
items = []
fields = ['date', 'subject', 'to']

for i in range(len(mb)):
    obj = {}
    for item in mb.get_message(i).items():
        if item[0].lower() in fields:
            obj[item[0]] = item[1]
        mb.remove(i)
    items.append(obj)

json.dump(items, fout)
fout.close()

A nice javascript vis library called arbor makes it easy to take a graph and apply a force directed algorithm to it. It allows you to only worry about the visualisation side by separating the layout computation from the graphical display. Using this library and some basic javascript processing I was able to produce the following display.

Sent Emails Visualisation

I have obscured the recipient names for the sake of their privacy only (my own privacy has already been destroyed by the likes of such sites as Facebook – see: Facebook is an appalling spying machine).

The line widths between time periods and recipients relates to the number of emails sent to that person. Nodes can be dragged about and added in by dragging the sliders to alter the time range.

The most interesting part of this process was probably making a wordle based on all of the recipient names for all of the sent emails. The wordles changing from year to year showed the people that I was in the most communication with and perhaps most important in a given time in my life.

To leave you with an idea – someone needs to make a web app to anonymize wordles (ie replace words with random, but unique common words of the same length). A google search did not find anything meaningful.

** EDIT **
Lots of people have been asking for the code. I will provide the link to my github project but it will not include the data file. https://github.com/oughton/email-wordle/

SVG vs Canvas Performance

My honours project this year requires the use of graphical browser technologies to produce a modern network weather map.

I have used SVG and HTML5 canvas in the past and have had mixed performance experiences with them. Over summer I had to make use of SVG to draw a traceroute tree-map which in practice performed a lot quicker running on Google Chrome opposed to Firefox.

I wanted to get a clear picture of the current position that both of these technologies are at in terms of performance for drawing and translating large numbers of nodes.

A quick search around the internet showed a handful of existing tests, but I thought I would give it a go to for the practice.

My testing platforms were:

PC
Windows 7
Intel Core 2 Quad 2.6GHZ
4GB Ram

Mac Book Pro
Mac OS Snow Leopard 10.6.7
Intel i5 2.3 GHz dual core
8GB Ram

I tested the following browsers:

  • Chrome 10.0.648.204
  • Firefox 3.3.16 (I didn’t test FF4.0 but I hear performance is not a lot better)
  • Safari 5.0.4

Test Setup / Method

I wrote a simple script that creates/draws a given number of nodes using either ‘svg’ or ‘canvas’. The nodes are then translated as many times as possible per second.

The framerate is calculated by incrementing a counter ever time all of the nodes have been redraw(canvas)/translated(svg) and then outputting its value after each second via a setInterval function.

function draw() {
    if (diffTime >= 1000) {
        fps = frameCount;
        frameCount = 0.0;
        lastTime = nowTime;

        // draw the 'fps' to the window
    }

    // draw some magical particles

    frameCount++;
}

See the full code here: https://github.com/oughton/Drawing-Browser-Benchmark

There are disadvantages/naiveness to this method. I am only testing translation performance of SVG and redrawing performance of canvas. Event support in SVG is a given and is quick due to DOM nodes existing for each node that is drawn. Canvas requires an event system to be built around the application to get the same event support.

But, nonetheless it gives a good overview of where browsers are at.

Results

I took a serious of measurements at various different numbers of particles and recorded the framerate at that value. This produced a big table of numbers but I will just summaries those in 2 pretty graphs.

These results show Chrome as a clear winner for bothCanvas and SVG performance on my windows 7 pc. The SVG performance on both pc and mac degrades significantly as the number of nodes increase across all of the browsers I tested. Chrome and Safari maintained a somewhat usable performance for SVG around the 1000 node mark where as Firefox began locking up.

Another interesting point is Safaris apparent 90 frames / second limit for canvas. While it did not go over 90 frames /second on my mac, it remained mostly constant over the 2000 particles tested.

From these results it appears that reasonable SVG performance is currently possible, but if you are needing thousands of nodes in your visualization, I would stick with Canvas and implement a basic event system.

** UPDATE **

It was requested that I show results for IE 9 due to the addition of hardware rendering support. I actually did these tests before handing in my final project so it is easy to include the new graph here.

The graph shows that IE 9 canvas rendering speed outperforms all other browsers. IE 9 SVG performance still shows the same rapid decline as the number of particles increase.