Chromecast media world

Chromecast in the real world: six casting workflows

For such a simple device, Google’s Chromecast has created a surprisingly complex network of technology at my place.

Google says that Chromecast works roughly like this:

Chromecast Google view

Actually it’s more complicated than that. My setup is about as simple as you can get (no NAS, no existing media servers, no Netflix or Hulu or Foxtel or anything), and it looks like this:



Chromecast in the real world. Not so simple, really.


Six casting workflows

That is, depending on what exactly I want to watch and how, I have to choose between 6 different workflows:

  1. YouTube? Just go to the YouTube website in Chrome, and click the Chromecast button in the video window. This works really well. Great for music playlists, too.
  2. iView or SBS? Go to the site, and use the Google Cast extension to “TabCast”. This works so-so. It’s great for randomly showing something funny you found on the web, though.
  3. Movies you’ve downloaded? Use the VideoStream Chrome app to load it directly off disk. This works perfectly.
  4. Movies in your Plex library? Use the Plex web interface. For some reason you have to go through, and the whole experience is a bit complicated. There are some issues with transcoding that I don’t really understand.
  5. Vimeo? Plex to the rescue. Add Vimeo as a channel (a slightly complicated procedure to view your own uploads).
  6. Want to watch something without using your computer? There are only a couple of “Google Cast ready Android apps” (YouTube is the only one that works well for me), or use BubbleUPnP to access your Plex library.

And I haven’t even mentioned a couple more complications:

My advice? Figure out the smallest number of workflows to do everything you want to do, and get rid of any extraneous apps, servers, websites etc.

Google’s world

Google’s world centres around casting stuff from your phone. If that was all you could do, the Chromecast would suck. There are few apps, a lot of them are very niche (eg, anime or baseball), junk (like this) or just don’t really work (like the Red Bull app, which drops out every few minutes).

Fortunately, third party tools like VideoStream and Plex fill in a lot of the gaps.

But does it work?

The end result is actually great. Compared to having to plug my laptop into the TV, these things are now easy and fun:

  • Put on some background music: go to YouTube, Pandora or GrooveShark, and cast. No more hooking up audio cables.
  • Show a silly video to my partner. Even from the other room. Stuff I previously wouldn’t have bothered with, but it’s so easy – the TV even turns on by itself.
  • Keep watching a video while doing something else. Easy to leave my study, keep watching the same thing while making coffee or something.
  • Show photos: Just go to Google Plus or Flickr, and cast.

Web map projections: the bare minimum you need to know

TileMill wants to know: what projection is this data?


If you’re making maps, you will probably need to know something about cartographic projections. Here’s the minimum.

  1. The globe is round, maps are flat. Each of the hundreds of different methods for converting from round to flat is a projection.
  2. When you have a latitude and longitude, you have unprojected coordinates. Anything you can do with these doesn’t require choosing a projection.
  3. Most consumer web maps use the Web Mercator projection, also known as the Google Web Map de facto standard, EPSG:900913 (“google” written with numbers), EPSG:3857, etc.
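  4. Government agencies, desktop apps and other stuff often use plain WGS84 latitude and longitude, also known as EPSG:4326. Strictly speaking this is a system of unprojected coordinates rather than a projection.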
  5. It is technically straightforward to convert from unprojected coordinates to any projection, or between projections, using GIS packages or command line tools like GDAL. It can be slow to do this on the fly.
  6. Each projection is defined using a Spatial Reference System. An SRS can also define systems of unprojected coordinates, and even other planets.
  7. There are half a dozen common formats for describing the SRS, including:
    1. SRID, an identifier including the identifier scheme, like “EPSG:3857”, “ESRI:102113” or “SR-ORG:7483”.
    2. proj4, a short piece of text with lots of + and =, used by tools like GDAL and TileMill. It looks like:
      +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs 
    3. Well-known text (WKT), a verbose format that can also be used to define spatial data. For example:
  8. The tool you are working with (eg, TileMill) will only support certain projections. You need to:
    1. Find data that is in the right projection (Web Mercator is the safest), or convert it; and
    2. Tell the tool what projection it’s in, if it can’t guess. You will have to pick from a list, or use one of the formats above, that it supports.
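To make point 5 a little more concrete, here is a minimal sketch of the forward conversion from unprojected longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), using the standard spherical Mercator formula. The function name is my own invention; in practice you would let GDAL or a similar library do this for you.

```python
import math

# Radius of the sphere used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator (x, y) in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # Roughly Melbourne: a large positive x, and a negative y (southern hemisphere).
    print(lonlat_to_web_mercator(144.9, -37.8))
```

The formula breaks down at the poles, which is why Web Mercator maps are cut off at about 85 degrees north and south.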

Multivariate binary symbol maps with TileMill.

I help researchers make maps of their research. An archaeologist recently wanted to visualise the distribution of some iron-age artefacts around the Levant, based on a spreadsheet of thousands of rows. Each row represents one kind of artefact at a given site, such as “3 incised bangles, subtype I.b.iv, at Gath.” What are these maps called? I’ll go with “multivariate binary symbol map”.

It sounded like a job for CartoDB, but as the requirements unfolded, she wanted pretty specific cartography, plus a custom base map of rivers, historical boundaries etc. So we used TileMill instead, although we didn’t end up getting all that done.


This is where we got to. Each symbol next to a place name represents the presence of a specific type of artefact. ‘Eitun has pins of Type 1 with “incised decorations”, Far’ah has pins of Type 1 with “incised decorations”, “plain decorations” and “ribbed/grooved decorations”.

The most complex of these maps has 6 different attributes:


Loading the data

With a clearer understanding of exactly what we were trying to achieve, I probably would have done something simpler to calculate each of these attributes, such as using Excel. Instead, I loaded the data into PostGIS and wrote some queries. TileMill supports CSV files directly, but unlike CartoDB, doesn’t load the data into a database, so you can’t run SQL queries.

This post from “The World is a Village” explains how to load CSV into PostGIS, but in summary:



The most interesting line is:

update artefacts set geom = ST_SetSRID(ST_MakePoint(lon,lat),4326);

That’s what converts the raw lon and lat columns into a geometry column so that TileMill can plot it.


To determine “are there any artefacts of type X in location Y”, an easy way is to write a view. Each column is a different subquery, for a different X.


That gives data like this:



So, in TileMill we can now use a filter like [subtype_1a>0] to decide whether to place a symbol.
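The view itself isn't reproduced here, so as a rough sketch of the idea, this uses Python's built-in SQLite in place of PostGIS. The table name, column names and sample rows are all hypothetical; only the shape of the view (one row per site, one subquery column per subtype) follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the spreadsheet: one row per artefact type per site.
    CREATE TABLE artefacts (site TEXT, subtype TEXT, quantity INTEGER);
    INSERT INTO artefacts VALUES
        ('Gath', '1a', 3),
        ('Gath', '1b', 1),
        ('Far''ah', '1a', 2);

    -- One row per site; each column is a subquery counting rows of one subtype.
    CREATE VIEW site_subtypes AS
    SELECT s.site,
           (SELECT COUNT(*) FROM artefacts a
             WHERE a.site = s.site AND a.subtype = '1a') AS subtype_1a,
           (SELECT COUNT(*) FROM artefacts a
             WHERE a.site = s.site AND a.subtype = '1b') AS subtype_1b
    FROM (SELECT DISTINCT site FROM artefacts) AS s;
""")

rows = conn.execute("SELECT * FROM site_subtypes ORDER BY site").fetchall()
for row in rows:
    print(row)
```

A column value above zero then means "this subtype is present at this site", which is all the TileMill filter needs.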


Because there were so many maps to produce (5 of this type, plus another 11), I created them all in one project, each as a single layer.



The #map1 to #map12 layers refer to a different set of data. Each layer pulls in the same spreadsheet, and styles it identically, with the only difference being a single filter.


That turned out to work really well.

But back to the main problem of showing symbols for attributes. It’s easy to show a single symbol if an attribute is present (like a coffee icon if a site is a cafe). But how do you show 4 symbols simultaneously, without them overlapping?

I thought of two approaches.

Symbol approach 1: Fonts

It’s theoretically possible to construct a text string, with an appropriate font. The string could look like “A Q Z”, where A gets rendered as a square, Q as a circle and Z as a star. Unfortunately I couldn’t make it work. I just couldn’t find an open truetype font that would behave like this. I tried loading various WingDings fonts, but always got little boxes instead of symbols.

There are projects like Map Icons or Font Awesome which sort of do this, but using web technologies that aren’t compatible with TileMill. The only proof of concept I achieved was using punctuation.


Using fonts makes it very easy to space icons appropriately:


Using punctuation in this way just doesn’t look good.

Symbol approach 2: marker icons

So the second approach is using traditional markers, and finding a way to position them appropriately. In CartoCSS, there’s no “marker-dx” to offset a marker, but there is “marker-transform”. So you can use SVG transforms, such as translate().


That positions your marker 10 pixels right, and 5 pixels up.




Each different symbol has to be given its own layer (::square, ::circle…), and a different translation offset: (10, -5), (10, 5), (20, -5) etc.

This guarantees that they don’t collide, and mostly looks good:


although it inevitably leads to odd positioning:



With enough time, you could write some fancy SQL that would stack symbols from the left, avoiding any gaps.
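I never wrote that SQL, but the stacking logic is simple enough to sketch. This is the same idea in Python rather than SQL: give each attribute that is actually present the next free slot, using the offset sequence (10, -5), (10, 5), (20, -5) from above. The function names and the two-row layout are assumptions for illustration.

```python
def slot_offset(slot):
    """Pixel offset (dx, dy) for the nth occupied slot: two rows, filled left to right."""
    column, row = divmod(slot, 2)
    return (10 * (column + 1), -5 if row == 0 else 5)


def stacked_offsets(present):
    """Assign each present attribute the next free slot, so no gaps are left."""
    return {name: slot_offset(i) for i, name in enumerate(present)}


if __name__ == "__main__":
    # A site with only two of the possible symbols still packs them together.
    print(stacked_offsets(["square", "star"]))
```

The catch is that a given symbol no longer sits in the same position at every site, which may or may not be what you want.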

Other TileMill styling

The only other styling of note is that the text labels should appear right-justified, to the left of the exact position. The CartoCSS designation for this is text-horizontal-alignment: left.


You can see the full TileMill project on Github.


The Australian’s menacing editorial

An editorial published in The Australian on the 21st of March set a new low standard in writing about conflicts between cyclists and cars. Prompted by video of a cyclist colliding with a taxi door, the editorial combined a strong anti-cyclist viewpoint (as it’s entitled to do in the opinion section) with some astonishing ignorance and lousy argumentation.

It’s so terrible, I’ve commented on each sentence. (Even the grammar is bad: “The problem of city cyclists reached their apogee…”, “clogging-up lanes”)

The Australian says…


The  arrogant sense of entitlement in our inner cities is also evident in the ever-growing number of cyclists snaking their way through pedestrians on overcrowded pathways, darting between cars and clogging-up lanes on our congested roadways.

Cyclists are entitled to ride on roads. Just ask our Police Chief Commissioner, Ken Lay.

Cyclists don’t “clog-up” roads. If anything, the opposite is true, since each takes up less room than a car.

The problem of city cyclists reached their apogee in Melbourne this week when a cyclist was “doored” on busy Collins Street, after a passenger opened a taxi door and a rider crashed into it.

(Nothing factually wrong here, although the “problem” described is obviously subjective.)

Neither the taxi nor its passenger could be deemed at fault because a narrow “bike lane” inhibited the taxi from stopping next to the kerb.


1. The passenger is clearly committing the offence of causing a hazard to a cyclist by opening a door.

2. This stretch of road is a no-stopping area: the taxi could not have stopped anyway.

3. Cars are allowed to stop in bike lanes.

4. Even if cars weren’t allowed to stop in bike lanes, the suggestion that this would excuse the opening of a door into the path of a cyclist is outrageous.

The passenger was lucky to avoid serious injury.

The risk to the passenger in this case is much lower than the risk to the cyclist, as the collision risk is in the moment immediately following the door being opened – before the passenger gets out. The suggestion here is absurd.

What makes this incident even more absurd is that, although the lane was marked by a bicycle symbol, it was not actually a dedicated bicycle lane.

Whether or not the cyclist was in a bike lane is irrelevant to the offence committed. I can’t fathom what “absurdity” is created by the painted bike lane not being an actual bike lane.

Melbourne bike lanes must have signage, fixed to a pole, that shows the start and finish of a lane, as well as clear markings on the road itself.

This may be true, but it is not relevant.

The state’s bicycle operations officer — yes, there is such a position — admits there is confusion for cyclists, pedestrians and motorists.

This is possibly a reference to this interview in the Age on March 20. This statement doesn’t seem relevant, other than to imply that the cyclist is somehow at fault for being doored, due to being “confused”. (Why is it surprising that there is a police officer dedicated to cyclists? There are whole sections devoted to motorists.)

Cyclists, including the one “doored” this week, are using cameras to film such incidents so they can make insurance claims.

Very few cyclists use such cameras, which is why this incident is getting so much attention. There is an unpleasant (possibly unintended) implication here that users of such video cameras are somehow actively seeking such incidents.

The Victorian government imposed even tougher on-the-spot fines in 2012 for people who opened car doors in the direct path of cyclists.

True. (As far as I know.)

For too long, authorities have bowed to the demands of selfish cyclists and their lobby groups.

This hyperbolic statement doesn’t seem well supported by facts. The equivalent statement for motorists is much better supported.

Truth is, our cities are dominated by cars because they are sprawling.

Certainly true in outer suburbs that lack good public transport, but irrelevant when discussing an incident in the CBD.

We have no equivalent of Amsterdam and should stop pretending we do.

Australia has no equivalent of Amsterdam? Or Melbourne is no Amsterdam? If the implication is that cycling is fundamentally incompatible with Melbourne’s geography, then this is demonstrably incorrect. Currently about 15% of commuters to the CBD each day travel by bike. This is not a fringe activity, by any stretch.

Cycletouring and OpenStreetMapping: a beautiful symbiosis

Contributing to OpenStreetMap is diversely rewarding: you help other people, you make open data as a whole more viable, you learn a lot about the area you’re mapping, and it’s fun. But sometimes it’s just plain pragmatic. Last weekend, I organised a cycle tour from Bendigo to Avenel, via the O’Keefe Rail Trail, Lake Eppalock, Colbinabbin, Rushworth, Murchison and Nagambie. When I started planning the route, OpenStreetMap looked like this:


The major features are all there, but what’s missing is what matters most to cycle tourists: quiet country roads, and road surfaces. Is there a way to get from Eppalock to Colbinabbin on only sealed roads? Is Buffalo Swamp Rd (near Murchison) really sealed? A great way for me to research is to add to OpenStreetMap: use aerial imagery to add new roads, paying attention to whether they look sealed or not from the air.


So Buffalo Swamp Road is obviously not sealed after all. By the time I was done, the map of the area looked like this:


Notice how many “sealed” roads have turned out to be dirt, but also how many other unmapped little roads have been added to the map.

Once this is done, the steps are:

  1. Finalise the route, using OSRM.
  2. Send GPX files to everyone on the trip
  3. Load the GPX files onto both my GPS and Maverick Pro, an Android App
  4. Also load the tiles into Maverick
  5. Ride
  6. Update OpenStreetMap afterwards with any fresh information – obstacles, unexpected connections, local businesses, and so on.

There’s still lots more to add, but it’s nice that just planning this one trip has significantly improved coverage in a whole region like this.

Git: what they didn’t tell you

Credit:Tim Strater from Rotterdam, Nederland CC-BY-SA

Of all the well-documented difficulties I’ve had working with Git over the years, a few conceptual difficulties really stand out. They’re quirks in the Git architecture that took me far too long to realise, far too long to believe, or far too long to really grasp. And maybe you have the same problem without realising it.

Branch names are completely arbitrary

git branch -d master
git checkout develop -b master

There, I did it. I’m now calling the develop branch master. What you call this branch, and what I call it, and what your Github repo calls it, and what my Github repo calls it just don’t matter. Four different names? No problem. There are some flimsy conventions that Git half-heartedly follows to link two branches with the same name, but it gives up pretty easily.

Remote branches are local

I have very frequently fallen into this trap:

$ git diff origin/master

No differences, so my branch must be in sync with Origin, right? Wrong. What is true is my branch is in sync with the local copy of Origin. If you don’t run git fetch, then Git will never even update its local copies.

Technically, Git has always been upfront about this. The Git book opens the section on remote branches:

Remote branches are references to the state of branches on your remote repositories.

But it’s counterintuitive, and so I keep messing it up. I keep hoping (and assuming) that Git will one day include an auto-fetch option, where it constantly synchronises remote branches.
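If it helps, here is the mental model as a toy Python sketch. It is not how Git is implemented, and every class and method name is made up; the point is only that “origin/master” names a local snapshot that nothing but a fetch will refresh.

```python
class Remote:
    """Stands in for the repository on the server."""

    def __init__(self):
        self.branches = {"master": "a1"}


class LocalRepo:
    def __init__(self, remote):
        self.remote = remote
        self.branches = {"master": "a1"}
        # Snapshot taken at clone time; this is what "origin/master" refers to.
        self.remote_tracking = dict(remote.branches)

    def in_sync_with_origin(self, branch):
        # Loosely like `git diff origin/<branch>`: compares against the snapshot only.
        return self.branches[branch] == self.remote_tracking[branch]

    def fetch(self):
        # Like `git fetch`: the only step that actually talks to the server.
        self.remote_tracking = dict(self.remote.branches)


if __name__ == "__main__":
    remote = Remote()
    repo = LocalRepo(remote)
    remote.branches["master"] = "b2"           # someone else pushes
    print(repo.in_sync_with_origin("master"))  # still looks in sync
    repo.fetch()
    print(repo.in_sync_with_origin("master"))  # now the difference shows up
```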

‘Detached HEAD’ mode is fine

Here’s the message that we have seen many times:

Note: checking out ‘origin/develop’.

You are in ‘detached HEAD’ state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

This scary looking message threw me off for a long time, despite the fact that it’s actually one of Git’s most helpful messages – it tells you everything you need to know.

It boils down to this: commits you make while you’re not at the tip of a branch don’t belong to any branch, so they are easy to lose. The two most common situations that cause this are:

  1. Checking out a remote branch. You should do this:

    git checkout origin/develop -b mydevelop

    or even, if you want to abandon your branch completely:

    git branch -d develop
    git checkout origin/develop -b develop

  2. Checking out a commit in the middle of a branch, like:

    git checkout a8e6b18

    Usually in this case you just want to look at it, so you can just ignore the message.

There’s nothing special about ‘git clone’

For a long time, I thought that ‘git init’, ‘git pull’ and ‘git clone’ somehow created repositories that were different, even if they ended up with the same commits in them. It’s hard to recreate my state of mind, but I spent a long time trying to salvage certain directories on disk when I should have just abandoned them.

Similarly, there is no difference between:

  1. git clone
  2. git init
    git remote add origin
    git pull origin master

Well, in the second case Git’s a bit confused about which local branches map onto which remote branches, so you have to be more explicit or fix it with some configuration option.

Don’t call any remote ‘origin’

Credit: Chevassu (GFDL)

For some reason, Git encourages you to call the source of the first clone “origin”. I have found this very confusing and ultimately very unhelpful. Let’s say you’re working on a project called widget, and you fork it in Github so you can work on it. You will want both remotes accessible locally, so you will probably do one of these:

  1. git clone
    git remote add widget
  2. git clone
    git remote add mine

So you either have remotes called “origin” and “widget”, or remotes called “origin” and “mine”. But on the next project, you might make the opposite choice, and soon you really don’t remember what “origin” means.

My tip: never name any remote “origin” ever. Name them all after their Github username.

  1. git init
    git remote add widget
    git remote add stevage
    git fetch --all
    git checkout stevage/master -b master

You will never understand “git reset”’s options

The difference between “git reset”, “git reset --soft” and “git reset --hard” is beyond your comprehension. And you probably wanted “git checkout” anyway.

Rule of thumb:

  • If your directory is FUBAR: git reset --hard
  • If you just want to throw away changes to one file: git checkout file
  • In all other cases, google it.

One week of Salt: frustrations and reflections.


Reflecting on Salt. (Credit: Psyberartist, CC-BY 2.0)

Salt, or SaltStack, is an automatic deployment tool for clusters of machines, in the same category as Chef, Puppet and Ansible. I’ve previously used Chef, found the experience frustrating, and thought I’d give Salt a go. The task: converting my tilemill-server bash scripts, which set up a complete TileMill server stack, with OpenStreetMap data, Nginx and a couple of other goodies.

The goals in this kind of automation for me are threefold:

  1. So I can quickly build new servers for Mapmaking for Academics workshops.
  2. So I can manage other, similar (but slightly different) servers. “Manage” here means documenting and controlling the slight configuration differences, and being able to roll out further changes (such as web passwords) without having to log in.
  3. So other people can do this too. (That is, the solution has to be simple enough to explain that other people can do it).

The product of my labours:

Disclaimers and apologies

A week is too soon to feel the real benefits of automatic deployment and configuration management, but plenty long enough for frustrations and annoyances. There are lots of good things about Salt, but those will go in another post.

I’ve intentionally written this post as I go, to document things that are surprising, counter-intuitive or bothersome to the newcomer. Once you become familiar with a system, it’s much harder to remember what it was like as a newbie, and easier to make allowances for it. So experienced Salt users will probably rankle at the wilful ignorance displayed herein.

A few things I quickly appreciated, compared to Chef:

  • Two parties, not three. Chef has the client (your laptop), the Chef Server (which I never used because it was too complicated) and the target machine itself. Salt doesn’t have the client, so you do everything either on the master (“salt ‘*’ state.highstate”) or on the target machine in masterless mode (“salt-call --local state.highstate”).
  • No need for “knife upload”: the Salt Master just looks for your files in /srv/salt, so you can use any method you like for getting them there. I always hated having to upload recipes, environments etc individually – you always forgot something.
  • YAML rather than Ruby.
  • A nifty command line tool that lets you try states out on the fly, or run commands remotely on several servers at once. (Chef’s shell, formerly ‘shef’, never really worked out for me.)

Mediocre documentation

Automatic deployment systems are complicated, and you’d love some clear mental concepts to hang on to. Unfortunately, Salt’s chosen metaphor is pretty clumsy (salt grains, pillars, mines…meh), and the concepts are not well explained. Apparently the concept of a “state” is pretty important to Salt, but even after reading much documentation and quite a reasonable amount of hacking away, I still have only the vaguest notion of what a Salt state is. What’s a “high state”? A “low state”? How can I query the “state” of a minion? No idea. It’s hard to tell if “states” are an implementation magic that makes the whole thing work, or if I, the user, am supposed to know about them.

(After more digging around, I think “state” has several meanings, of which one is just one of the actions defined in a SLS file.)

And then, strangely, how do you refer to the most important objects that you work with, the actual code that specifies what you want done? “SLS files”. Or “state files”. Or “SLS[es]”. Or “Salt States”. Or “Salt state files”. Those terms are all used in the docs. (I thought they were also called “formulas”, but those are apparently only pre-canned state files.)

YAML headaches

It turns out YAML has some weird indenting and formatting rules that occasionally ruin your day. Initially it’s especially tedious having to learn this extra language and its subtleties (> vs |, multi-line hyphenated lists versus single line square-bracketed lists) when you just want to get started – it steepens the learning curve.

IDs, names

The behaviour of statement IDs is cute at first, but not well explained, and becomes pretty tiresome. The idea is you can write this:

httpd:
  pkg.installed

Succinct, huh? That’s actually shorthand for this:

httpd:
  pkg:
    - name: httpd
    - installed

That is, the “id” (httpd) is also the default value for “name”. Secondly, you can compress one item of the following list  (“installed”) onto the function name (“pkg”) with a dot.

So far, so good. But it quickly leads to this kind of silliness:

"{{ pillar.installdir}}/myfile.txt":
    - unless: ...
cmd.wait: - watch: [ "{{ pillar.installdir}}/myfile.txt" ]

Free-text strings don’t make great IDs. On the whole, I’d prefer a syntax which is robustly readable and predictable, rather than one which is very succinct in special cases, but mostly less so.

The next problem is that all those IDs have to be unique across all your .sls files. That’s tedious when you have repetitious little actions that you can’t be bothered naming properly.

Another annoyance is that Salt’s data structure sometimes requires lists, and sometimes requires dicts, and it’s hard to remember which is which. For example:

    - name: |
        echo "All done! Enjoy your new server.<br/>" >> /var/log/salt/buildlog.html
    - name: /var/log/salt/index.html
    - source: salt://initindex.html
    - template: jinja
    - context:
       buildtitle: "Your server is ready!"
       buildsubtitle: "Get in there and make something."
       buildtitlecolor: "hsl(130,70%,70%)"

Notice how the first level of indentation doesn’t have hyphens, the second level does, then the third level doesn’t? I’m not sure what the philosophy is here – it looks to me like each level is just a list of key/value pairs.

Jinja templating logic

Jinja is a fantastic templating language. But Salt uses this template language as its core programming logic. That’s like writing a C program using macros all over the place. It’s sort of ok as a way to access data values:

 - cwd: {{ pillar['tm_dir'] }}

Here, the {{ … }} bit is a Jinja substitution, so the actual YAML code that gets parsed and executed looks like this:

 - cwd: /mnt/saltymill

As most people who have dealt with macro preprocessors know, they rapidly get complex and very hard to debug. The error you see is a YAML parse error, but is the problem in your YAML code, or in what got substituted by Jinja?

It’s certainly possible to use the full range of Jinja functions (that is, {% … %} territory) inside Salt formulas, but for sanity I’m trying to avoid it. On the flip side, certain things that I expected to be easy, like defining a variable in a state {% set foo = blah %} and then being able to access it from inside any other template, turn out to not work. You need to explicitly pass context variables around, or import state files, and it becomes quite cumbersome.

Help may be at hand.  Evan Borgstrom also noticed that “simple readability of YAML starts to get lost in the noise of the Jinja syntax” and is working on a new, pure Python syntax called NaCl.

Many ordering options

There are about 4 competing systems simultaneously operating to determine which order actions get executed in, ranked here from most important to least important.

  1. Requisite order: The “preferred”, declarative system, where if X requires Y and Z requires X, then Salt just figures out that Y has to happen first.
  2. Explicit order: you can set an “order flag” on an action, like “order: 1” or “order: last”. Basically, if you’re tearing your hair out with desperation, I think. (I tried it once, and it seemed to have no effect.)
  3. Definition order: things happen in the order they’re declared in. (By far my preferred option). This doesn’t always seem to work, in the top.sls, for reasons I don’t understand.
  4. Lexicographical order (yes, really): actions are executed in alphabetical order of their state name. (Apparently this was a good thing because at least it was consistent across platforms.)
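Requisite ordering is essentially a topological sort of the dependency graph. The sketch below is not Salt's implementation, just the same idea using Python's standard library, with the X, Y and Z from the example above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key requires the states in its set: X requires Y, and Z requires X.
requires = {"X": {"Y"}, "Z": {"X"}}

# static_order() yields every node after all of the nodes it requires.
order = list(TopologicalSorter(requires).static_order())
print(order)
```

With a simple chain like this there is only one valid order. When several orders are valid, something else has to break the tie, which is where the other three systems come in.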

The requisite ordering system sounds good, but in practice it has two shortcomings:

  1. Repetitiveness: let’s say you have 10 psql commands that all strictly depend on Postgresql being installed. It’s very repetitive to state that dependency explicitly, and much more efficient to simply install Postgresql at the top of the formula, and assume its existence thereafter.
  2. Scope: Requiring actions in other files seems to prevent the running of just this file. That is, if in load_db.sls you do “require: [ cmd: install_postgres ]”, then this command line no longer works: salt ‘*’ state.sls load_db

I’m yet to see any advantages to a declarative style. I like the certainty of knowing that a given sequence of steps works. The declarative model implies that one day the steps might be rearranged based only on my statement of what the dependencies are.

Arbitrary rules

There’s a rule that says you can’t have the same kind of state multiple times in a statement. This is invalid:

 - source: salt://initlog.html 
 - text: Commencing build...

Why? I don’t know. The rules dictate that you have to do this:

 - source: salt://initlog.html 

 - name: /var/log/salt/buildlog.html
 - text: Commencing build...

Notice the cascade of consequences as this fragile “readability” tower crashes down:

  1. You can’t have two states of the same type in a statement, so we move the second state to a new statement; but:
  2. You can’t have two statements with the same ID, so we give the second statement a new ID; meaning:
  3. We now have to explicitly define the “name” of the file.append function.

Clumsy bootstrapping

I miss Chef’s elegant “knife bootstrap” though. In a single command, Chef:

  1. Defines properties about the node, like its environment and role.
  2. Connects to the node
  3. Installs Chef
  4. Registers the node with the Chef server, or your client, or both.
  5. Starts a deployment, as required.

With Salt, these all seem to be manual steps:

  1. I SSH to the node
  2. I download and install Salt (using the one-liner bootstrap script)
  3. I write ‘grains’ to the newly created /etc/salt/grains file (yes, I think I’m doing this wrong – roles should be defined through pillars maybe)
  4. The node then attempts to register itself with the Salt Master, but because the connection is insecure:
  5. I have to go to the Salt Master and accept the pending key: yes | salt-key -A [and the docs suggest you're supposed to manually verify the keys!]
  6. Now I can launch a deployment: salt <nodename> state.highstate

When I rebuild a server, the above are preceded by these steps:

  1. Log in to OpenStack Dashboard, click ‘rebuild’ on the instance.
  2. Edit my ~/.ssh/known_hosts to remove the old SSH key.
  3. On the Salt Master, delete the old key: yes | salt-key -d <node>

There are probably ways of streamlining this process. I hope so, anyway, because it’s pretty clumsy. I don’t know yet whether the SaltMaster can automatically launch deployments when new nodes register.

Super lightweight map websites with Github

Github, the online version control repository host for Git, recently added support for GeoJSON files. Sounds boring, right? It actually lets you do something very cool: build your own “dots on a map” website with virtually zero code.


An example of GeoJSON on Github I whipped up. Click it.

Here’s what you need to do.

  1. Get a Github repository if you don’t have one already. They’re free.
  2. Create a GeoJSON file. You can export to this format from various tools. One easy way to get started would be to upload a CSV file with locations to then download the GeoJSON from there. Or even easier, use to place dots, lines and polygons with a graphical tool. It can save directly to your GitHub.
  3. Here’s what my test file looks like:
     {
       "type": "FeatureCollection",
       "features": [
         { "type": "Feature",
           "geometry": {
             "type": "Point",
             "coordinates": [144.9, -37.8]
           },
           "properties": {
             "title": "Scooter",
             "description": "Here's a dot",
             "marker-size": "medium",
             "marker-symbol": "scooter",
             "marker-color": "#a59",
             "stroke": "#555555",
             "stroke-opacity": 1.0,
             "stroke-width": 2,
             "fill": "#555555",
             "fill-opacity": 0.5
           }
         },
         { "type": "Feature",
           "geometry": {
             "type": "Point",
             "coordinates": [144.4, -37.5]
           },
           "properties": {
             "title": "Cafe",
             "description": "Coffee and stuff",
             "marker-size": "medium",
             "marker-symbol": "cafe",
             "marker-color": "#f99",
             "stroke": "#555555",
             "stroke-opacity": 1.0,
             "stroke-width": 2,
             "fill": "#555555",
             "fill-opacity": 0.5
           }
         }
       ]
     }

    It’s worth validating with GeoJSONLint.

  4. Commit this file, say test.geojson, to your Github repository. You can get a preview of it in Github:

    The test GeoJSON file, as seen on GitHub.

  5. Now the really cool part. Embed the map into your own website. This is stupidly easy:
    <!DOCTYPE html>
    <script src=""></script>

    If you don’t have a website, site44 is an extremely easy way to get started. You place HTML files into your Dropbox, and they get automagicked onto the web, with a subdomain.

Now what?

That’s it! What’s especially interesting about the hosting on GitHub is it’s a very easy way to have a lightweight shared geospatial database of points, lines or polygons. Here’s how you could add dots to my map:

  1. Fork my repository
  2. Add a few points, by modifying the GeoJSON file
  3. Commit your changes to your repository.
  4. Send me a pull request
  5. I accept the changes, and voila – now your points are shown with mine.

Using this method, we have a “review before publish” workflow, and a full version history of every change.


This is a nifty tool for prototyping social mapping applications, but it obviously won’t cut it for production purposes:

  • No support for different layers: all the dots are always shown
  • No support for different basemaps: always the same OpenStreetMap style
  • No authoring tools: you must use something else to generate the GeoJSON
  • Obligatory “rendered with (heart) by GitHub” footer.
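
On the "no authoring tools" point, the GeoJSON doesn't have to come from a graphical tool. Here's a minimal sketch in Python 3 that turns a CSV of locations into a FeatureCollection. The column names (title, lon, lat) and the function name are my own assumptions, so adjust them to match your data.

```python
import csv
import io
import json


def csv_to_geojson(csv_text):
    """Convert CSV rows with title, lon and lat columns to a GeoJSON dict."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is longitude first, then latitude.
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"title": row["title"]},
        })
    return {"type": "FeatureCollection", "features": features}


sample = "title,lon,lat\nScooter,144.9,-37.8\nCafe,144.4,-37.5\n"
result = csv_to_geojson(sample)
print(json.dumps(result, indent=2))
```

Save the printed output as a .geojson file and commit it like any other. The styling properties (marker-symbol and friends) could be added to the "properties" dict in the same way.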

Soon you’ll want to build a proper application, using tools like MapBox, CloudMade, CartoDB, Leaflet etc.

Digital humanities for beginners: get started with the Trove API

Trove is the National Library of Australia’s “discovery interface” – an amazing catalogue of books, newspapers, maps, music, journal articles etc. The Trove API is a special website that programs can talk to in order to run queries, retrieve individual records etc. With some creative ideas and a bit of coding skill, you could make a pretty nifty tool and win a digital humanities award.

The documentation is pretty thorough, but doesn’t tell you how to get started. These days, web development is mostly about assembling the right tools and putting them together, but it can be hard for a novice to know what they need.

This guide is for you if:

  • You want to try exploring Trove programmatically; and
  • You’ve done some coding before; but
  • You’re not very familiar with coding against an API, or writing web applications.

In short, if you think the digital humanities might be for you.

What we’ll need

  • A server on which to run your web application. You can use your laptop directly, but better to use a virtual machine inside your laptop. Even better, a server running on the internet, like a VM on the NeCTAR Research Cloud.
  • A web application framework: Flask and Python
  • A library that makes it easy to make requests to other web servers (we’ll use Requests)
  • A JavaScript framework which makes dynamically modifying web pages much easier (we’ll use jQuery)
  • A CSS framework, which makes your page look attractive with no effort (we’ll use Twitter Bootstrap)
  • A templating engine, to combine HTML and the results of our queries into a web page (we’ll use Jinja2)
  • Our server logic, expressed as code in files.

A server

As an Australian academic, you can create your own server on the NeCTAR Research Cloud. That would be the best option, but can be a bit fiddly for a novice so we won’t explain that here.

The next best option is to create a server on a virtual machine inside your laptop, using VirtualBox and Vagrant. That keeps everything related to this one project self-contained, which is a big bonus when you start working on other projects. It also creates a pristine, controlled Linux environment, which means you’re less likely to run into issues caused by the peculiarities of your own system. Try the Vagrant Getting Started guide.

As a last resort, the simplest approach is to just install stuff directly on your computer. This may work ok to begin with.

Building a web application

We’re going to build a web application that will talk to Trove. This isn’t the only approach. You could:

  • Build a command line tool that pulls stuff out of Trove and saves it to disk
  • Make a desktop application that a user would have to download and install
  • Make a static website (not a web application) that does everything with JavaScript.

But a web application is probably what you’re going to want eventually and is the easiest to share with other people.

We’ll use Python, the programming language, and Flask, a web application framework for Python.

Why Flask? It’s easy to install, lightweight, and is well suited to experimental programming like this. (An alternative would be Django, which would be much better if you wanted to store stuff in a database and write something big and scalable.)

You can install these on the command line like this:

sudo apt-get install -y python-pip
sudo pip install flask

That is, first you install Python’s “pip” installer. Pip then knows how to install Python packages like Flask.

Whatever server you’re using, create a directory somewhere (perhaps under your home directory), called ‘trovetest’.

cd ~
mkdir trovetest
cd trovetest

Making requests to the Trove API

The Trove documentation says that to perform a query, we need to access a URL like this:<INSERT KEY>&zone=book&q=%22piers%20anthony%22

You might think you need to construct that string yourself, complete with question marks, equals signs and %22’s. You could do that using Python’s built-in urllib2 library:

import urllib2
troveURL = ''
zone = 'book'
search = '%22' + 'piers' + '%20' + 'anthony' + '%22'
r = urllib2.urlopen(troveURL + 'result' + '?' + 'key=' + apikey + '&' + 'zone=' + zone + '&' + 'q=' + search).read()

But it’s easier than that, if we use the Requests library.

import requests 
troveURL = ''
search = '"piers anthony"'   # the quotes make it a phrase search; Requests does the escaping for us
r = requests.get(troveURL + 'result/', params = { 
 'key': apikey, 
 'zone': 'book', 
 'q': search
 } )

So, install the Requests library as well:

sudo pip install requests
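
If you're curious what that params dict turns into, you can see the encoding without touching the network. This sketch uses Python 3's standard library (urllib.parse) rather than Requests itself, and the base URL and key are placeholders, not the real Trove ones. As far as I know Requests builds its query strings in much the same way, so expect spaces to come out as + rather than %20 (the two are equivalent in a query string).

```python
from urllib.parse import urlencode

# Placeholder values: substitute the real Trove base URL and your own key.
base_url = "https://api.example.org/"
apikey = "YOUR_KEY"

params = {
    "key": apikey,
    "zone": "book",
    "q": '"piers anthony"',   # quoted, so it is searched as a phrase
}

# urlencode escapes the quotes and the space, and joins everything with &.
query_string = urlencode(params)
url = base_url + "result?" + query_string
print(url)
```

The point is that you hand over plain strings and never type a %22 yourself.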


JavaScript framework

These days, virtually all web applications use a JavaScript framework, to make manipulating the web page more practical. We’ll use jQuery, which is also required by Twitter Bootstrap.

Create a directory under ‘trovetest’ called ‘static’, and download and unzip the latest jQuery.

CSS Framework

Making a web page look good using straight CSS (cascading style sheets) is really hard. Making it look good in every browser and device is a nightmare. Save yourself the pain, and start from somewhere sensible like Bootstrap, made by Twitter. (By default, it looks a bit like Twitter’s website, but you can change that.)

Without Twitter Bootstrap

Adding Bootstrap and minimal changes.

Although it’s best to download both jQuery and Bootstrap to your server and link to them there, you can get started quickly by linking to them on the web. Just include this text at the top of your HTML files.

<script src=""></script>
<link rel="stylesheet" href="">
<script src=""></script>

Templating engine

The most common way to build a web page dynamically is by writing a “template” of the HTML page. It can be as simple as this:

<h1>Request complete</h1>
You requested this article: {{ article.name }}

When you tell the web application framework to render the page, you pass in the list of variables – in this case, an object called ‘article’ with an attribute ‘name’.

Templating example

We’ll use the Jinja2 templating engine, because that’s what Flask uses.
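
To make the idea concrete, here's a toy stand-in for what a templating engine does with a placeholder. This is not Jinja2 and not how Jinja2 is implemented. It's a few lines of Python 3 that swap {{ name.attr }} for a value, with a made-up render function and a made-up article title. Real engines add loops, filters and HTML escaping on top.

```python
import re


def render(template, **context):
    """Replace {{ name.attr }} placeholders with values from context.

    A toy stand-in for a templating engine: no loops, filters or escaping.
    """
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            # Walk dicts by key and other objects by attribute.
            value = value[part] if isinstance(value, dict) else getattr(value, part)
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


page = render(
    "<h1>Request complete</h1>\nYou requested this article: {{ article.name }}",
    article={"name": "Piers Anthony interview"},
)
print(page)
```

With Flask you never write this yourself: render_template() takes the template file name and the variables, and Jinja2 does the rest.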

Our server logic

Now that everything is in place, let’s write a simple application that will simply make one request. We’ll ask the Trove API for a list of newspaper articles mentioning Piers Anthony (to follow their documentation’s example).

Create this file as ~/trovetest/

from flask import Flask, render_template, request       # load Flask itself
import requests, json                                   # load Requests library, and JSON library for interpreting responses
app = Flask(__name__)                                   # create our web application object, named after this file
apikey = '1b2c3d...'                                    # insert your API key, following instructions here:
troveURL = ''               # all Trove API URLs start with this

@app.route("/list")                                     # this is what we do when someone goes to "http://localhost:5000/list"
def list():                                             # the function that defines the behaviour
    query = {                                           # this dict structure contains the parameters that make up a URL
        'key': apikey,                                  # following the Trove API documentation.
        'zone': 'newspaper',                            
        'q': 'piers anthony',                           
        'encoding': 'json',                             # we specify 'json' as the format because json is easy to parse.
        'include': 'tags' }
    r = requests.get(troveURL + 'result/',  params=query)

    # Requests will transform the params dict into a query string like
    # ?key=...&zone=newspaper&q=piers+anthony&encoding=json&include=tags

    # After calling requests.get(), r is now an object containing lots of information about the response - error codes, content etc.

    r.encoding = 'ISO-8859-1'                           # We avoid some unicode conversion errors by adding this step.
    results=r.json()                                    # Convert the text we receive (in JSON format) into a Python dict
    return render_template('list.html', results=results)  # Render the template, passing through that dict

app.run(host='0.0.0.0', debug=True)                     # Run the defined web server, allowing anyone to connect, in debug mode. (This is not safe for a public web server.)

And save this as ~/trovetest/templates/list.html:

<!doctype html>
    <script src=""></script>
    <script src=""></script>
    <title>Trove test</title>

    {% for article in %}
        <a href="{{ article.troveUrl }}">{{ article.title.value }}</a> - <a href="item/{{}}">More info</a>
        <i><small>{{ article.snippet | safe}}</small></i>
    {% endfor %}
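
The loop in that template walks a list of article records buried inside the JSON response. Here's a rough Python 3 sketch of the same walk done in code. The field names troveUrl, title.value and snippet come from the template above, but the nesting under response, zone and records is my assumption, and the sample data is hand-written rather than a real Trove response. Print out results from a real query and check before relying on it.

```python
def extract_articles(results):
    """Pull (title, url, snippet) tuples out of a parsed Trove-style response.

    The nesting under response/zone/records is an assumption; inspect a real
    response before relying on it.
    """
    articles = []
    for zone in results.get("response", {}).get("zone", []):
        for article in zone.get("records", {}).get("article", []):
            articles.append((
                article["title"]["value"],
                article["troveUrl"],
                article.get("snippet", ""),
            ))
    return articles


# Hand-written sample data, not a real Trove response.
sample = {
    "response": {
        "zone": [{
            "name": "newspaper",
            "records": {
                "article": [{
                    "title": {"value": "Example Gazette"},
                    "troveUrl": "https://example.org/article/1",
                    "snippet": "... <b>Piers</b> <b>Anthony</b> ...",
                }],
            },
        }],
    },
}

for title, url, snippet in extract_articles(sample):
    print(title, url, snippet)
```

Doing the digging in Python and passing the template a flat list keeps the HTML simpler, at the cost of a little more server code.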

Run your server

On the command line, tell Python (and hence Flask) to run your application:

/home/ubuntu/trove$ python
 * Running on
 * Restarting with reloader

Flask will sit there waiting for someone to connect to your webserver in a browser.

In your browser, go to http://localhost:5000/list

The results of our little Trove query.

What next?

That was a very quick, high level view of all the pieces you need. If you made it through the example, you’ll be in an excellent position to start making interesting web applications building on the Trove API. There’s no shortage of information on the web about all these technologies – the hard bit is knowing what you need to know.

All sorts of fun things can be done:

  • Cool exploration interfaces
  • Bots
  • Visualisations like graphs, charts etc
  • Text mining and analysis

Tim Sherratt (aka @wragge) has made many fun creations with the Trove API, documented on his blog. Tim works for the Trove team, who are generally pretty happy to help.


Trello Tennis

(Credit: Flickr’s Carine06, CC-BY-SA)

Here’s my method of keeping track of stuff to do. Other tools just didn’t work for me. Most to-do list tools make two assumptions, I think:

  1. Work arrives from somewhere (eg, a manager), and is readily broken up into pieces roughly 0.5-4 hours in size.
  2. Your primary responsibility is processing these tasks without gaps, maintaining a high level of output.

My work, technology for academics, tends to involve coordinating lots of little projects with lots of different people. At any given moment, there are few tasks “in my court”, and they tend to be small. My primary responsibility is to keep all projects moving, and ensure that projects aren’t held up by a task being with me too long.

So, I ran with the ball in my court / ball in your court metaphor. It’s useful when:

  1. Your work is primarily about projects, and particularly a relationship between you and the main stakeholder in that project.
  2. You anxiously wonder whether you’re supposed to be doing something on a project.
  3. You have trouble remembering all the relevant bits of information for each project.

“Projects” here can be very small, and often include very early stage proposals that haven’t formed into anything very concrete.

Presenting: Trello Tennis.

Trello Tennis in (fictitious) action

To start with, the two most important columns:

  • Ball in my court: projects whose current state requires me to do something (as small as writing an email or checking something, or as big as preparing a whole workshop structure)
  • Ball in their court: projects whose current state is waiting for someone else to do something, such as providing feedback on a document, organising an event, or replying to an email.

The goal each day is to move projects from the “my court” column to the “their court” column. I periodically review “my court”, placing the most urgent items at the top. I also review “their court”, checking whether I should follow up or offer assistance.

More columns, that turned out to be useful for me:

  • All good/meeting scheduled: projects whose current state is that both parties are waiting for an event, usually a meeting. Neither of us has to do anything until then.
  • Wut?: the most dangerous column, projects where I’m not sure what the current state is after all. Am I waiting on them? Did I promise to research something and get back to them?
  • Somebody else’s problem (SEP): where responsibility for the project has been shifted to someone else. I don’t need to do anything, and I don’t even need to monitor it at the moment. Someone else may eventually hand it back to me, but until then I don’t worry.
  • Archive: the project is actually finished.
  • Blocked: there is a technical reason why the project can’t advance, and no one is currently doing anything about it.

Using Trello

Trello works really well for this.

The name of each card consists of both the project name, and a quick summary of my current task (if in my court), or what I’m waiting on if in theirs. Examples:

  • WidgetUpgrade – check if compatible with Ubuntu Precise
  • FoobarSystem – Anita fixing DNS problem
  • Maps for Psych – meeting Thurs 3rd 2pm

What’s particularly fantastic about using Trello is that each card can also track all the work on that project. Changing the title automatically creates a history trace, or I can add comments to record more information. Need somewhere to record a server name and login details? Use the card’s description field.

You can also use coloured labels to group projects, such as tagging by university in my case.

Of course, each individual project frequently has its own task tracking system: perhaps another Trello board, Github issues, Pivotal Tracker, Google spreadsheets etc. I link to them from the card’s description, ultimately tying together all information systems for all projects under a single roof.

The rules

Each day (or whenever):

  1. If there is anything in “Wut?”, attempt to clarify its state, then move it to the right column.
  2. Check each item in “Ball in their court” to see if it needs attention. Set alarms if you need to. If it’s time to follow up, just move it back to “Ball in my court” and adjust the title.
  3. Review the “Ball in my court” list, ordering them appropriately – most urgent at the top.
  4. Work through items in “Ball in my court” if you have any free time. :)

To make this natural, I order the columns from left to right as follows: Wut, My Court, Blocked, Their Court, All Good, SEP, Archive. Stress is on the left, calm is on the right.

The goal is to get your board to look like this blissful fantasy:

A totally successful day in Trello Tennis land.

These examples are actually tiny compared to (my) reality. By the end of 2013, I had 21 active cards, plus another 10 inactive (SEP and Archive). If you find this technique at all useful, or have any improvements, I’d love to hear about it.

Filtering for sanity

(Added 4 Feb 2014)
Sometimes it gets daunting looking at the huge mounds of work in front of you. Filtering can help cut through the distraction. Two types have worked for me:

Filter by organisation

Filter by client

Are you working for a particular client today, and don’t want to see anything related to any other clients? Assign one label per client, then turn on the filter!

Filter by … you

To focus on just one or two tasks that you need to finish, first assign yourself to those cards. Then, filter to only show cards that you’re assigned to. This feature is intended for collaboration, but it works nicely.