As a data guru in residence, I’m helping government bodies prioritise which datasets to release as open data. Sometimes people say “No one would ever find this data interesting, so why bother releasing it?” I think there are several distinct reasons why a given dataset might be worth releasing. Some datasets are valuable for several reasons simultaneously. Some aren’t valuable at all.
When a public servant comments that a potential dataset isn’t interesting or useful, ask: “are there other reasons to release it”?
But if a dataset fails to meet any of these criteria? You have my permission not to release it.
#1 Build an app around it
Datasets like public transport timetables, public bike share station status, or parking space availability are obvious candidates for third party developers to use to build an app. Unfortunately, these examples also require near-realtime feeds in order to be useful.
#2 Support other apps
Even if a dataset isn’t interesting or useful enough to warrant an app in its own right, it could add value to another website or app if it’s easy to use. I’ve come across many of these:
- Average traffic volume on roads maintained by VicRoads, used to help cyclists decide which roads to avoid, on cycletour.org.
- The slope of footpaths around Melbourne can help wheelchair users navigate the city.
- The location and species of every tree in Melbourne can add colour and interest to a map of the city.
- Locations of drinking fountains could be useful for cycling, jogging, or dog walking apps or websites.
#3 Interesting for research
If a dataset is big, rich, detailed and high quality, then there’s a pretty good chance it’s worth of some kind of analysis. If it’s unique enough, then it might even interest a researcher in starting a research project just to look at this dataset.
Examples: building permits database, public transport timetables (for urban planning).
#4 Supporting other research
Much more common than such a rich dataset is small datasets that researchers find useful to solve particular problems, add context, or strengthen an analysis. Local Government Area boundaries aren’t inherently interesting, but they’re one of the geospatial datasets that researchers request the most often. The ATO’s Standard Business Rules taxonomy sounds incredibly dry to me, but is of potential use to lots of people trying to glue different kinds of data and applications together.
#5 Policy and analysis
Lots of organisations need government data to develop internal strategies or policies to be shared with the public – or even to influence government. Typically they get the data either by transcribing tables from official reports, or by developing direct relationships with the government body in question. Publishing data directly to an open data portal allows a wider range of groups to make use of it, without the overhead of having to ask whether the data is available. Data that is collected regularly, in the same format is a particularly likely to be useful.
If the data relates to how government decisions are made, it may be worth releasing to demonstrate transparency – regardless of how much the dataset is even used. For example, releasing annual budget data as an easy to use spreadsheet makes a big political statement about willingness to be scrutinised. Even if no citizen takes up the opportunity to crunch the numbers, they may still appreciate having that option.
Examples: annual budgets, revenue sources (parking meters, speeding fines), parliamentary voting records.
#7 Insights for government
If you’re really lucky, the dataset you publish may help another part of government do something useful. I think good things happen when people can access data without having to ask anyone for it, and the some goes for governments themselves. You can’t really expect insights, but if it happens – great.