
10 recommendations for open (government) data publishers

on Fri, 07/13/2012 - 06:31
I am back home from the hugely motivating and energizing International Open Government Data Conference, hosted by Data.gov and the World Bank. A tremendous group of people got together at the World Bank over the last three days to discuss a broad range of topics related to open government data (also see my post from Day 1). Participation in the conference shows that the open (government) data community is really thriving: 450 in-person participants from over 50 countries, 4000 online participants, 2000 tweets. The 162 speakers covered a lot of ground: great case studies of open data initiatives from national, state and city governments in developed and developing countries; insights from data users, including developers, entrepreneurs, data journalists, academic researchers and others; advice and updates from technologists and platform providers; and discussions about standardization. Great stuff. I got a better understanding of the benefits of opening up data, along with a number of recommendations for open government data work; many of those are just as applicable to other open data efforts.
 
Benefits of open government data
Governments gain several benefits from opening up their data. It increases trust in the government by providing more transparency and accountability. It helps improve public services. It stimulates economic activity and generates jobs. As the UK government was surprised to find out, it can also help the government improve the use of its own data. It helps to increase the exchange of information among government agencies (which are often siloed) and improves collaboration. And as an additional perk, open data will also lead to savings by reducing work on specific data requests. Here is how to get there:
 
1. Get the data out there
It's easy to get stuck in discussions about platforms, formats and standards. It's also easy to delay releasing the data in order to work on them until they are as good as can be. Instead, governments should start with a focus on getting the data out there. Value-added work on the data can follow later, e.g. restructuring the data with an eye for external users, or linking and combining different datasets. These tasks could also be done by external players, who may re-package data for specific audiences and make them easier to use. In general, the prioritization of data releases should follow user demand rather than just the publication of data that are easy or convenient to publish.
 
2. Make the data open
Open data should be just that. Open. Available to use for anyone, available for any use (commercial and non-commercial), available to be redistributed. A proliferation of licenses and different types of licenses can seriously hamper the usefulness and ultimately the impact and success of open data.
 
3. Make using the data easy
Different data users will prefer different methods of accessing data: analysts, scientists and data journalists tend to want to download detailed data, developers want ongoing API access, and less sophisticated data users want query tools and data visualizations. Publishers should offer access at least via download and API. In addition, data need to be properly documented. Data by themselves can be misleading or used in the wrong context. Governments should make sure that data collection and subsequent work on the data are properly documented, and that the data are labeled, in both machine-readable and human-readable formats.
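To make the documentation point a bit more concrete, here is a minimal sketch in Python of a single data dictionary that is rendered both for machines (JSON) and for people (plain text), plus a check for columns that were published without documentation. The dataset name, source, columns and function names are all hypothetical and only there for illustration; this is one possible approach, not a standard.

```python
import json

# Hypothetical data dictionary; the dataset, source and columns are assumptions.
DATA_DICTIONARY = {
    "dataset": "school_enrollment",
    "source": "Hypothetical Ministry of Education annual survey",
    "columns": [
        {"name": "district", "type": "string",
         "description": "District name"},
        {"name": "year", "type": "integer",
         "description": "Calendar year of the survey"},
        {"name": "enrolled", "type": "integer",
         "description": "Pupils enrolled at the start of the school year"},
    ],
}


def to_machine_readable(dictionary):
    """Serialize the data dictionary as JSON for programs to consume."""
    return json.dumps(dictionary, indent=2, sort_keys=True)


def to_human_readable(dictionary):
    """Render the same data dictionary as plain text for people to read."""
    lines = [
        f"Dataset: {dictionary['dataset']}",
        f"Source: {dictionary['source']}",
        "Columns:",
    ]
    for col in dictionary["columns"]:
        lines.append(f"  {col['name']} ({col['type']}): {col['description']}")
    return "\n".join(lines)


def undocumented_columns(header, dictionary):
    """Return columns in the data file's header that the dictionary omits."""
    documented = {col["name"] for col in dictionary["columns"]}
    return [name for name in header if name not in documented]
```

Keeping one source for both renderings means the human-readable notes and the machine-readable labels cannot drift apart, and a check like `undocumented_columns(["district", "year", "enrolled", "notes"], DATA_DICTIONARY)` can flag a column such as `notes` before the file is published.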
 
4. Get as much data out there as possible
Governments should share all data that are collected with tax money and can benefit their citizens. Two key reasons stand against sharing data: privacy and national security. Those two are very valid arguments, and data have to be vetted carefully to be released without unreasonable (!) risks to either. (We need to keep in mind, though, that in today's world of databases and social media, privacy often cannot be guaranteed; instead, the risk of identification needs to be managed.) However, experience shows that governments often use both reasons as an excuse to avoid having to share data. Finally, governments need to avoid sharing only selected data that fit their political agenda, or only "toy datasets" that are of limited use, and then claiming openness.
 
5. Don't start from scratch
There are several powerful platforms that simplify the process of publishing data online, including open source and commercial ones. In many cases, this should eliminate the need for custom solutions. Present at the conference were Socrata, CKAN, BuzzData, Microsoft Open Data Initiative, Junar and the new Open Government Platform (aka "Data.gov in a box"). In addition, there are providers like Datamarket.com and Knoema that provide white-label platforms for data publishers. These platforms can provide significant savings in investment and cost, but choosing the right platform requires careful consideration (more on that later).
 
6. Engage with data users
Governments should maximize the impact of open data. To that end, they need to engage with users to encourage and promote the use of their data. Starting this process is significantly easier if a vibrant entrepreneurial and developer community already exists in the country. Governments should also engage with data users to get feedback on release priorities, and to discuss data quality and possibilities for improvement. Conversely, data users should never be afraid to ask for data, should be persistent (even tenacious) in trying to get access, and should provide feedback on data use cases and quality issues. Launching an open data platform is the easy part; it's much harder to make it sustainable and create an ecosystem around it (see related post here).
 
7. Make the data discoverable and linkable
A key problem in the open data field is the discoverability of data across platforms (finding data within one platform had better be pretty straightforward). To that end, data publishers should keep in mind that machines are one of the key audiences for the data. Schema.org and other standards can enable their data to show up properly in search results. In addition, there is a need for interoperability and linkage of data, which makes standards for open (government) data essential. David Eaves compared these standards with the shipping containers that revolutionized and scaled global trade; similarly, standards help make open data efforts scalable. However, cumbersome standards can also be like a straitjacket that hampers progress. Standards development should be very targeted to drive adoption, then expand from there (you can't have breadth and depth in the beginning).
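As a rough illustration of what machine-oriented markup can look like, the sketch below builds a Schema.org `Dataset` description as JSON-LD and wraps it in the script element that a dataset's landing page could embed. `Dataset`, `DataDownload`, `distribution`, `contentUrl` and `encodingFormat` are Schema.org terms; the function names and any example.org URLs passed in are hypothetical, and whether a given search engine picks the markup up is outside what this sketch can promise.

```python
import json


def dataset_jsonld(name, description, landing_url, license_url, downloads):
    """Build a Schema.org Dataset description as a JSON-LD dictionary.

    `downloads` is a list of (content_url, encoding_format) pairs,
    one per downloadable file.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_url,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": content_url,
                "encodingFormat": encoding_format,
            }
            for content_url, encoding_format in downloads
        ],
    }


def as_script_tag(jsonld):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    body = json.dumps(jsonld, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical dataset and URLs, for illustration only.
    doc = dataset_jsonld(
        name="School enrollment by district",
        description="Annual enrollment counts per district.",
        landing_url="https://example.org/datasets/school-enrollment",
        license_url="https://example.org/open-license",
        downloads=[("https://example.org/data/school-enrollment.csv", "text/csv")],
    )
    print(as_script_tag(doc))
```

Stating the license and the download formats in the same record also ties this recommendation back to recommendations 2 and 3: a crawler can tell not only that the dataset exists, but also how it may be used and fetched.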
 
8. Focus on local circumstances
There is a danger that governments fall victim to geek dazzle (going for the big shiny new toy) or, in the case of developing countries, donor dazzle (doing what donors want). Instead, they should focus on their specific circumstances and implement a pragmatic solution. Especially in countries where there are inequalities in terms of access to technology and funding, open data can widen the digital divide by allowing an advantaged few to make better use of the data for higher gain (information/data is power!). In particular with regard to databases that (help) establish rights (e.g. land ownership), data releases should wait for full due diligence.
 
9. Measure the success of open data
The impact of open data should not (just) be measured by the number of datasets shared (although that's a helpful indicator as well). The best validation of the success of data publishing is data use. Measuring whether and how the data are used is hard. There are anecdotes about the success and usefulness of open data, but very little in terms of systematic or quantitative measurements. The few numbers mentioned (for the US) include $100B in economic activity generated by GPS and weather data, and a potential $350-400B in impact from opening up health data (based on a McKinsey study).
 
10. Identify data gaps
Governments have to have data to be able to share them. Capturing and highlighting gaps in existing data helps direct efforts to fill those gaps. This is another reason to engage with the entrepreneurial community: in the absence of useful data, entrepreneurs and developers may start (or may already have started) their own data collection efforts, e.g. via crowdsourcing.
 
Let me know what other pieces need to be part of this list. I'd be happy to edit and expand.
