Health Data Innovation at the International Open Government Data Conference
This week, fans of open government data from all over the world are converging on Washington, DC to attend the International Open Government Data Conference, organized by Data.gov, the World Bank and the Open Development Technology Alliance. I will be there, looking at health data and innovation in particular. Registration is closed, but there are other ways to listen in or participate:
- Follow live blog and web stream
- Track the #IOGDC hashtag on Twitter
- Follow me on Twitter to get health-related updates
For all of you interested in health data innovation, here are some thoughts on open data, health data and innovation as casual prep reading for IOGDC. Open data - and open government data in particular - are fundamental enablers of innovation. However, opening up health data that contain sensitive information about individuals differs from opening up other government data in a few fundamental ways. In particular, privacy and de-identification, control over the data, and data linkage need to be addressed carefully. In addition, stimulating innovation requires facilitating and encouraging the creative use of data. This suggests a number of considerations for data owners / holders as they open up their data:
- Share as much data as you can: owners / holders of health related data have a moral obligation to share data (responsibly!) because health-related data can be used to improve health and save lives. More about what 'sharing responsibly' means down in the 'health data' section. The Open Data movement and the related national and sub-national open data sites like Data.gov are absolutely key to health data innovation.
- Make the data easy to find: posting the data on organizational websites is a good start (even better if the site is search engine optimized), but to reach broader audiences, the data should also be posted on open data sites, data catalogs, data repositories and data markets.
- Make the data as open as possible: Tim Berners-Lee suggested a five-star scheme to rank openness: making data available (*), providing structured data (**), using non-proprietary formats (***), using URIs to identify data (****), and linking data to other data (*****).
- Manage privacy risk with proper de-identification: With data about individuals, there is no guarantee of complete privacy. Huge marketing, social media, and other databases, together with improving techniques for probabilistic linkage, make it easier to identify individuals in de-identified datasets. Hence, sharing de-identified data means managing the risk of identification. Many data owners equate lowering the risk of identification with providing less detail (e.g. providing county or even state instead of postal code of residence), and as a result the shared data become less useful. However, there are increasingly sophisticated methods to help data sharers manage the risk of identification.
- Put individuals in control of their data: it is hard to say who actually owns healthcare data, but as Fred Trotter over at O'Reilly Radar points out, it's really about who controls access to the data. For healthcare data, there is a lot of fine print that patients sign off on (and then there is legislation like HIPAA in the US), which puts most control in the hands of the provider. For survey and study data, participants' consent provides clear provisions about how the data can be used. However, consent often doesn't include provisions for broader data sharing and open data. Consent and privacy statements should anticipate opening the data; in addition, individuals/patients should have access to their health data (e.g. via a Blue Button, used by the VA and now many other providers) so they can take the initiative and share their data.
- Enable linking all data about a patient: For real health data innovation, data about an individual (data from different providers, family history, lifestyle information, location data) need to be available in combination. Some research databases already provide access to linked data, but we need better ways of pulling together and opening patients' data to allow research, analysis and innovation (again, with proper privacy safeguards and patient consent).
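To make the de-identification trade-off above a bit more concrete, here is a minimal sketch in Python. It uses k-anonymity, which is just one common way to put a number on identification risk and is my choice for illustration, not something prescribed by any of the sources mentioned here. The records, the field names and the choice of quasi-identifiers (postal code, birth year, sex) are all made up. The idea: count how many records share the same combination of quasi-identifiers; the smallest such group is k, and k = 1 means at least one person is unique in the data.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that could be matched against
# outside databases (field names are assumptions for this sketch).
QUASI_IDENTIFIERS = ["postal_code", "birth_year", "sex"]


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that share
    the same values for all quasi-identifiers."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


def generalize_postal_code(record, digits_kept=3):
    """Return a copy of the record with the postal code coarsened:
    keep the first digits, mask the rest with '*'."""
    coarse = dict(record)
    code = record["postal_code"]
    coarse["postal_code"] = code[:digits_kept] + "*" * (len(code) - digits_kept)
    return coarse


# Made-up example records, for illustration only.
records = [
    {"postal_code": "20001", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postal_code": "20002", "birth_year": 1970, "sex": "F", "diagnosis": "diabetes"},
    {"postal_code": "20003", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postal_code": "20910", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
    {"postal_code": "20912", "birth_year": 1985, "sex": "M", "diagnosis": "asthma"},
]

k_raw = smallest_group_size(records, QUASI_IDENTIFIERS)

generalized = [generalize_postal_code(r) for r in records]
k_generalized = smallest_group_size(generalized, QUASI_IDENTIFIERS)

print("k with full postal codes:", k_raw)
print("k with 3-digit postal codes:", k_generalized)
```

With full postal codes every record in this toy dataset is unique; after coarsening the postal code, each person shares their combination with at least one other person, at the price of less geographic detail. That is exactly the detail-versus-risk trade-off described above. A real release would need a far more careful risk assessment than this.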
Innovation based on health data
- Make the data ridiculously easy to use: data that are being shared should be cleaned, well documented (including documentation of data collection and any subsequent data editing/prepping), properly labeled, and provided with all relevant metadata (ideally following standards like DDI or SDMX-HD). In addition, a lot of innovation comes from entrepreneurs, developers and other folks outside the health & healthcare fields, who may need additional guidance about proper use of the data.
- Target all relevant audiences: different data users will have different preferences in terms of data formats, granularity (microdata, indicator data, individual data points), and type of access (query tools, dataset download, API). Offering different options will increase the potential uses but also increase cost/resources needed. In addition, data visualizations based on the data can help stimulate interest, insights and innovation.
- Don't make people find data. Make data find the people. (an open data law that Tim O'Reilly has been sharing for years): as US CTO Todd Park puts it, you have to "market the hell out of your data" to stimulate innovation, via communication, marketing, and events that stimulate use (e.g. app or visualization challenges, hackathons, datapaloozas). Done right, entire data ecosystems can emerge from open data, as evidenced by the successful examples of weather data, GPS data, and - more recently - the great progress around health data initiated by the energetic HealthData.gov team.
The next three days at the International Open Government Data Conference should bring a wealth of new insights around these and additional topics. Stay tuned for updates, and ping me via Twitter if you'd like to discuss or meet up at the conference.