
Things to watch at Strata Rx: 5 underlying challenges for sharing health data

on Tue, 10/16/2012 - 06:37

This week brings us the first Strata Rx conference, which explores the role of data and data science in health care. The timing is right, because health care is at a crossroads. In many more developed countries, rising costs combined with stagnating outcomes and aging populations are making health care systems unsustainable. In less developed countries, a double or triple disease burden and stagnating development assistance for health hamper progress. Tim O'Reilly said in a recent conversation on health care (worth watching!) that "change happens when the pain of not changing is greater than the pain of changing". Health care is there, ready to be disrupted, and data is key to driving that disruption. Fixing health care is one of our biggest challenges in the 21st century.

Changes in technology have revolutionized the possibilities for collecting and analyzing health and health-related data (sorry about the buzzword bingo): patient data are captured in electronic health records, smartphones capture and transmit volumes of personal data, social media capture health self-assessments, wearable sensors enable uninterrupted data collection and transmission, genome sequencing is now almost affordable, and cloud computing, open source software, machine learning, and big data management enable sophisticated analysis of all these data. With all these opportunities, leveraging health data to fix health care is not only one of the biggest, but also one of the coolest challenges of the 21st century.

However, there are 5 underlying challenges to leveraging data to fix health care, and they center on transparency and accessibility.

  1. Privacy: sharing data about individuals requires protecting their privacy. However, there is a trade-off: the more identifiers are removed, the less useful the data become. In addition, linking data from different sources enables much more powerful analysis but also increases privacy risks. Whenever useful health data are shared, some risk of identification (often very low) remains. Therefore, we need strong de-identification techniques as well as powerful legal deterrents against using data to identify individuals. And we need to assure individuals that their data are handled responsibly.
  2. Consent: individuals need to agree to their data being shared with others. They should be able to decide exactly what their data can be used for, and to withdraw that consent if they wish. Currently, patients have limited transparency into, and very little control over, how their data are shared.
  3. Data Use Agreements: fully de-identified data (i.e. data with a very low risk of identifying individuals) should be shared as open data. Data with identifiers can be shared as limited use data for appropriate purposes under data use agreements. However, there are currently no standards for these agreements and their stipulations, which often makes them difficult to negotiate and implement.
  4. Research ethics: research that involves collecting data from individuals or using data with direct identifiers often requires ethics oversight, e.g. by Institutional Review Boards, and regulations like the United States' HIPAA detail what can be shared and how. While this oversight is necessary, it often hampers progress by being too strict and difficult to implement. Regulations and their interpretations need to keep pace with the current rapid developments in data collection and analysis, the globalization of research, and individuals' attitudes towards data sharing, e.g. in social media.
  5. Incentives for sharing: there are powerful arguments for sharing. Open data can create entire ecosystems. Sharing unlocks external creativity and analysis, and most of the world's smartest people don't work for you. Most importantly, sharing and using health data can save lives, which makes sharing data a moral imperative. However, many reasons beyond privacy and consent keep data owners from sharing data: competition, fear of misuse, reluctance to share the power of information, political agendas, academic publication plans, etc. The fragmentation of health systems multiplies the number of players, each with its own reasons for not sharing health data. We need better incentives and frameworks to encourage and facilitate data sharing. Patients can take a lead role here by sharing their own data and asking providers and others to share their data responsibly.

The next two days will touch heavily on these areas, and I'm looking forward to connecting with other health data innovation enthusiasts. Follow me on Twitter for instant updates, and stay tuned for follow-up posts.

Providing access to detailed demographic and health data: Census Research Data Centers

on Tue, 09/25/2012 - 10:43

Sharing health data and making them 'open' isn’t easy, and privacy is one of the key reasons. You can remove direct identifiers, but other detailed data such as treatment dates can still make it possible to identify subjects. Rapidly growing amounts of information in marketing databases and social media further add to the risk of identification. On the other hand, the more details you remove from a dataset, the less useful it becomes for analysis and research. In the end, it’s about balancing the risk of identification against the usefulness of the data.
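To make this trade-off concrete, here is a minimal Python sketch using k-anonymity, one common way to measure re-identification risk; it is an illustration, not the method of any data holder mentioned in this post. The records, the choice of quasi-identifiers (ZIP code, birth date, treatment date), and the coarsening rules are all hypothetical. Here k is the size of the smallest group of records sharing the same quasi-identifier values, so k = 1 means at least one person is unique in the data.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: names and other direct identifiers are already
# removed, but quasi-identifiers (ZIP code, birth date, treatment date) remain.
RECORDS = [
    {"zip": "98105", "birth": date(1971, 3, 2), "treated": date(2012, 5, 14), "dx": "asthma"},
    {"zip": "98105", "birth": date(1971, 8, 19), "treated": date(2012, 5, 30), "dx": "diabetes"},
    {"zip": "98107", "birth": date(1984, 1, 7), "treated": date(2012, 6, 2), "dx": "asthma"},
    {"zip": "98107", "birth": date(1984, 11, 23), "treated": date(2012, 6, 21), "dx": "flu"},
]


def full_detail(record):
    """Quasi-identifiers at full detail."""
    return (record["zip"], record["birth"], record["treated"])


def generalized(record):
    """Illustrative coarsening: 3-digit ZIP prefix, birth year, treatment month."""
    treated = record["treated"]
    return (record["zip"][:3], record["birth"].year, (treated.year, treated.month))


def k_anonymity(records, key):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    group_sizes = Counter(key(r) for r in records)
    return min(group_sizes.values())


if __name__ == "__main__":
    print("k at full detail:  ", k_anonymity(RECORDS, full_detail))
    print("k after coarsening:", k_anonymity(RECORDS, generalized))
```

Coarsening raises k from 1 to 2 in this toy example, but the generalized data can no longer support an analysis that needs exact treatment days: the same balance between identification risk and usefulness, in miniature.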

To make highly detailed data, including direct and indirect identifiers, available to researchers, data owners need to create controlled environments in which researchers can use the data for approved purposes and retrieve only results that create little or no risk of identification. The US Census Bureau runs 14 Research Data Centers (RDCs) across the US that do just that. The latest one, the Northwest Census Research Data Center (NWCRDC), was opened yesterday at the University of Washington in Seattle by the acting Director of the Census Bureau, Tom Mesenbourg, and the Director of the NWCRDC, Dr. Mark Ellis.
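One common ingredient in reviewing results before they leave a controlled environment like an RDC is a minimum cell-size rule: small counts in output tables are withheld because they can point to identifiable people. The Python sketch below assumes a hypothetical threshold of 10 and a simple complementary-suppression step. It only illustrates the idea and is not the Census Bureau's actual disclosure review procedure.

```python
# Hypothetical minimum cell size; actual disclosure rules vary by agency and dataset.
MIN_CELL = 10


def suppress_small_cells(table, min_cell=MIN_CELL):
    """Return a copy of a {category: count} table with small counts replaced by None."""
    released = {cat: (n if n >= min_cell else None) for cat, n in table.items()}
    hidden = [cat for cat, value in released.items() if value is None]
    if len(hidden) == 1:
        # Complementary suppression: if the table total is published, a single
        # hidden cell could be recovered by subtraction, so also hide the
        # smallest remaining visible cell.
        visible = [cat for cat, value in released.items() if value is not None]
        if visible:
            smallest = min(visible, key=lambda cat: table[cat])
            released[smallest] = None
    return released


if __name__ == "__main__":
    counts = {"asthma": 42, "diabetes": 17, "rare_condition": 3}
    print(suppress_small_cells(counts))
    # {'asthma': 42, 'diabetes': None, 'rare_condition': None}
```

A single-table check like this cannot catch disclosures that arise from combining several released tables, which is one reason human review of outputs remains part of the process.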

Like the other Census Research Data Centers, the NWCRDC provides access to demographic, economic and health microdata (i.e. respondent level data), including censuses, surveys, administrative data, and health data from the National Center for Health Statistics (NCHS) and the Agency for Healthcare Research and Quality (AHRQ). The datasets go back to the 1970s, including the 1970 decennial census, and the Census Bureau is working with the University of Minnesota to make the microdata from the 1960 census available. The data are available to qualified researchers for projects that are carefully reviewed to prevent abuse. However, once a project is approved, researchers can link individuals across datasets in the RDC and even link in their own datasets with identifiers. This provides unique research opportunities that would otherwise not be possible and is a fantastic resource for researchers in the Pacific Northwest.

Why should data holders consider going this route? For the US Census Bureau, the most important benefits are new estimates and data products, efficiency, expanded measurement capabilities, and improved documentation of its own data. The Bureau taps into creative and innovative thinkers who find additional uses for the data that benefit the Census Bureau and the American public. Currently, more than 650 researchers are working on 150 projects across the 14 RDCs.

Researchers who want to use the RDC need to write a proposal about their planned research (more details here), which the US Census Bureau reviews for scientific merit and for its benefit to the Census Bureau and the public. Proposals to use data provided by other agencies like NCHS and AHRQ are reviewed by those organizations. All work has to be conducted at the NWCRDC on campus at the University of Washington.

More data owners or data holders should consider making more detailed data available. Research Data Centers are one possible solution. I'll discuss others in future posts.

A Buffet of Health Data

on Wed, 09/19/2012 - 14:55

This is a cross-post from the Healthdata.gov blog and co-authored by Aman Bhandari (@GHideas) and Steven Randazzo (@worksteven). Aman and Steven work with US CTO Todd Park and are driving forces behind the Department of Health and Human Services' Health Data Initiative.

Hundreds of codeathons are held across the country every year, resulting in innovative applications, like the “Like” button on Facebook, or in solutions to critical social and health problems, like childhood obesity.

The Department of Health and Human Services wants to encourage innovative applications and solutions to critical social and health problems. To help you make the most of your opportunity to tackle some of the most critical health issues this country faces, we have developed HealthData.gov. HealthData.gov is populated with resources for developers, entrepreneurs and people who just want to play around with health data. It lists over 300 datasets, which include everything from the FDA adverse events reporting database to information on over 120,000 clinical trials to Head Start locations nationwide to the Health Indicators Warehouse. In addition to this robust collection of general health data, the Centers for Medicare and Medicaid Services (CMS) provides national compare data, ranging from hospital compare to nursing home compare to dialysis compare data, all of which can be found on HealthData.gov and on Data.Medicare.gov. To help you navigate HealthData.gov and the available datasets, our health data starter kit, a slide deck, introduces some of the datasets we have available.

We have already seen success from developers who have taken open health data and leveraged it to tackle important health issues like FDA recalls at the Hokie Hackathon in Blacksburg, Virginia, childhood obesity at the Cajun Codefest in Lafayette, Louisiana, or unemployment and its contributing factors at unWIREd in Baltimore, Maryland. Whether you participate in your own codeathon or in the upcoming codeathon with the Greater Baltimore Tech Council, Groundwork, on September 28th and 29th in Baltimore, Maryland, data is the fuel to solve some of the biggest health care problems in the nation.

This past June we held our 3rd Annual Health Datapalooza, the largest showcase of what developers and entrepreneurs are doing with open health data. Over 1,500 participants saw how more than 75 companies are using open government data to power their services, applications and insights. If you need some inspiration or ideas, we have video of all the companies presenting at the 2012 Health Datapalooza.

If you want to stay abreast of related events and what we have going on, you can sign up for our HHS Innovation Update and our weekly data news feed, which focuses on the intersection of data, health and technology. Finally, in December we will open a call for applications to present at the 4th Annual Health Datapalooza, for which anyone can apply.

Codeathons across the country have used open data as a raw material to supply their creations. In addition, open health data is being leveraged in several prize competitions that are currently open. Some data and non-data focused examples are listed below.

Challenges

Health Data Platform Simple Sign-On Challenge

  • Deadline for submissions: October 3, 2012
  • Total Prizes: $35,000

Health Data Platform Metadata Challenge

  • Deadline for submissions: October 3, 2012
  • Total Prizes: $35,000

My Air, My Health Challenge

  • Deadline for submissions: October 6, 2012
  • Total Prizes: $160,000

The Million Hearts Risk Check Challenge

  • Deadline for submissions: October 31, 2012
  • Total Prizes: $125,000

Ocular Imaging Challenge

  • Deadline for submissions: November 9, 2012
  • Total Prizes: $150,000

Medicaid Provider Enrollment Screening Challenge Series

  • Deadline for submissions: November 16, 2012
  • Total Prizes: $500,000

Challenge: Reducing Cancer Among Women of Color

  • Deadline for submissions: February 5, 2013
  • Total Prizes: $100,000


Read more at http://www.healthdata.gov/blog/buffet-health-data

Open Government Data at IOGDC

on Sun, 09/16/2012 - 23:07

Below is a presentation I just put together with insights from the International Open Government Data Conference (IOGDC) which took place in July 2012 in Washington, D.C. I am presenting this deck at an international work group meeting tomorrow and would love to get your feedback or additional insights.

If you didn't have a chance to attend the conference, I wrote an overview of open health data and some nuggets of wisdom from the conference, as well as thoughts on creating an open data ecosystem. There are also lots of presentations and great materials posted on the conference website.

10 key ingredients of health data innovation

on Thu, 09/13/2012 - 21:57

As a reader of this blog, you have already seen various aspects of health data innovation. This post starts a series of more concise overviews of its 10 key ingredients. If you have feedback or ingredients to add, I'd be happy to discuss.

Why do we need health data innovation? Rising health care costs are reaching unsustainable levels while health improvements are stagnating. Health data innovation aims to improve health and reduce cost through creative, scientific and entrepreneurial use of health data. The open data movement provides a great blueprint: share data, market the hell out of them, and encourage entrepreneurs, developers, and other interested folks to create transparency, accountability, new products and services, economic activity, and jobs. This benefits the innovators, but also the field overall and the data sharers themselves; weather and GPS data are good examples of this.

In the case of health, sharing data becomes vital in the truest sense of the word: data can save lives by providing evidence for research and evidence-based medicine, health care and public and global health. However, the fact that those data cover human subjects creates issues around privacy and consent that require a layered approach for data sharing. Privacy and rights of the subjects need to be balanced with broad data access for innovation. Facilitating access to health data is the responsibility of the data holder but requires consent of the individual (patient or healthy individual). Patients can request a copy of their health data and share those. And other stakeholders can create the incentives and frameworks that encourage health data sharing and innovation. The graph on the right provides a semi-structured overview of related key players, types of data and trends.

There are 10 key activities to create and foster health data innovation:

Holders of health data, including providers, payers, producers and researchers, should do what they can to make data available and get them used.
  1. Provide individuals with access to their own data and ensure their authority over other uses of those data
  2. Maximize the quality of data, metadata, and documentation, and adhere to standards where possible
  3. Make fully de-identified data publicly available as open health data at the highest level of detail possible
  4. Use restricted access mechanisms for data where individuals can be identified
  5. Make it easy for data users to find and use relevant data
  6. Contribute to a health data ecosystem that encourages innovation

Patients and healthy individuals play an increasingly active role in health data innovation, leveraging technology to access their health records and collect data about themselves (quantified self).

  7. Get individuals to share their own health and quantified self data

Other stakeholders like governments, academic journals, regulatory authorities, and funders can leverage their influence over organizations that hold health data.

  8. Create incentives (financial, academic, and other) and requirements (regulatory or tied to funding or publication) for data holders to share data
  9. Create and enhance the regulatory framework to facilitate data sharing
  10. Create and foster innovation infrastructure by supporting entrepreneurship, technology, and education

Before I start going into details, let's pause. Do you agree? Are there ingredients / activities to add? Let me know in the comments or contact me.
