
10 key takeaways from Health Datapalooza 2014

on Tue, 06/03/2014 - 15:08

Early June is Health Datapalooza time, and this year's event was again a whirlwind of energy, insights, and over 2,000 very motivated and passionate health data enthusiasts (health datapaloozers doesn't sound right). The hallways were full of pragmatic, productive and engaging conversations, inspired by keynotes and presentations from Todd Park, Bryan Sivak, Jeremy Hunt, Atul Gawande, Steve Case, Jerry Levin, Vinod Khosla, Kathleen Sebelius, Dwayne Spradlin, Francis Collins, Fred Trotter, and many more (if you haven't heard of some of them, look them up, it's worth your time).

  1. Data liberation! US CTO Todd Park's famous battle cry still holds true. We need to turn more data from "passive into active data" (US Secretary of Health Sebelius) and make them more broadly available and used. Here is some advice if you ask for or open up data: it's key to be clear on what you want to use the data for, indicate who will benefit from this work, work with the data owner/hoarder in a collaborative fashion, and be transparent via open source. You can also file a FOIA (Freedom of Information Act) request.

  2. Silos: I knew silos were a problem in US healthcare. But health data are spread across more places than I had imagined. As an example, Athena Health had to create connections to 110,000 (!) other systems for the 50M patients that they provide EHRs for.

  3. New data sources: more and more data are collected outside the health system, contributing to #2: quantified self, social media, purchasing, location data. There are smaller sensors, cheaper devices, and a plethora of apps to help anyone track their health and lifestyle. There are few efforts to let individuals bring these data together in central platforms, and healthcare professionals typically don't (want to?) use this information.

  4. Patient in charge: patients can increasingly be in charge if they want to: they have better access to their data (150M Americans now have access to their health data via Blue Button), technology supports tracking of vital health information (see #3), and there are platforms that let patients bring (some of) it in one place, enabling them to analyze and evaluate what works and doesn't work for them. UK Secretary of Health Jeremy Hunt put it best: the transparency of data reverses the relationship with the physician and puts the patient in charge.

  5. Services in the background: we are now in phase 3 of the Internet (says Steve Case); after building the internet (phase 1) and building apps and services on top of the internet (phase 2), the internet is currently being integrated into everyday life, enabling much more seamless experiences that incorporate data from apps, devices, health information etc. But this is still very much in its infancy. See #8 for a potential approach.

  6. 20% doctor included: Machine learning is key for the future of medicine. Cognitive limitations and an overload of information mean that physician diagnoses and treatment recommendations are not overly consistent. However, machine + doc will be the winning combination; a decent chess computer paired with a decent (human) chess player will outperform the best chess computer. Make sure to read Vinod Khosla's paper "20% doctor included".

  7. We heard of fabulous utopias where all data on an individual are in one place, enabling smart machine-learning algorithms to monitor vital signals, predict health risks and enable and encourage prevention. There are different options to get there (more on that in a future post), but none of them is a shoo-in.

  8. Walled gardens: the announcement of Apple's HealthKit created additional fodder for discussion about walled (health data) gardens; their collaboration with (walled) electronic health record provider EPIC makes that garden potentially bigger but not more open. However, there will be more opportunities to have health data from iOS apps contribute to the same central system, likely in a very user-friendly way.

  9. It's the sickest patients who are driving costs. Our health and data collection systems are currently not designed for them, and new services like PatientsLikeMe are not integrated into healthcare delivery.

  10. Getting to the roots: the app demos and start-up showcase showed the amazing progress that has been made. While a flurry of apps at past events sometimes looked a bit gimmicky, this time there were several efforts that address issues at the health system level (have a look at some of the start-ups here).

A huge thank you to the Health Data Consortium and CEO Dwayne Spradlin for a fabulous conference. Looking forward to Health Datapalooza 2015, 5/31-6/3 in DC.

And if you missed it, we launched our new white paper on Communicating Data for Impact at the conference and got quite a bit of nice feedback. Enjoy the read!

Communicating Data for Impact

on Fri, 05/30/2014 - 17:40

Do you have a powerful dataset and want to see it used?
Do you want to stimulate or inform change with data?
Do you want to encourage evidence-based decision making?

Then chances are that you have thought about how to get the right data to the right audiences in the right format.

A new white paper, Communicating Data for Impact, provides you with some (hopefully useful) structure and guidance for addressing this question. The paper was co-authored by Nam-ho Park, Brian Pagels (both with Forum One), and yours truly. The most important advice is to start the process by defining the audiences you are trying to address. I'm pasting the key table from the paper below; it details how we suggest addressing the needs of key audiences with different levels of detail in data and visuals. The paper also provides a number of real-world examples from IHME's seminal study of health around the world, the Global Burden of Disease Study.

Feel free to download the white paper, and please let me know if you have additional (or different) ideas and suggestions, via comments, Twitter (@peterspeyer), or the contact form.

 
[Table: Overview of audiences to get from data to impact]

Six steps for applying data science: it's all about teamwork

on Mon, 04/28/2014 - 05:15
Data science is crucial to making sense of all the health data collected in the health care system, by patients (and healthy individuals), by governments, and by others. "The study of the generalizable extraction of knowledge from data" (Wikipedia) is a vital step to making data useful for strategy, planning, and policy and decision making. Data scientists are in high demand, command large salaries, and are generally predicted to have a bright future. Rightly so. But to really create an impact with data science, there are six distinct steps that each require different types of expertise. Sure, they can all be done by one individual. But for larger projects, and to maximize impact, you'll need a team of experts with different specialties. In reality, data science is all about teamwork.
 
Step 1: objective & approach
Before touching any data, the objective of the data science exercise has to be defined. With the final audience(s) of the results in mind, stakeholders need to identify the key questions that need answers. With more and more data available, the question much too often becomes "what can we do with these data?". Instead, focus on the question. In some cases, the answer may not even require a large project. Once the goal is clear, the team needs to identify relevant data, the (likely) analytic approach, as well as the key metrics and dimensions that will be needed in the results. This first step should involve experts from all the following steps.
 
Step 2: data seeking and collection
Based on the information from Step 1, data experts like librarians, information scientists, or domain experts need to identify relevant existing data via literature reviews, web searches, or personal networking. Sometimes, some or all of the data will need to be gathered via primary data collection. Even if useful data exist, it's often still hard to identify relevant datasets, find the data provider, and get access to the data. Barriers like unwillingness to share data, lack of documentation, insufficient capacity or expertise to share data, language, or data formats can make obtaining the data rather difficult.
 
Step 3: data preparation
Steps 3 and 4 are at the heart of the data science project, and are often combined into one. Once the data have been obtained, data analysts or scientists need to prepare them for analysis. Lots of data are still stuck on paper or in PDFs and need to be digitized. Unstructured data need to be turned into structured data. Microdata need to be aggregated, linked, and analyzed. Data from different sources often require cross-walks, e.g. between different versions of the International Classification of Diseases (ICD). At this stage, correction of data quality issues can be applied, e.g. correcting for "garbage codes" in cause of death certification. The ideal end result is a coherent dataset that can be used for analysis.
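
To make this step a bit more concrete, here is a tiny, purely illustrative sketch in Python/pandas of two common preparation tasks: cross-walking cause-of-death codes between ICD revisions and aggregating microdata into an analysis-ready table. The mapping, column names, and numbers are made up for illustration and are not taken from any real dataset.

    # Illustrative only: a toy ICD-9 -> ICD-10 crosswalk and aggregation step in pandas.
    import pandas as pd

    # Toy microdata: one row per death record, coded in ICD-9
    deaths = pd.DataFrame({
        "year": [2010, 2010, 2011, 2011],
        "icd9": ["410", "428", "410", "799"],  # "799" stands in for an ill-defined "garbage" code
    })

    # Hypothetical crosswalk between ICD revisions
    icd9_to_icd10 = {"410": "I21", "428": "I50", "799": "R99"}
    deaths["icd10"] = deaths["icd9"].map(icd9_to_icd10)

    # Flag garbage codes so they can be corrected or redistributed later
    deaths["garbage_code"] = deaths["icd10"].eq("R99")

    # Aggregate the microdata into a coherent, analysis-ready dataset
    prepared = (
        deaths.groupby(["year", "icd10"])
              .size()
              .reset_index(name="deaths")
    )
    print(prepared)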
 
Step 4: data analysis
Once the data are prepared, the rubber hits the road as scientists apply mathematics, statistics and computer science to the data. Different models can be applied to the data, from simple regression models to machine learning. Predictive validity testing can help identify the best model, e.g. for analyzing causes of death. With more data, more powerful computation and more sophisticated methods, analytic projects can quickly turn into veritable software development projects. These projects require a very systematic approach to coding up the analysis, ideally with the involvement of software engineers. In addition, interactive visualizations can be extremely useful to review the results of the analysis, requiring yet another set of database and coding skills. Typically, scientists are experts in just one or a few of these areas, requiring teamwork on this step alone (creating a team can also be the reaction to the current data scientist shortage).
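
As a small, hedged illustration of the predictive validity idea, the Python sketch below compares a simple linear regression with a more flexible model via cross-validation on held-out data. The synthetic data and the choice of models are mine, purely for illustration, and not the methods of any actual study.

    # Illustrative only: compare two models on synthetic data using cross-validation.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))  # stand-ins for covariates (e.g. income, education)
    y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=500)  # stand-in outcome

    models = [
        ("linear regression", LinearRegression()),
        ("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
    ]
    for name, model in models:
        scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
        print(f"{name}: out-of-sample RMSE = {-scores.mean():.2f}")

The model with the lower out-of-sample error wins; that is the whole point of predictive validity testing, just on a toy scale here.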
 
Step 5: data and code sharing
Much too often, the results of significant analytic tasks are used to answer the question, but are not shared for broad re-use. Of course, there are often political, competitive, legal, resource and other considerations that make data and code sharing impractical or impossible. Whenever possible, however, code and results data should be shared in as much detail as possible. In addition, full citation lists and links to the data sources should be provided, ideally along with the actual input data to enable others to reproduce the results or build on the analysis. Unfortunately, data use agreements, copyright and other legal constraints all too often make sharing the actual input data difficult or impossible.
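
Here is a toy sketch of one possible way to package results together with their provenance; the file names, metadata fields, and the placeholder repository URL are all illustrative, not a standard.

    # Illustrative only: share a results file alongside a small provenance/metadata file.
    import json
    import pandas as pd

    results = pd.DataFrame({"year": [2010, 2011], "deaths": [3, 1]})
    results.to_csv("shared_results.csv", index=False)

    metadata = {
        "title": "Illustrative deaths by year",
        "code_repository": "https://example.org/analysis-code",  # placeholder URL
        "input_sources": ["Hypothetical vital registration dataset, 2010-2011"],
        "license": "CC BY 4.0",
    }
    with open("shared_results_metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)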
 
Step 6: data translation
The results of a data science project are often of interest for very different audiences, ranging from academic and other researchers to analysts, domain experts, policy and decision makers, journalists, bloggers, activists, and many others. The team needs to provide results in appropriate formats or products for these audiences, e.g. via peer-reviewed publications, books, policy reports, press releases, infographics, or interactive visualizations. Creating these products requires a good understanding of the relevant audiences and a good command of the subject matter. In addition, the data science team can offer additional advice and insight to the relevant audiences, or engage in collaborations to use and build on the research.
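
As a toy example of translation, the Python sketch below turns one (made-up) results table into two different products: a machine-readable CSV for analysts and researchers, and a single readable chart for policy makers, journalists, or bloggers.

    # Illustrative only: render the same (made-up) result for two different audiences.
    import pandas as pd
    import matplotlib.pyplot as plt

    results = pd.DataFrame({
        "cause": ["Ischemic heart disease", "Stroke", "Road injuries"],
        "deaths_per_100k": [120.0, 85.0, 17.0],  # made-up numbers
    })

    # For analysts and researchers: a machine-readable file they can re-use
    results.to_csv("results_by_cause.csv", index=False)

    # For policy makers, journalists, or bloggers: a single, readable chart
    ax = results.plot.barh(x="cause", y="deaths_per_100k", legend=False)
    ax.set_xlabel("Deaths per 100,000 population (illustrative)")
    ax.set_title("Illustrative mortality rates by cause")
    plt.tight_layout()
    plt.savefig("results_by_cause.png", dpi=150)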
 
All these steps are crucially important for the success of a project. Are the results credible if relevant input data were not used? And is the analysis worth it if the results are accessible to other researchers in a published paper but not used by decision makers? The steps are also highly interrelated, e.g. the type of data available for analysis will impact what methods can be applied for analysis. While it's possible for an individual to go through these steps alone, doing this as a team will create a much better chance of success and make the work much more productive. And fun.

Party trick video: Hans Rosling & Bill Gates explain global vaccination rates

on Mon, 04/28/2014 - 04:41

Hans Rosling and Bill Gates just launched the first of what appears to be a series of "Demographic Party Tricks". Using fruit juices, pitchers and glasses, Hans and Bill explain what percentage of children in the world get essential vaccines. An entertaining way to make statistics accessible:

  • The demo is tailored to the audience (it's also tailored to the setting, a party).
  • Very few actual numbers are mentioned, which makes them easy to remember.
  • Audience guesses show that our assumptions are often way off.
  • Illustrating the cold chain required to keep the vaccines active provides very useful context.

Great example for data translation!

Purple cows in health data: applying marketing concepts to communicating data

on Sun, 04/13/2014 - 10:08
Seth Godin just gave a riveting talk at the Global Health and Innovation Conference. You may have read some of his books, like Purple Cow and Tribes. Inspired by his talk, I applied some of Seth's key points to innovation in communicating health data. If you have data that you want used more broadly, here are some basic thoughts to guide you:
  • Target specific audiences first. A data product cannot appeal to all audiences at once. There is a shrinking "middle of the market" as people get used to having their specific information needs addressed by tailored products. So start with presenting your data in ways that appeal to those small audiences that are most likely to embrace and use your data.
  • Make it easy to talk about your data. People have to be able to discuss your data easily in order to recommend them to others. It is important to give potential users a sense that "people like me use these data". Ultimately, word of mouth will lead to broader adoption.
  • Make your data product "remarkable" (i.e. worth making remarks about). This is Seth Godin's "purple cow". If people see a cow, it's not remarkable. If they were to see a purple cow, they would talk about it (or about Milka, but that's a different story). For health data, that purple cow could be a new form of visualization, a surprising story, a provocative infographic, or a useful smartphone app that people will talk about.
  • Go out on a limb. If you are trying to innovate, you can't avoid failure. If "failure is not an option", you are not innovating. Fail fast and iterate quickly, until you get it right.

To get more inspiration, follow Seth's blog or subscribe to this blog (feed is here); there will be follow-ups to this post in the near future.
