ghdx

Improving data discoverability - an important step towards open health data?

on Fri, 03/25/2011 - 14:57

In global public health, open data is still in its infancy. Finding health-related data - and even information about those data - is a continuous challenge. The presentations and discussions at the recent Global Health Metrics and Evaluations (GHME) conference showed again that essential data are often not available; other participants like Karen Grepin and Amanda Makulec also made note of that. Of course, relevant data are often simply not collected. In many cases, however, data are not being made available (see some thoughts on that in my post on the Global Health Data Charter). Knowing more thoroughly what data exist, where they live, and what exactly they contain can help increase the availability of data for health analysis. (read on below the presentation)

These are the slides from my launch presentation for the Global Health Data Exchange (GHDx) at GHME. The current primary objective for our new data catalog is to improve data discoverability. Finding existing data is currently a very labor-intensive process. Some health indicators are available from sites like the WHO Global Health Observatory or World Bank Open Data. Data repositories like IPUMS, IHSN's data catalog, SodaPOP, or Dataverse provide good starting points for certain types of underlying data. Open government initiatives like HealthData.gov, data.gov.uk, or opengov.se are starting to be good sources of data for selected countries. However, most data are mentioned or available on a variety of websites, such as those of ministries of health, statistics bureaus, and other organizations (I wrote about health data sources recently). Discovering data from those different sources has to be done via web searches, as well as browsing websites, searching library catalogs, and conducting literature reviews.

Cataloging datasets is currently the most direct path to discoverability, and the GHDx is aimed at providing this path. But cataloging is itself very labor-intensive. Once a dataset is discovered, its title, covered geographies and dates, and other metadata need to be further researched and validated manually. Only certain tasks can be automated (like assigning keywords) or at least supported by software (like automated searches, web scraping, etc.). We will also explore how to collect and update some of the information via crowdsourcing.
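To make the "assigning keywords" step concrete, here is a minimal sketch of what rule-based keyword assignment for a catalog record could look like. It is not how the GHDx does it: the `DatasetRecord` fields, the `KEYWORD_TERMS` vocabulary, and the `assign_keywords` function are all hypothetical names chosen for illustration, and the example record is made up.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: keyword -> trigger terms.
# Illustrative only; a real catalog would maintain a much larger list.
KEYWORD_TERMS = {
    "mortality": ["mortality", "death"],
    "maternal health": ["maternal", "antenatal", "pregnancy"],
    "survey": ["survey", "questionnaire"],
}


@dataclass
class DatasetRecord:
    """Assumed minimal metadata for one cataloged dataset."""
    title: str
    geography: str
    years: str
    description: str = ""
    keywords: list = field(default_factory=list)


def assign_keywords(record, vocabulary=KEYWORD_TERMS):
    """Attach every keyword whose trigger terms occur in the title or description."""
    text = f"{record.title} {record.description}".lower()
    matched = [
        keyword
        for keyword, terms in vocabulary.items()
        if any(term in text for term in terms)
    ]
    record.keywords = sorted(matched)
    return record.keywords


# Made-up example record, not an actual catalog entry.
record = DatasetRecord(
    title="Example Household Survey on Maternal Mortality",
    geography="Exampleland",
    years="2008-2009",
)
print(assign_keywords(record))  # ['maternal health', 'mortality', 'survey']
```

Plain substring matching like this is crude (it will also match terms inside longer words), which is one reason the remaining metadata still has to be checked by hand.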

With a catalog like the GHDx in place to provide accurate and reliable information about data, it becomes straightforward to find data, download them directly or contact data owners, and use them for analysis. It will also show gaps in data collection and illuminate where data are systematically not being shared. In addition, it provides insight into what data are most needed and used, enables proper credit to data collectors and owners, allows data owners to find new audiences for their data, and shows the usefulness and impact of additional data availability. This in turn will hopefully motivate data owners to embark on the path towards open data (ideally completely open '5-star' linked data) for global health and healthcare.

IHME's Global Health Data Exchange (GHDx) is live

on Thu, 03/03/2011 - 06:38

I am very excited to share that IHME's new data catalog, the GHDx, is live. For the past year, I have worked with a team at IHME to develop this catalog of health-related datasets. It's been an exciting process and a great learning experience (more on resulting insights in a later post). The GHDx will be launched officially on March 14 at the Global Health Metrics and Evaluations (GHME) conference, co-hosted by IHME in Seattle. At the launch session, I will provide more background, and we will demo the GHDx and share another exciting announcement together with the CDC. We will also demo the GHDx at the IHME Open House on March 15, 5:30-8:30 pm. Join us.

The GHDx offers two key features for people working with population health data:

  • It's a catalog of demographic, public health, and global health datasets that provides essential information about the data (including who produced the data, where to obtain them, and how to cite them). If you have searched for data before, you know how hard it is to find reliable information even about the existence of data. We have invested a lot of time and effort to research and verify the metadata in the GHDx. And the catalog will grow substantially over the next few months.
  • It's a platform that offers data owners a way to share their data with a larger audience. You'll find IHME's research results there, along with other datasets in the public domain.

We also use the platform to manage the data we work with at IHME, to make them searchable, and to capture information about them.
Have a look:

  • Get more background on GHDx and the data from the inaugural post on the GHDx blog
  • Subscribe to RSS feeds, e.g. for the GHDx blog (where we'll share news about the GHDx and health-related data), newly added datasets, and many others
  • Search, browse and explore data from the GHDx home page (here or via www.ghdx.org)
  • Send the whole IHME Data Team feedback via the contact form

I am interested in your informal view of the GHDx: tips and recommendations, rants and raves. Just add them to the comments below.