Providing access to detailed demographic and health data: Census Research Data Centers
Sharing health data and making them 'open' isn’t easy, and privacy is one of the key reasons. You can remove direct identifiers, but other detailed data, such as treatment dates, can still make it possible to identify subjects. The rapidly growing amount of information in marketing databases and social media adds further to the risk of identification. On the other hand, the more detail you remove from a dataset, the less useful it becomes for analysis and research. In the end, it’s about balancing the risk of identification against the usefulness of the data.
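To make the trade-off concrete, here is a minimal sketch with a handful of made-up records (the ZIP codes, birth years and dates are purely hypothetical). It counts how many records are unique on their combination of indirect identifiers, first with exact treatment dates and then with the dates generalized to year and month. This is just one simple way of measuring re-identification risk, not the method any particular data holder uses.

```python
from collections import Counter

# Hypothetical toy records with direct identifiers (names etc.) already removed:
# (zip_code, birth_year, treatment_date)
records = [
    ("98105", 1970, "2013-03-04"),
    ("98105", 1970, "2013-03-18"),
    ("98105", 1982, "2013-03-04"),
    ("98115", 1982, "2013-07-22"),
    ("98115", 1982, "2013-07-09"),
]

def unique_fraction(rows, key):
    """Share of rows whose indirect-identifier combination occurs only once."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] == 1) / len(rows)

# Full detail: the exact treatment date is kept.
full = unique_fraction(records, lambda r: (r[0], r[1], r[2]))
# Less detail: the treatment date is generalized to year-month ("YYYY-MM").
coarse = unique_fraction(records, lambda r: (r[0], r[1], r[2][:7]))

print(f"unique with exact dates: {full:.0%}")
print(f"unique with year-month:  {coarse:.0%}")
```

With exact dates every record is unique (100%). After coarsening, only one record in five still is (20%), but you can no longer compute things like the number of days between treatments, which is exactly the loss of usefulness described above.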
To make detailed data, including direct and indirect identifiers, available to researchers, data owners need to create controlled environments in which researchers can use the data for approved purposes and retrieve only results that carry little or no risk of identification. The US Census Bureau runs 14 Research Data Centers (RDCs) across the US that do just that. The latest one, the Northwest Census Research Data Center (NWCRDC), was opened yesterday at the University of Washington in Seattle by the acting Director of the Census Bureau, Tom Mesenbourg, and the Director of the NWCRDC, Dr. Mark Ellis.
Like the other Census Research Data Centers, the NWCRDC provides access to demographic, economic and health microdata (i.e., respondent-level data), including censuses, surveys, administrative data, and health data from the National Center for Health Statistics (NCHS) and the Agency for Healthcare Research and Quality (AHRQ). The datasets go back to the 1970s, including the 1970 decennial census, and the Census Bureau is working with the University of Minnesota to make the microdata from the 1960 census available. The data are available to qualified researchers for projects that are reviewed carefully to prevent abuse. Once a project is approved, however, researchers can link individuals across datasets in the RDC and even link in their own datasets that contain identifiers. This opens up research opportunities that would otherwise not be possible, and the center is a fantastic resource for researchers in the Pacific Northwest.
Why should data holders consider going this route? For the US Census Bureau, the most important benefits are new estimates and data products, efficiency, expanded measurement capabilities, and improved documentation of its own data. The Bureau is tapping into creative and innovative thinkers who find additional uses for the data that benefit both the Census Bureau and the American public. Currently, more than 650 researchers are working on 150 projects across the 14 RDCs.
Researchers who want to use the RDC need to write a proposal describing their planned research (more details here). The US Census Bureau reviews each proposal for its scientific merit and its benefit to the Census Bureau and the public. Proposals to use data provided by other agencies, such as NCHS and AHRQ, are reviewed by those organizations. All work has to be conducted on site at the NWCRDC on the University of Washington campus.
More data owners and data holders should consider making detailed data available. Research Data Centers are one possible solution; I'll discuss others in future posts.