N3C, the NIH-spurred effort to grow a national COVID database from far-flung local silos, has already yielded an impressive harvest: almost 600 million clinical observations of nearly 6.5 million patients seen at 56 sites.
The effort is ongoing, so those numbers are still ticking up. The data are secure, the patients de-identified and the access controlled. Which raises a question:
Did we really need a pandemic to produce a national healthcare database that’s secure, private and exclusively available to medical researchers?
No matter. We have one now.
MIT Technology Review looks at the encouraging story behind—and in front of—the National COVID Cohort Collaborative.
“It’s just shocking that we had no harmonized, aggregate health data for research in the face of a pandemic,” a researcher who co-leads N3C tells reporter Cat Ferguson. “We never would have gotten everyone to give us this degree of data outside the context of a pandemic, but now that we’ve done it, it’s a demonstration that clinical data can be harmonized and shared broadly in a secure way, and a transparent way.”
The outlet reports that 215 approved projects are feeding in. Some track direct outcomes of COVID infections. Others look for possible ripple effects in things like complication rates in COVID-negative patients who had elective surgery during the public-health crisis.
Several incorporate AI and machine learning.
The first publication from the collaborative, Ferguson reports, was an analysis of mortality risk factors in patients who got sick with COVID on top of cancer.
Of course, any E pluribus unum effort has its work cut out for it, and N3C collaborators are feeling their way through the project’s particular pain points.
The N3C co-leader tells Ferguson:
“There was a certain amount of skepticism from sites, like, ‘We don’t really need this kind of data quality framework—we already do that at our own sites confidentially, behind our firewall. We don’t need your stinking harmonization tools.’ But we learned those quality measures are insufficient when you look at data in aggregate.”