Last week a literature review found that none of 62 high-quality medical AI models were ready to move from academic research into clinical practice. Now a similar but separate study confirms just how deep those dashed hopes run.
Reporting their findings in Science Translational Medicine, the researchers behind the second exercise found that just 23% of healthcare machine-learning studies were reproducible with different datasets.
By comparison, 80% of computer vision studies and 58% of NLP studies had such conceptual reproducibility.
Equally concerning, only 55% of machine-learning-in-healthcare papers used public datasets and made their code available. Computer vision and NLP each came in at close to 90% on those measures.
IEEE Spectrum takes a quick look at both literature reviews side by side.
“Healthcare is an especially challenging area for machine learning research because many datasets are restricted due to health privacy concerns and even experts may disagree on a diagnosis for a scan or patient,” writes freelance journalist Megan Scudellari. “Still, researchers are optimistic that the field can do better.”