Black-box AI should be barred from reading medical images in clinical settings because machine learning, like human thinking, tends to take diagnostic shortcuts that, for basic safety reasons, call for an explanation.
A study published May 31 in Nature Machine Intelligence bears this out.
Researchers at the University of Washington in Seattle began their investigation by reviewing the literature to assess datasets and AI models used for diagnosing COVID-19 from chest X-rays.
Su-In Lee, PhD, and colleagues paid special attention to studies using AI approaches they deemed at high risk of “worst-case confounding.”
An example of this effect is the tendency to assume elderly patients are COVID-positive when they present with, say, a fever and sore throat but inconclusive findings on chest imaging.
To uncover such shortcutting, Lee and team first trained deep convolutional neural networks on image datasets resembling those used in the published studies.
Next they tested the models on COVID case mockups representing both single-hospital and multi-institution settings.
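To make that setup concrete, here is a minimal sketch of the internal-versus-external comparison, written in PyTorch rather than drawn from the authors' code; the folder layout, hospital names and training details are illustrative assumptions only. The telltale sign of shortcut learning is a model that scores well on its home institution's test set but drops sharply on another institution's data.

```python
# Minimal sketch (not the authors' code): train a CNN on one hospital's chest
# X-rays, then compare internal vs. external test performance. A large drop on
# the external set is a red flag for shortcut learning.
# Paths and the "covid"/"non_covid" subfolder layout are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("hospital_A/train", transform=tfm)      # internal training data
internal_test = datasets.ImageFolder("hospital_A/test", transform=tfm)   # same-site test set
external_test = datasets.ImageFolder("hospital_B/test", transform=tfm)   # different-site test set

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 2)
model = model.to(device)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # short run, just to illustrate the workflow
    for x, y in DataLoader(train_set, batch_size=32, shuffle=True):
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(ds):
    """Fraction of correct predictions on a dataset."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in DataLoader(ds, batch_size=32):
            pred = model(x.to(device)).argmax(dim=1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total

print("internal test accuracy:", accuracy(internal_test))
print("external test accuracy:", accuracy(external_test))  # a steep drop suggests shortcuts
```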
“[A] model that relies on valid medical pathology—which should not change between datasets—should maintain high performance,” the authors point out.
Unsurprisingly, the single-site application far outperformed its multisite counterpart.
However, both showed evidence of shortcutting.
The worst performance turned up when models were trained on data synthesized from separate datasets.
Such synthesis “introduces near worst-case confounding and thus abundant opportunity for models to learn these [inappropriate] shortcuts,” Lee and co-authors comment.
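To see why that kind of pooling is so dangerous, consider a toy illustration; the numbers and source names below are hypothetical, not drawn from the study. When nearly all positives come from one repository and nearly all negatives from another, a classifier can look accurate while learning nothing about lungs.

```python
# Toy illustration (hypothetical counts) of a label-source confound in a
# pooled training set: the image's source alone almost perfectly predicts
# the label, so any source-specific artifact becomes a usable shortcut.
records = (
    [{"source": "covid_repo", "label": 1}] * 480 +       # positives, nearly all from one repository
    [{"source": "covid_repo", "label": 0}] * 20 +
    [{"source": "pneumonia_archive", "label": 1}] * 20 +
    [{"source": "pneumonia_archive", "label": 0}] * 480   # negatives, nearly all from another
)

# "Shortcut classifier": call an image positive whenever it came from the
# COVID repository -- no pathology involved.
correct = sum((r["source"] == "covid_repo") == bool(r["label"]) for r in records)
print(f"shortcut accuracy: {correct / len(records):.0%}")  # 96% on this pooled data
```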
“Importantly,” they add, “because undesirable ‘shortcuts’ may be consistently detected in both internal and external domains, our results warn that external test set validation alone may be insufficient to detect poorly behaved models.”
The results also buttress the case for using explainable AI—and for now only explainable AI—in use cases across clinical settings, the authors underscore.
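What “explainable” can look like in practice: one common attribution technique is a gradient saliency map, sketched below as a generic illustration rather than the study's own method. It reuses the hypothetical model from the earlier sketch and highlights which pixels most influenced a prediction, so a reviewer can see whether the model is fixating on markers or borders instead of lung fields.

```python
# Builds on the hypothetical model and datasets from the training sketch
# above. Vanilla gradient saliency is shown only as a generic example of
# explainable AI, not as the study's own attribution method.
import torch

def saliency_map(model, image, device="cpu"):
    """Return a per-pixel map of |d(score)/d(pixel)| for the predicted class."""
    model.eval()
    x = image.unsqueeze(0).to(device).requires_grad_(True)  # add batch dimension
    scores = model(x)
    top_class = scores.argmax(dim=1).item()
    scores[0, top_class].backward()
    # Collapse color channels; bright pixels influenced the prediction most.
    return x.grad.abs().max(dim=1).values.squeeze(0).cpu()

# Usage: if the hot spots sit on laterality markers, burned-in text or the
# image border rather than the lung fields, the model is likely leaning on
# a shortcut rather than pathology.
image, _ = internal_test[0]
heatmap = saliency_map(model, image, device=device)
```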
In coverage from UW’s news division, Lee says she and her team remain hopeful about AI’s future in imaging-based medical diagnostics.
“I believe we will eventually have reliable ways to prevent AI from learning shortcuts, but it’s going to take some more work to get there,” she says. “Going forward, explainable AI is going to be an essential tool for ensuring these models can be used safely and effectively to augment medical decisionmaking and achieve better outcomes for patients.”
UW’s coverage is here, and the study is available in full for free.