When a black-box algorithm guides a physician’s diagnostic or therapeutic judgments, its intrinsic opacity can undercut clinical safety and efficacy—not to mention shared decisionmaking, informed consent and the patient-doctor relationship itself.
Conventional low-tech medicine isn’t free of these risks, as the booming malpractice-insurance industry attests. However, introducing unexplainable AI into the mix multiplies the points at which an early medical error can spawn a series of downstream fumbles.
Or, as put by European researchers in a paper published Aug. 10 in Bioethics:
“[W]hereas in normal medical decision-making, each physician will decide individually using her own judgment, and the impact of mistakes is thus limited, implementation of one single algorithm can corrupt the decision process for multiple users, all of whom will make the same incorrect interpretation.”
Fleshing out the ethical ramifications of this phenomenon are Robin Pierce, JD, PhD, of Tilburg Law School in the Netherlands; Sigrid Sterckx, PhD, of Ghent University in Belgium; and Wim Van Biesen, MD, PhD, of Ghent University Hospital.
They take up the exercise as applied to Van Biesen’s medical specialty, nephrology, in which AI has shown promise for helping detect, diagnose and guide care of kidney conditions.
Noting the lack of an agreed-upon nomenclature to define the algorithmic concepts used in and/or produced by nephrological AI, the team suggests the resulting black-box reasoning can lead users astray along any or all of three distinct streams.
- A mathematical black box keeps end-users from knowing how the inputs led to the output, while
- a logical black box obscures the reasoning process the algorithm used to arrive at its outputs and
- a semantic black box raises questions about what the algorithm’s output means, including how clinically relevant it is.
The authors concentrate the present discussion mainly on the last of these. In the case of a nephrologist using AI to diagnose acute kidney injury (AKI), they write, the clinician would need to remain mindful that 95% of studies on AI-aided AKI detection use only one of two key diagnostic criteria.
“As a result, most of these seemingly objective and robust automated AKI detection systems are profoundly opaque black boxes that can generate a positive or negative AKI label for a particular patient with significantly different clinical consequences,” they point out. “These decisions are rarely available to the end-user, so she also cannot know exactly which type of AKI is detected.”
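To make the semantic point concrete, consider a minimal Python sketch—strictly an illustration, not drawn from the paper or from any system the authors reviewed, with invented patient values and function names—of two simplified detectors built on the published KDIGO definitions: one keyed only to the serum-creatinine criterion, as most automated alerts are, and one that also checks urine output.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    baseline_creatinine: float   # mg/dL, pre-admission baseline
    current_creatinine: float    # mg/dL, most recent value
    urine_output_ml_kg_h: float  # mL/kg/h averaged over the past 6 hours

def aki_by_creatinine(p: Patient) -> bool:
    """Serum-creatinine criterion only: a rise of >= 0.3 mg/dL within 48 hours
    or a value >= 1.5x baseline within 7 days (KDIGO stage 1 thresholds)."""
    rise = p.current_creatinine - p.baseline_creatinine
    ratio = p.current_creatinine / p.baseline_creatinine
    return rise >= 0.3 or ratio >= 1.5

def aki_by_full_kdigo(p: Patient) -> bool:
    """Adds the KDIGO urine-output criterion: < 0.5 mL/kg/h for 6 hours."""
    return aki_by_creatinine(p) or p.urine_output_ml_kg_h < 0.5

# A hypothetical oliguric patient whose creatinine has not (yet) risen:
patient = Patient(baseline_creatinine=1.0, current_creatinine=1.1,
                  urine_output_ml_kg_h=0.3)

print(aki_by_creatinine(patient))   # False -> a creatinine-only system stays silent
print(aki_by_full_kdigo(patient))   # True  -> AKI is flagged once urine output counts
```

Two systems implementing these different readings of the same guideline would label this patient differently, and an end-user who cannot see which criterion fired has no way to know what kind of AKI the alert actually refers to.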
Other steps at which semantic black boxes and opaque AI can combine to muddle medical decisionmaking, according to Pierce and co-authors:
Guideline interpretations. Take the case of a nephrologist who senses an AI recommendation is off because it doesn’t square with the established guidance, the Kidney Disease: Improving Global Outcomes (KDIGO) criteria, as applied to a known patient under his or her care. That physician would be bound both ethically and legally to check all the algorithm’s results for credibility, the authors note. And should that happen, “the whole purpose of having the automated devices becomes pointless. … While massive big data may fill some of the void, the scale and exacerbated opacity of an AI-generated label of AKI renders the output less clinically helpful in some very disturbing ways.”
Clinical consequences. “[B]ecause of the dichotomous nature of AI outputs [in nephrology]—either AKI or not-AKI—the conditions on the continuum designating ‘at risk’ are essentially being ignored, thus resulting in a missed opportunity for personalized, adapted intervention.” (A brief sketch after this list illustrates the contrast between a dichotomous flag and a graded risk output.)
Doctor-patient dynamic. When AI flags a patient as having AKI but gives scant explanation as to what it’s going by—let alone which complicating factors may be contributing to the call or what diagnostic thresholds the algorithm is using—the patient’s condition itself becomes a sort of black box, the authors write. “For these reasons, the ‘knowledge’ about the patient's condition produced by an automated e-alert is problematic on many levels and can lead to poor decisionmaking by the care team seeking to honor the patient’s goals.”
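On the dichotomous-output point, a similarly hypothetical contrast—the 0.5 and 0.2 thresholds below are arbitrary illustration values, not anything proposed in the paper—shows what is lost when an alert collapses a risk continuum into a yes/no label:

```python
def binary_alert(risk_score: float, threshold: float = 0.5) -> str:
    """Dichotomous output: everything below the threshold collapses to 'not-AKI'."""
    return "AKI" if risk_score >= threshold else "not-AKI"

def graded_output(risk_score: float) -> str:
    """Keeps the 'at risk' middle of the continuum visible to the care team."""
    if risk_score >= 0.5:
        return "AKI"
    if risk_score >= 0.2:
        return "at risk: consider preventive measures"
    return "low risk"

print(binary_alert(0.35))   # 'not-AKI'  -> the at-risk patient disappears from view
print(graded_output(0.35))  # 'at risk: consider preventive measures'
```

A graded output along these lines is one way to preserve the ‘at risk’ category that, on the authors’ account, the binary label erases.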
In this last scenario, Pierce and colleagues emphasize, much is riding on the moral responsibility of the attending clinician.
“Both beneficence and nonmaleficence mandate that the physician act responsibly regarding the patient's welfare and well-being,” they write. “Advocates of automated detection systems for AKI argue that they require less clinical expertise among staff to produce an AKI alert or diagnosis in a timely manner. This purported benefit is misleading, however. The implementation of AKI alerts is mainly intended to address issues with clinical staff shortages rather than to improve care.”
The team also takes up policy concerns raised by black boxes and unexplainable AI en route to concluding that a more refined and transparent translation of KDIGO criteria might improve nephrology AI—and in ways that could generalize across medicine.
They write:
“It is a matter of debate whether the solution to the spectrum of dilemmas, diversions and drawbacks of the semantic black box that is fed into automated algorithmic systems lies in a more refined and transparent translation of KDIGO criteria into the model, adequate regulation and governance, or ensuring appropriate and adequate personnel and processes in the embedding into clinical practice. It is reasonable to suggest that it will take a combination of all of these. … When technology interferes with proper understanding and effective communication and hinders the ability to provide reasonable levels of care, then this must be addressed before wide implementation can be considered.”
The paper is available in full for free.