Makers of AI models for use in healthcare should think through how any “humans in the loop” are likely to act once the tool is implemented in real-world clinical settings.
This means AI designers ought to check the interpretability of the product’s outputs, anticipating “the performance of the Human-AI team rather than just the performance of the model in isolation.”
That’s one of 10 tenets FDA presents in new guidelines defining “good machine learning practice” for medical device development.
The agency drew up the 2-page document jointly with Health Canada and the U.K.’s Medicines and Healthcare products Regulatory Agency.
“Strong partnerships with our international public health partners will be crucial if we are to empower stakeholders to advance responsible innovations in this area,” FDA states in introducing the pointers.
FDA says it envisions the guiding principles being used in three ways: to tailor existing AI practices in healthcare, to create new practices, or to adopt approaches and methods from sectors outside of healthcare.
Here they are:
1. Multidisciplinary expertise is leveraged throughout the total product life cycle. In-depth understanding of a model’s intended integration into clinical workflow, and the desired benefits and associated patient risks, can help ensure that machine learning-enabled medical devices are safe and effective and address clinically meaningful needs over the life cycle of the device.
2. Good software engineering and security practices are implemented. These practices include methodical risk management and design process that can appropriately capture and communicate design, implementation and risk management decisions and rationale as well as ensure data authenticity and integrity.
3. Clinical study participants and datasets are representative of the intended patient population. Data collection protocols should ensure that the relevant characteristics of the intended patient population … and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and training and test datasets so that results can be reasonably generalized to the population of interest.
4. Training datasets are independent of test sets. All potential sources of dependence, including patient, data acquisition and site factors are considered and addressed to assure independence.
5. Selected reference datasets are based upon best available methods. If available, accepted reference datasets in model development and testing that promote and demonstrate model robustness and generalizability across the intended patient population are used.
6. Model design is tailored to the available data and reflects the intended use of the device. Considerations include the impact of both global and local performance and uncertainty/variability in the device inputs, outputs, intended patient populations and clinical use conditions.
7. Focus is placed on the performance of the human-AI team. Where the model has a “human in the loop,” human factors considerations and the human interpretability of the model outputs are addressed with emphasis on the performance of the Human-AI team, rather than just the performance of the model in isolation.
8. Testing demonstrates device performance during clinically relevant conditions. Considerations include the intended patient population, important subgroups, clinical environment and use by the Human-AI team, measurement inputs and potential confounding factors.
9. Users are provided clear, essential information. Users are also made aware of device modifications and updates from real-world performance monitoring, the basis for decision-making when available, and a means to communicate product concerns to the developer.
10. Deployed models are monitored for performance and retraining risks are managed. [W]hen models are periodically or continually trained after deployment, there are appropriate controls in place to manage risks of overfitting, unintended bias, or degradation of the model … that may impact the safety and performance of the model as it is used by the Human-AI team.
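Of the 10 principles, No. 4 (keeping training data independent of test data) is among the easiest to illustrate in code. What follows is a minimal sketch, not anything prescribed by FDA. It assumes each record is a dictionary with a hypothetical `patient_id` field, and it assigns every record from a given patient to the same side of the split by hashing that ID. The function names and the 20% test fraction are illustrative assumptions.

```python
import hashlib


def assign_split(patient_id, test_fraction=0.2):
    """Deterministically map a patient ID to 'train' or 'test'.

    Hashing the ID (an assumed field, not from the FDA document) means
    every record from the same patient always lands in the same split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_by_patient(records, test_fraction=0.2):
    """Split records so that no patient appears in both sets."""
    train, test = [], []
    for record in records:
        side = assign_split(record["patient_id"], test_fraction)
        (test if side == "test" else train).append(record)
    return train, test


def shared_patients(train, test):
    """Return patient IDs present in both sets (should be empty)."""
    train_ids = {r["patient_id"] for r in train}
    test_ids = {r["patient_id"] for r in test}
    return train_ids & test_ids


# Hypothetical data: 50 patients with 3 scans each.
records = [
    {"patient_id": f"P{p:03d}", "scan": s}
    for p in range(50)
    for s in range(3)
]
train, test = split_by_patient(records)
leaked = shared_patients(train, test)
print(f"train={len(train)} test={len(test)} leaked_patients={len(leaked)}")
```

The same grouping idea could be applied to the other sources of dependence the principle names, such as acquisition site, by hashing a site identifier instead of a patient identifier. A random row-level split, by contrast, can place scans from one patient on both sides and inflate test performance.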
In an accompanying statement, Bakul Patel, director of the FDA’s Digital Health Center of Excellence, which was formed last fall, says the three regulatory bodies hope the guidelines will “help stakeholders to advance device development, which has the potential to significantly improve the quality of patient care and transform healthcare.”
“We recognize that machine learning technologies present unique considerations due to their complexity and the iterative and data-driven nature of their development,” Patel adds. “While this is true, we are excited for continued progress in this area.”
The full guideline document is available from FDA.