A Scoping Review of AI/ML Algorithm Updating Practices for Model Continuity and Patient Safety Using a Simplified Checklist.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
The ubiquity of clinical artificial intelligence (AI) and machine learning (ML) models necessitates measures to ensure the reliability of model output over time. Previous reviews have highlighted the lack of external validation for most clinical models, but a comprehensive review assessing the priority currently given to clinical model updating is lacking. The primary aim of this study was to understand the extent to which clinical model updating is prioritized in current research. We conducted a systematic analysis of studies of clinical AI models, adhering to PRISMA guidelines. To screen the quality of published AI/ML models, we developed and applied a new, simple checklist/score system and also assessed whether studies reported demographic composition based on ethnicity or race. A comprehensive literature search was conducted using Ovid Embase, Ovid MEDLINE, Ovid PsycINFO, Web of Science Core Collection, Scopus, and the Cochrane Library. Inclusion criteria encompassed AI and ML studies involving clinically predictive or prognostic modeling; human studies with algorithms; articles using supervised learning methods; articles using at least two predictor variables; and study designs including randomized controlled trials, prospective and retrospective cohorts, case-control studies, and case-cohort studies. Studies that did not meet these criteria were excluded. Our analysis revealed that only 9% of the 390 reviewed AI/ML studies stated an intention or method to update their models in the future.
98% of the AI/ML models in our review were in the research phase, and only 2% were in the production phase. Furthermore, a mere 12% reported following best practice standards for model development. Notably, 84% of the studies did not provide demographic composition based on ethnicity or race. These findings characterize recent clinical models and underscore the prevalence of research-phase models built on proprietary data, which limits independent verification and validation of model output. In conclusion, our review emphasizes the need for increased attention to the updating of clinical AI models, as a significant portion of current studies lack any commitment to future model updates. The low adherence to best practice standards for model development also highlights areas for improvement in the field. Furthermore, the absence of demographic information in a substantial number of studies raises concerns about the generalizability and equitable application of these models, and the reliance on proprietary data that precludes independent verification and validation is itself a significant concern for patient safety. Addressing these issues is crucial for advancing the reliability and inclusivity of clinical AI and ML applications.