a2z-1 for Multi-Disease Detection in Abdomen-Pelvis CT: External Validation and Performance Analysis Across 21 Conditions
Journal:
arXiv
Published Date:
Dec 17, 2024
Abstract
We present a comprehensive evaluation of a2z-1, an artificial intelligence
(AI) model designed to analyze abdomen-pelvis CT scans for 21 time-sensitive
and actionable findings. Our study focuses on rigorous assessment of the
model's performance and generalizability. Large-scale retrospective analysis
demonstrates an average AUC of 0.931 across 21 conditions. External validation
across two distinct health systems confirms consistent performance (AUC 0.923),
establishing generalizability to different evaluation scenarios, with notable
performance in critical findings such as small bowel obstruction (AUC 0.958)
and acute pancreatitis (AUC 0.961). Subgroup analysis shows consistent accuracy
across patient sex, age groups, and varied imaging protocols, including
different slice thicknesses and contrast administration types. Comparison of
high-confidence model outputs to radiologist reports reveals instances where
a2z-1 identified overlooked findings, suggesting potential for quality
assurance applications.