Optimizing Artificial Intelligence Thresholds for Mammographic Lesion Detection: A Retrospective Study on Diagnostic Performance and Radiologist-Artificial Intelligence Discordance.
Journal:
Diagnostics (Basel, Switzerland)
Published Date:
May 29, 2025
Abstract
Background/Objectives: Artificial intelligence (AI)-based systems are increasingly being used to assist radiologists in detecting breast cancer on mammograms. However, applying fixed AI score thresholds across diverse lesion types may compromise diagnostic performance, especially in women with dense breasts. This study aimed to determine optimal category-specific AI thresholds and to analyze discrepancies between AI predictions and radiologist assessments, particularly for BI-RADS 4A versus 4B/4C lesions. Methods: We retrospectively analyzed 194 mammograms (76 BI-RADS 4A and 118 BI-RADS 4B/4C) using FDA-approved AI software. Lesion characteristics, breast density, AI scores, and pathology results were collected. A receiver operating characteristic (ROC) analysis was conducted to determine the optimal thresholds via Youden's index. The discrepancy analysis focused on BI-RADS 4A lesions with AI scores of ≥35 and BI-RADS 4B/4C lesions with AI scores of <35. Results: AI scores were significantly higher in malignant than in benign cases (72.1 vs. 20.9; p < 0.001). The optimal AI threshold was 19 for BI-RADS 4A (AUC = 0.685) and 63 for BI-RADS 4B/4C (AUC = 0.908). In discordant cases, BI-RADS 4A lesions with scores of ≥35 had a malignancy rate of 43.8%, while BI-RADS 4B/4C lesions with scores of <35 had a malignancy rate of 19.5%. Conclusions: Using category-specific AI thresholds improves diagnostic accuracy and supports radiologist decision-making. However, limitations persist in BI-RADS 4A cases with overlapping scores, reinforcing the need for radiologist oversight and tailored AI integration strategies in clinical practice.
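The threshold selection described in the abstract (maximizing Youden's index, J = sensitivity + specificity - 1, separately for each BI-RADS category) can be illustrated with a minimal sketch. This is not the study's code: the function names `youden_threshold` and `thresholds_by_category`, the convention that a score at or above the threshold counts as positive, and the toy scores and labels are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: a case is called positive when score >= threshold,
    and labels are 1 for malignant, 0 for benign.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign case")

    best_t, best_j = None, -1.0
    # Every observed score is a candidate cut-off; ties keep the lowest one.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def thresholds_by_category(
    records: Iterable[Tuple[str, float, int]]
) -> Dict[str, Tuple[float, float]]:
    """Group (category, score, label) records and fit one threshold per category."""
    groups: Dict[str, Tuple[List[float], List[int]]] = {}
    for category, score, label in records:
        scores, labels = groups.setdefault(category, ([], []))
        scores.append(score)
        labels.append(label)
    return {cat: youden_threshold(s, y) for cat, (s, y) in groups.items()}


# Hypothetical toy data, NOT values from the study.
toy_records = [
    ("4A", 10, 0), ("4A", 20, 1), ("4A", 30, 0), ("4A", 40, 1),
    ("4B/4C", 30, 0), ("4B/4C", 50, 0), ("4B/4C", 60, 1), ("4B/4C", 80, 1),
]
per_category = thresholds_by_category(toy_records)
```

Fitting the cut-off within each category, rather than once over all lesions, is what allows the low-suspicion and higher-suspicion groups to end up with different operating points, as the abstract reports for BI-RADS 4A and 4B/4C.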