Combining Ultrasound Imaging and Molecular Testing in a Multimodal Deep Learning Model for Risk Stratification of Indeterminate Thyroid Nodules.
Journal:
Thyroid : official journal of the American Thyroid Association
Published Date:
Apr 21, 2025
Abstract
Indeterminate cytology (Bethesda III and IV) represents 15-30% of biopsied thyroid nodules and requires additional diagnostic testing. Molecular testing (MT) is a commonly used diagnostic tool that evaluates malignancy risk through next-generation sequencing of fine-needle aspiration (FNA) samples. While MT achieves high sensitivity (97-100%) in ruling out malignancy, its specificity and positive predictive value (PPV) remain relatively low. This study proposes a multimodal deep learning model that integrates ultrasound (US) imaging with MT to improve risk stratification by enhancing PPV while maintaining high sensitivity. Combining these modalities leverages complementary information from molecular and imaging data, addressing limitations in current approaches and offering a robust framework for evaluating indeterminate nodules. We retrospectively analyzed 333 patients with indeterminate thyroid nodules (259 benign, 74 malignant) at UCLA Medical Center between 2016 and 2022. We evaluated four configurations: whole-frame US images, 256 × 256 patches, 128 × 128 patches, and an ensemble model combining the first three configurations. The clinical baseline consisted of Bethesda cytology and MT results. Models were assessed using five-fold cross-validation stratified by surgical outcomes. The clinical baseline (Bethesda + MT) achieved an AUROC of 0.728 [0.68, 0.78] with a sensitivity of 0.946 [0.88, 1.00], specificity of 0.664 [0.60, 0.73], and PPV of 0.448 [0.41, 0.48]. The proposed ensemble model demonstrated improved performance, achieving an AUROC of 0.831 [0.77, 0.89] with a sensitivity of 0.946 [0.88, 1.00], specificity of 0.703 [0.66, 0.75], and PPV of 0.477 [0.46, 0.50]. These improvements were statistically significant (p = 0.0008). Our multimodal model enhances MT performance by providing statistically significant improvements in PPV and specificity while maintaining high sensitivity.
Our framework could be leveraged to reduce the number of benign thyroid resections in patients with indeterminate nodules. However, this study is limited by its single-center dataset, lack of external validation, and the use of binarized MT outputs rather than granular malignancy risk probabilities. Future work should validate these findings across diverse populations and larger external datasets for more comprehensive risk stratification.
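The link between the reported specificity and PPV can be illustrated with a small arithmetic sketch. It takes the class sizes (74 malignant, 259 benign) and the reported sensitivity and specificity from the abstract, rounds them to whole-patient counts, and computes the PPV those pooled counts would imply. The function name `implied_ppv` and the rounding step are assumptions made for illustration; the study itself reports fold-based estimates, so the implied values land near, but not exactly on, the reported 0.448 and 0.477.

```python
def implied_ppv(sensitivity, specificity, n_malignant, n_benign):
    """PPV implied by pooled sensitivity/specificity and class sizes.

    Illustrative helper (not from the study): rates are converted to
    whole-patient counts by rounding, then PPV = TP / (TP + FP).
    """
    tp = round(sensitivity * n_malignant)   # malignant nodules correctly flagged
    tn = round(specificity * n_benign)      # benign nodules correctly cleared
    fp = n_benign - tn                      # benign nodules flagged as suspicious
    return tp / (tp + fp)


# Class sizes and rates as reported in the abstract.
N_MALIGNANT, N_BENIGN = 74, 259

baseline_ppv = implied_ppv(0.946, 0.664, N_MALIGNANT, N_BENIGN)
ensemble_ppv = implied_ppv(0.946, 0.703, N_MALIGNANT, N_BENIGN)

print(f"baseline implied PPV: {baseline_ppv:.3f}")
print(f"ensemble implied PPV: {ensemble_ppv:.3f}")
```

With sensitivity held fixed, the only way PPV can rise is through fewer false positives, which is why the gain in specificity translates directly into a gain in PPV and, potentially, fewer resections of benign nodules.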