Generalizability and quality control of deep learning-based 2D echocardiography segmentation models in a large clinical dataset.

Journal: The international journal of cardiovascular imaging
Published Date:

Abstract

Use of machine learning (ML) for automated annotation of heart structures from echocardiographic videos is an active research area, but comparative, generalizable performance among models is poorly understood. This study aimed to (1) assess the generalizability of five state-of-the-art ML-based echocardiography segmentation models within a large Geisinger clinical dataset, and (2) test the hypothesis that a quality control (QC) method based on segmentation uncertainty can further improve segmentation results. The five models were applied to 47,431 echocardiography studies that were independent of any training samples. Chamber volumes and mass derived from model segmentations were compared with clinically reported values. The median absolute errors (MAE) in left ventricular (LV) volumes and ejection fraction exhibited by all five models were comparable to reported inter-observer errors (IOE). MAE for left atrial volume and LV mass compared similarly favorably with the respective IOE for the models trained for those tasks. A single model consistently exhibited the lowest MAE in all five clinically reported measures. We leveraged the tenfold cross-validation training scheme of this best-performing model to quantify segmentation uncertainty. Removing the studies with high segmentation uncertainty (14% to 71% of studies) reduced volume/mass MAE by 6% to 10%. Adding convexity filters improved specificity, efficiently removing fewer than 10% of studies, those with large MAE (16-40%). In conclusion, five previously published echocardiography segmentation models generalized to a large, independent clinical dataset, segmenting one or multiple cardiac structures with overall accuracy comparable to manual analyses, albeit with variable performance. Convexity-reinforced uncertainty QC efficiently improved segmentation performance and may further facilitate the translation of such models.
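The abstract does not state exactly how segmentation uncertainty or convexity were computed. As a minimal sketch only, assuming (a) uncertainty is the coefficient of variation of a measurement (e.g. LV volume) across the ten cross-validation fold models and (b) convexity is measured as solidity (contour area divided by convex-hull area), a convexity-reinforced uncertainty QC filter could look like the code below. All function names and the thresholds `max_uncertainty` and `min_solidity` are hypothetical and are not the study's values.

```python
import statistics


def median_absolute_error(predicted, reference):
    """Median of |predicted - reference| over paired studies."""
    return statistics.median(abs(p - r) for p, r in zip(predicted, reference))


def ensemble_uncertainty(fold_estimates):
    """Assumed uncertainty score: coefficient of variation of one measurement
    across the fold models (hypothetical stand-in for the paper's metric)."""
    return statistics.pstdev(fold_estimates) / statistics.fmean(fold_estimates)


def polygon_area(pts):
    """Shoelace formula; pts is a list of (x, y) vertices in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(contour):
    """Contour area / convex-hull area; 1.0 means perfectly convex.
    Assumes a non-degenerate (non-collinear) contour."""
    return polygon_area(contour) / polygon_area(convex_hull(contour))


def passes_qc(fold_estimates, contour, max_uncertainty=0.05, min_solidity=0.9):
    """Keep a study only if fold models agree AND the contour is near-convex.
    Both thresholds are hypothetical placeholders."""
    return (ensemble_uncertainty(fold_estimates) <= max_uncertainty
            and solidity(contour) >= min_solidity)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
print(solidity(square), solidity(notched))  # 1.0 0.625
```

In this sketch the uncertainty test flags studies where the fold models disagree, while the solidity test catches a different failure mode: segmentations the models agree on but whose shape is implausibly concave for a cardiac chamber.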

Authors

  • Xiaoyan Zhang
    Institute of Information and Navigation, Air Force Engineering University, Xi'an, Shaanxi, China.
  • Alvaro E Ulloa Cerna
    Department of Translational Data Science and Informatics, Geisinger, 100 North Academy Avenue, Danville, PA, 17822, USA.
  • Joshua V Stough
    Department of Translational Data Science and Informatics, Geisinger, Danville, Pennsylvania; Department of Computer Science, Bucknell University, Lewisburg, Pennsylvania.
  • Yida Chen
    Computer Science, Bucknell University, Lewisburg, PA, USA.
  • Brendan J Carry
    Heart Institute, Geisinger, Danville, Pennsylvania.
  • Amro Alsaid
    Heart Institute, Geisinger, Danville, PA, USA.
  • Sushravya Raghunath
    Department of Translational Data Science and Informatics, Geisinger, Danville, PA, USA.
  • David P vanMaanen
    Department of Translational Data Science and Informatics, Geisinger, Danville, PA, USA.
  • Brandon K Fornwalt
    Department of Imaging Science and Innovation, Geisinger, Danville, Pennsylvania; Department of Biomedical Engineering, University of Kentucky, Lexington, Kentucky; Department of Radiology, Geisinger, Danville, Pennsylvania. Electronic address: bkf@gatech.edu.
  • Christopher M Haggerty
    IT Data Science, NewYork-Presbyterian Hospital, New York, New York, USA.