Spatiotemporal Deep Learning-Based Cine Loop Quality Filter for Handheld Point-of-Care Echocardiography.

Journal: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
PMID:

Abstract

The reliability of automated image interpretation of point-of-care (POC) echocardiography scans depends on the quality of the acquired ultrasound data. This work reports on the development and validation of spatiotemporal deep learning models that assess whether input ultrasound cine loops collected with a handheld echocardiography device are suitable for processing by an automated quantification algorithm [e.g., ejection fraction (EF) estimation]. POC echocardiograms (DICOM cine loops from 175 patients) from two sites were collected using a handheld ultrasound device and annotated for image quality at the frame level. Attributes of high-quality frames for left ventricular (LV) quantification included a temporally stable LV, reasonable coverage of LV borders, and good contrast between the borders and the chamber. Attributes of low-quality frames included temporal instability of the LV and/or imaging artifacts (e.g., lack of contrast, haze, reverberation, and acoustic shadowing). Three neural network architectures were investigated: 1) a frame-level convolutional neural network (CNN) that operates on individual echo frames (VectorCNN); 2) a single-stream sequence-level CNN that operates on a sequence of echo frames [VectorCNN + long short-term memory (LSTM)]; and 3) two-stream sequence-level CNNs that operate on a sequence of echo and optical flow frames (VectorCNN + LSTM + Average, VectorCNN + LSTM + MinMax, and VectorCNN + LSTM + ConvPool). Evaluation on a sequestered test dataset containing 76 DICOM cine loops with 16 914 frames showed that VectorCNN + LSTM can effectively utilize both spatial and temporal information to regress the quality of an input frame (accuracy = 0.925, sensitivity = 0.860, and specificity = 0.952), compared to the frame-level VectorCNN, which utilizes only the spatial information in that frame (accuracy = 0.903, sensitivity = 0.791, and specificity = 0.949).
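The sequence-level design described above (a per-frame feature extractor followed by an LSTM that carries temporal context across the cine loop) can be illustrated with a minimal NumPy sketch. This is not the paper's VectorCNN + LSTM implementation; the linear "backbone", dimensions, and random weights below are all toy stand-ins chosen only to show how per-frame features are aggregated temporally before a quality score is regressed.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frame, w):
    # Stand-in for a per-frame CNN backbone (the paper's "VectorCNN"):
    # here just a linear projection of the flattened frame with a ReLU.
    return np.maximum(w @ frame.ravel(), 0.0)

def lstm_step(x, h, c, p):
    # One LSTM cell step: input/forget/candidate/output gates computed
    # from the current frame features x and the previous hidden state h.
    z = p["W"] @ np.concatenate([x, h]) + p["b"]
    i, f, g, o = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c + sig(i) * np.tanh(g)   # updated cell (temporal memory)
    h = sig(o) * np.tanh(c)                # updated hidden state
    return h, c

d_feat, d_hid = 16, 8
frames = rng.standard_normal((20, 32, 32))          # toy cine loop: 20 frames
w_cnn = rng.standard_normal((d_feat, 32 * 32)) * 0.01
p = {"W": rng.standard_normal((4 * d_hid, d_feat + d_hid)) * 0.1,
     "b": np.zeros(4 * d_hid)}
w_out = rng.standard_normal(d_hid) * 0.1

h, c = np.zeros(d_hid), np.zeros(d_hid)
for fr in frames:                                    # temporal aggregation
    h, c = lstm_step(frame_features(fr, w_cnn), h, c, p)

# Sigmoid head: quality score in (0, 1); thresholding it would yield the
# adequate/inadequate decision used to stratify loops.
quality = 1.0 / (1.0 + np.exp(-(w_out @ h)))
print(round(float(quality), 3))
```

The key point mirrored from the abstract is that the score for a frame (or loop) depends on the whole preceding sequence through the recurrent state, which is what lets the model penalize temporal instability of the LV that a purely frame-level CNN cannot see.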
Furthermore, an independent-samples t-test indicated that the cine loops classified as adequate quality by the VectorCNN + LSTM model had a statistically significantly lower bias in the automatically estimated EF (mean bias % %, versus a clinically obtained reference EF) compared to the loops classified as inadequate (mean bias % %) ( ). Thus, cine loop stratification using the proposed spatiotemporal CNN model improves the reliability of automated POC echocardiography image interpretation.
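The stratification analysis above compares EF bias between the adequate and inadequate groups with an independent-samples t-test. A minimal stdlib sketch of that comparison is shown below, using Welch's form of the test (which does not assume equal variances); the EF-bias samples are invented toy values, not the study's data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    # Welch's independent-samples t statistic and its (Welch-Satterthwaite)
    # degrees of freedom, appropriate when the two groups may differ in
    # variance and sample size.
    va, vb = variance(a), variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Toy EF-bias samples in percent (automated EF minus reference EF),
# NOT the study's values: the "adequate" loops cluster near zero bias,
# the "inadequate" loops scatter widely.
adequate   = [1.2, -0.8, 2.1, 0.5, -1.0, 1.8, 0.2, -0.4]
inadequate = [6.5, -7.2, 8.9, 5.1, -9.8, 7.4, 10.2, -6.0]

t, df = welch_t(adequate, inadequate)
print(f"t = {t:.2f}, df = {df:.1f}")
```

In practice one would compare the magnitude (or spread) of the bias between groups and report the associated p-value, as the abstract does; library routines such as SciPy's `ttest_ind` provide the p-value directly.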

Authors

  • Rashid Al Mukaddim
  • Emily MacKay
  • Nils Gessert
Hamburg University of Technology, Schwarzenbergstraße 95, 21073 Hamburg, Germany.
  • Ramon Erkamp
    Philips Research, North America, Cambridge, MA.
  • Shriram Sethuraman
  • Jonathan Sutton
    Philips Research, North America, Cambridge, MA.
  • Shyam Bharat
    Philips Research, North America, Cambridge, MA.
  • Melanie Jutras
  • Cristiana Baloescu
  • Christopher L Moore
Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States of America.
  • Balasundar I Raju