Added Value of Deep Learning-based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study.

Journal: Radiology
PMID:

Abstract

Background Previous studies assessing the effects of computer-aided detection on observer performance in the reading of chest radiographs used a sequential reading design that may have biased the results because of reading order or recall bias.

Purpose To compare observer performance in detecting and localizing major abnormal findings, including nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax, on chest radiographs without versus with deep learning-based detection (DLD) system assistance in a randomized crossover design.

Materials and Methods This study included retrospectively collected normal and abnormal chest radiographs obtained between January 2016 and December 2017 (registration no. KCT0004147). The radiographs were randomized into two groups, and six observers, including thoracic radiologists, interpreted each radiograph without and with use of a commercially available DLD system by using a crossover design with a washout period. Jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), area under the receiver operating characteristic curve (AUC), sensitivity, specificity, false-positive findings per image, and reading times of observers with and without the DLD system were compared by using McNemar and paired t tests.

Results A total of 114 normal (mean patient age ± standard deviation, 51 years ± 11; 58 men) and 114 abnormal (mean patient age, 60 years ± 15; 75 men) chest radiographs were evaluated. The radiographs were randomized to two groups: group A (n = 114) and group B (n = 114). Use of the DLD system improved the observers' JAFROC FOM (from 0.90 to 0.95, P = .002), AUC (from 0.93 to 0.98, P = .002), per-lesion sensitivity (from 83% [822 of 990 lesions] to 89.1% [882 of 990 lesions], P = .009), per-image sensitivity (from 80% [548 of 684 radiographs] to 89% [608 of 684 radiographs], P = .009), and specificity (from 89.3% [611 of 684 radiographs] to 96.6% [661 of 684 radiographs], P = .01) and reduced the reading time (from 10-65 seconds to 6-27 seconds, P < .001). The DLD system alone outperformed the pooled observers (JAFROC FOM: 0.96 vs 0.90, respectively, P = .007; AUC: 0.98 vs 0.93, P = .003).

Conclusion Observers including thoracic radiologists showed improved performance in the detection and localization of major abnormal findings on chest radiographs and reduced reading time with use of a deep learning-based detection system. © RSNA, 2021
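The pooled sensitivity and specificity percentages in the Results follow directly from the bracketed counts (684 readings = 114 radiographs × 6 observers). The following is a minimal Python sketch that recomputes them from those counts only; the dictionary and function names are illustrative assumptions, not part of the study, and the sketch does not reproduce the McNemar or JAFROC analyses.

```python
# Counts as reported in the abstract, pooled over the six observers.
# Keys and function names are illustrative, not taken from the study.
COUNTS = {
    "per-lesion sensitivity": {"without": (822, 990), "with": (882, 990)},
    "per-image sensitivity": {"without": (548, 684), "with": (608, 684)},
    "specificity": {"without": (611, 684), "with": (661, 684)},
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def summarize(counts):
    """Map each metric to (percent without DLD, percent with DLD, gain in points)."""
    rows = {}
    for metric, arms in counts.items():
        before = percent(*arms["without"])
        after = percent(*arms["with"])
        rows[metric] = (before, after, round(after - before, 1))
    return rows


if __name__ == "__main__":
    for metric, (before, after, gain) in summarize(COUNTS).items():
        print(f"{metric}: {before}% -> {after}% (+{gain} percentage points)")
```

The gains are differences in pooled proportions only; the P values in the abstract come from the paired analyses described in Materials and Methods, which require per-reading data not given here.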

Authors

  • Jinkyeong Sung
    R&D Center, VUNO, 507 Gangnamdae-ro, Seocho-gu, Seoul 06536, South Korea.
  • Sohee Park
    Department of Radiology and Research Institute of Radiology, Asan Medical Center, College of Medicine, University of Ulsan, 88 Olympic-ro 43 Gil, Songpa-gu, Seoul, 138736, South Korea.
  • Sang Min Lee
    Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea.
  • Woong Bae
    Soombit.ai, Seoul, South Korea.
  • Beomhee Park
    Kakao, Seoul, South Korea.
  • Eunkyung Jung
    R&D Center, VUNO, 507 Gangnamdae-ro, Seocho-gu, Seoul 06536, South Korea.
  • Joon Beom Seo
    Department of Radiology, Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul 05505, Korea.
  • Kyu-Hwan Jung
    VUNO Inc., Seoul, Korea.