Clinical assessment of deep learning-based uncertainty maps in lung cancer segmentation.

Journal: Physics in medicine and biology
Published Date:

Abstract

Prior to radiation therapy planning, accurate delineation of gross tumour volumes (GTVs) and organs at risk (OARs) is crucial. In current clinical practice, tumour delineation is performed manually by radiation oncologists, which is time-consuming and prone to large inter-observer variability. With the advent of deep learning (DL) models, automated contouring has become possible, speeding up procedures and assisting clinicians. However, these tools are currently used in the clinic mostly for contouring OARs, since they are not yet reliable enough for contouring GTVs. To improve the reliability of these systems, researchers have started exploring probabilistic neural networks. However, there is still limited knowledge of the practical implementation of such networks in real clinical settings. In this work, we developed a 3D probabilistic system that generates DL-based uncertainty maps for lung cancer CT segmentations. We employed the Monte Carlo (MC) dropout technique to generate probabilistic and uncertainty maps, and evaluated model calibration using reliability diagrams. A clinical validation was conducted in collaboration with a radiation oncologist to qualitatively assess the value of the uncertainty estimates. We also proposed two novel metrics, namely mean uncertainty (MU) and relative uncertainty volume (RUV), as potential indicators for clinicians to assess the need for independent visual checks of the DL-based segmentation. Our study showed that uncertainty mapping effectively identified cases of under- or over-contouring. Despite the overconfidence of the model, a strong correlation was observed between the clinical opinion and the MU metric. Moreover, both MU and RUV achieved high AUC values in discriminating between low and high uncertainty cases. Our study is one of the first attempts to clinically validate uncertainty estimates in DL-based contouring. The two proposed metrics exhibited promising potential as indicators for clinicians to independently assess the quality of tumour delineation.
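
As a rough illustration of the pipeline summarised above, the following Python sketch turns a stack of MC dropout forward passes into a voxel-wise uncertainty map and two summary scores. The abstract does not give the formulas, so everything here is an assumption made for illustration: binary predictive entropy as the uncertainty measure, MU taken as the mean entropy inside the predicted contour, RUV taken as the number of voxels above an entropy threshold divided by the predicted volume, and the function names and the 0.5 threshold are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def uncertainty_map(mc_probs):
    """Mean foreground probability and voxel-wise binary entropy (in bits).

    mc_probs: array of shape (T, D, H, W) holding foreground probabilities
    from T stochastic forward passes with dropout kept active (MC dropout).
    Using predictive entropy as the uncertainty measure is an assumption.
    """
    eps = 1e-12
    p = np.clip(mc_probs.mean(axis=0), eps, 1.0 - eps)
    entropy = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return p, entropy


def mean_uncertainty(entropy, mask):
    """Assumed MU: mean uncertainty over voxels of the predicted contour."""
    return float(entropy[mask].mean()) if mask.any() else 0.0


def relative_uncertainty_volume(entropy, mask, threshold=0.5):
    """Assumed RUV: count of voxels above `threshold` / predicted volume."""
    if not mask.any():
        return 0.0
    return float((entropy > threshold).sum() / mask.sum())


if __name__ == "__main__":
    # Synthetic stand-in for T=20 MC dropout passes over an 8x16x16 volume.
    rng = np.random.default_rng(0)
    mc_probs = rng.random((20, 8, 16, 16))
    p, entropy = uncertainty_map(mc_probs)
    mask = p >= 0.5  # predicted contour from the mean probability
    print("MU :", mean_uncertainty(entropy, mask))
    print("RUV:", relative_uncertainty_volume(entropy, mask))
```

In a real setting `mc_probs` would come from repeated inference on the same CT volume with dropout layers left active. Cases whose scores exceed a cut-off chosen on validation data could then be flagged for an independent visual check.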

Authors

  • Federica Carmen Maruccio
    Philips Research, HTC 34, North Brabant, 5656 AE, NL, The Netherlands.
  • Wietse Eppinga
    University Medical Centre Utrecht, Department of Radiotherapy, Heidelberglaan 100 Utrecht, 3584 CX, NL, The Netherlands.
  • Max-Heinrich Laves
    Leibniz Universität Hannover, Appelstraße 11A, 30167, Hannover, Germany. laves@imes.uni-hannover.de.
  • Roger Fonolla Navarro
    Philips Research, HTC 34, North Brabant, 5656 AE, NL, The Netherlands.
  • Massimo Salvi
  • Filippo Molinari
    Department of Electronics and Telecommunications, Politecnico di Torino, Italy.
  • Pavlos Papaconstadopoulos
    Philips Research, HTC 34, North Brabant, 5656 AE, NL, The Netherlands.