Medical slice transformer for improved diagnosis and explainability on 3D medical images with DINOv2.

Journal: Scientific Reports
Published Date:

Abstract

Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are essential clinical cross-sectional imaging techniques for diagnosing complex conditions. However, large annotated 3D datasets for deep learning are scarce. While self-supervised methods like DINOv2 are encouraging for 2D image analysis, they have not been applied to 3D medical images. Furthermore, deep learning models often lack explainability due to their "black-box" nature. This study aims to extend 2D self-supervised models, specifically DINOv2, to 3D medical imaging while evaluating their potential for explainable outcomes. We introduce the Medical Slice Transformer (MST) framework to adapt 2D self-supervised models for 3D medical image analysis. MST combines a Transformer architecture with a 2D feature extractor, i.e., DINOv2. We evaluate its diagnostic performance against a 3D convolutional neural network (3D ResNet) across three clinical datasets: breast MRI (651 patients), chest CT (722 patients), and knee MRI (1199 patients). Both methods were tested on three tasks: diagnosing breast cancer, predicting lung nodule dignity (benign vs. malignant), and detecting meniscus tears. Diagnostic performance was assessed by calculating the Area Under the Receiver Operating Characteristic Curve (AUC). Explainability was evaluated through a radiologist's qualitative comparison of saliency maps based on slice and lesion correctness. P-values were calculated using DeLong's test. MST achieved higher AUC values than the 3D ResNet across all three datasets: breast (0.94 ± 0.01 vs. 0.91 ± 0.02, P = 0.02), chest (0.95 ± 0.01 vs. 0.92 ± 0.02, P = 0.13), and knee (0.85 ± 0.04 vs. 0.69 ± 0.05, P = 0.001). Saliency maps were consistently more precise and anatomically correct for MST than for ResNet. Self-supervised 2D models like DINOv2 can be effectively adapted for 3D medical imaging using MST, offering enhanced diagnostic accuracy and explainability compared to convolutional neural networks.
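
The AUC used above as the performance metric can be illustrated with a minimal sketch. It relies on the standard equivalence between AUC and the Mann-Whitney statistic: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_mann_whitney` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def auc_mann_whitney(labels, scores):
    """Estimate AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0 (negative) or 1 (positive); hypothetical toy input.
    scores: iterable of model outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_mann_whitney(toy_labels, toy_scores)  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be preferable for large cohorts. Comparing two correlated AUCs, as done here with DeLong's test, requires an additional variance estimate that this sketch does not cover.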

Authors

  • Gustav Müller-Franzes
    From the Department of Diagnostic and Interventional Radiology (F.K., G.M.F., L.H., P.S., S.K., E.B., M.S.H., F.P., M.Z., C.K., P.B., S.N., D.T.), Department of Medicine III (J.K., K.H.), and Clinic for Surgical Intensive Medicine and Intermediate Care (G.M.), University Hospital Aachen, Pauwelsstrasse 30, 52064 Aachen, Germany; Physics of Molecular Imaging Systems, Experimental Molecular Imaging (T.H., V.S.), and Institute of Imaging and Computer Vision (J.S.), RWTH Aachen University, Aachen, Germany; Department of Inner Medicine, Luisenhospital Aachen, Aachen, Germany (L.N.); and Ocumeda AG, Erlen, Switzerland (C.H.).
  • Firas Khader
    Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
  • Robert Siepmann
    Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
  • Tianyu Han
    Physics of Molecular Imaging Systems, Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany. tianyu.han@pmi.rwth-aachen.de.
  • Jakob Nikolas Kather
    Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
  • Sven Nebelung
    Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany (J.S., D.B.A., S.N.); Institute of Computer Vision and Imaging, RWTH University Aachen, Pauwelsstrasse 30, 52072 Aachen, Germany (J.S., D.M.); Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (D.T., M.P., F.M., C.K., S.N.); and Faculty of Mathematics and Natural Sciences, Institute of Informatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany (S.C.).
  • Daniel Truhn
    Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany (J.S., D.B.A., S.N.); Institute of Computer Vision and Imaging, RWTH University Aachen, Pauwelsstrasse 30, 52072 Aachen, Germany (J.S., D.M.); Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (D.T., M.P., F.M., C.K., S.N.); and Faculty of Mathematics and Natural Sciences, Institute of Informatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany (S.C.).