Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology.

Journal: Medical image analysis

Published Date: May 4, 2022

Abstract

Artificial intelligence (AI) can extract visual information from histopathological slides and yield biological insight and clinical biomarkers. Whole slide images are cut into thousands of tiles, and classification problems are often weakly supervised: the ground truth is known only for the slide, not for every single tile. In classical weakly-supervised analysis pipelines, all tiles inherit the slide label, whereas in multiple-instance learning (MIL), only bags of tiles inherit the label. However, it is still unclear how these widely used but markedly different approaches perform relative to each other. We implemented and systematically compared six methods in six clinically relevant end-to-end prediction tasks using data from N=2980 patients for training with rigorous external validation. We tested three classical weakly-supervised approaches with convolutional neural networks and vision transformers (ViT) and three MIL-based approaches with and without an additional attention module. Our results empirically demonstrate that histological tumor subtyping of renal cell carcinoma is an easy task in which all approaches achieve an area under the receiver operating characteristic curve (AUROC) above 0.9. In contrast, we report significant performance differences for clinically relevant tasks of mutation prediction in colorectal, gastric, and bladder cancer. In these mutation prediction tasks, classical weakly-supervised workflows outperformed MIL-based weakly-supervised methods, which is surprising given their simplicity. This shows that new end-to-end image analysis pipelines in computational pathology should be compared to classical weakly-supervised methods. These findings also motivate the development of new methods that combine the elegant assumptions of MIL with the empirically observed higher performance of classical weakly-supervised approaches.
We make all source code publicly available at https://github.com/KatherLab/HIA, allowing easy application of all methods to any similar task.
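
The distinction the abstract draws between classical weakly-supervised pipelines and MIL can be illustrated with a small Python sketch. This is not the authors' implementation and does not reflect the code in the HIA repository; all function names, the toy slides, and the bag size are hypothetical and chosen only for illustration. The sketch shows tile-level label inheritance, bag-level label inheritance, two common ways of pooling tile scores into one score, and a rank-based AUROC.

```python
import math

def tile_level_pairs(slides):
    """Classical weak supervision: every tile inherits its slide's label."""
    return [(tile, label) for tiles, label in slides for tile in tiles]

def mil_bags(slides, bag_size):
    """MIL: tiles are grouped into bags; only the bag carries the label."""
    bags = []
    for tiles, label in slides:
        for start in range(0, len(tiles), bag_size):
            bags.append((tiles[start:start + bag_size], label))
    return bags

def mean_pool(tile_scores):
    """Aggregate tile scores into one score by simple averaging."""
    return sum(tile_scores) / len(tile_scores)

def attention_pool(tile_scores, attention_logits):
    """Softmax-weighted average of tile scores (attention-style pooling)."""
    m = max(attention_logits)  # subtract the max for numerical stability
    weights = [math.exp(a - m) for a in attention_logits]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, tile_scores)) / total

def auroc(labels, scores):
    """Probability that a positive case outranks a negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: (tile identifiers, slide-level label).
slides = [(["a1", "a2", "a3"], 1), (["b1", "b2"], 0)]
pairs = tile_level_pairs(slides)      # one (tile, label) pair per tile
bags = mil_bags(slides, bag_size=2)   # one (tiles, label) pair per bag
```

In the tile-level setting a model is trained on `pairs` and its tile scores are pooled afterwards, for example with `mean_pool`; in the MIL setting the pooling step, such as `attention_pool`, is part of the model and the loss is computed once per bag.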

Authors

Narmin Ghaffari Laleh

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Hannah Sophie Muti

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Chiara Maria Lavinia Loeffler

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Amelie Echle

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Oliver Lester Saldanha

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Faisal Mahmood

Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA. faisalmahmood@bwh.harvard.edu.
Ming Y Lu

Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Christian Trautwein

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Rupert Langer

Institute of Pathology, Inselspital, University of Bern, Switzerland; Institute of Pathology and Molecular Pathology, Kepler University Hospital, Johannes Kepler University Linz, Linz, Austria.
Bastian Dislich

Institute of Pathology, Inselspital, University of Bern, Switzerland.
Roman D Buelow

Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany.
Heike Irmgard Grabsch

Department of Pathology, GROW School for Oncology and Developmental Biology, Maastricht University Medical Center+, Maastricht, The Netherlands; Pathology and Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom.
Hermann Brenner

German Cancer Consortium (DKTK), Heidelberg, Germany.
Jenny Chang-Claude

Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Elizabeth Alwers

Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany.
Titus J Brinker

National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany.
Firas Khader

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
Daniel Truhn

Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
Nadine T Gaisa

Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany.
Peter Boor

Institute of Pathology, University Hospital Aachen, RWTH Aachen University, Aachen, Germany.
Michael Hoffmeister

Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Volkmar Schulz

Physics of Molecular Imaging Systems, Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany. schulz@pmi.rwth-aachen.de.
Jakob Nikolas Kather

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.

Keywords

Artificial Intelligence; Benchmarking; Deep Learning; Humans; Neural Networks, Computer; Supervised Machine Learning

External Resources

PubMed ID: 35588568
