Finding a Suitable Class Distribution for Building Histological Images Datasets Used in Deep Model Training - The Case of Cancer Detection.

Journal: Journal of digital imaging

Published Date: Apr 20, 2022

Abstract

The class distribution of a training dataset is an important factor that influences the performance of a deep learning-based system. Understanding the optimal class distribution is therefore crucial when building a new training set that may be costly to annotate. This is the case for histological images used in cancer diagnosis, where image annotation requires domain experts. In this paper, we tackle the problem of finding the class distribution of a training set that yields the best-performing model for detecting cancer in histological images. We formulate several hypotheses, which are then tested in scores of experiments with hundreds of trials. The experiments are designed to account for both segmentation and classification frameworks with various class distributions in the training set: natural, balanced, over-represented cancer, and over-represented non-cancer. In the case of cancer detection, the experiments show several important results: (a) the natural class distribution produces more accurate results than the artificially generated balanced distribution; (b) over-representing the non-cancer/negative classes (healthy tissue and/or background classes) relative to the cancer/positive classes reduces the number of samples falsely predicted as cancer (false positives); (c) non-ROI (non-region-of-interest) data, which is the least expensive to annotate, can help compensate for the performance loss caused by a shortage of ROI data, which is expensive to annotate; (d) multi-label examples are more useful than single-label ones for training a segmentation model; and (e) when the classification model is tuned with a balanced validation set, it is less affected than the segmentation model by the class distribution of the training set.
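The four training-set distributions named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' sampling procedure, so everything here is an assumption made for illustration: the class names, the toy patch counts, the target fractions, and the helper names `resample_to_distribution` and `natural_distribution`. The sketch draws a fixed number of patch indices so that the class mix matches a target, sampling with replacement only when a class has too few examples.

```python
import random
from collections import Counter, defaultdict


def natural_distribution(labels):
    """Class fractions as they occur in the labelled data."""
    total = len(labels)
    return {cls: n / total for cls, n in Counter(labels).items()}


def resample_to_distribution(labels, target, n_samples, seed=0):
    """Return indices into `labels` whose class mix approximates `target`.

    labels: one class name per training patch (hypothetical setup).
    target: dict mapping class name -> desired fraction (should sum to 1).
    A class is sampled with replacement only if its quota exceeds its pool.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for cls, fraction in target.items():
        quota = round(fraction * n_samples)
        pool = by_class[cls]
        if quota > 0 and not pool:
            raise ValueError(f"no examples available for class {cls!r}")
        if quota <= len(pool):
            chosen.extend(rng.sample(pool, quota))      # without replacement
        else:
            chosen.extend(rng.choices(pool, k=quota))   # oversample small class
    rng.shuffle(chosen)
    return chosen


# Toy label list; the counts are invented for illustration only.
labels = ["cancer"] * 100 + ["healthy"] * 600 + ["background"] * 300

# Illustrative target fractions, not values reported by the paper.
distributions = {
    "natural": natural_distribution(labels),
    "balanced": {"cancer": 1 / 3, "healthy": 1 / 3, "background": 1 / 3},
    "over_cancer": {"cancer": 0.6, "healthy": 0.2, "background": 0.2},
    "over_non_cancer": {"cancer": 0.05, "healthy": 0.55, "background": 0.40},
}

for name, target in distributions.items():
    idx = resample_to_distribution(labels, target, n_samples=300)
    print(name, dict(Counter(labels[i] for i in idx)))
```

Training the same model on each resampled index set and comparing false positives on a fixed validation set is one way such distributions could be compared; the abstract reports only the outcomes of the authors' own experiments, not this procedure.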

Authors

Ismat Ara Reshma
IRIT, UMR5505 CNRS, Université de Toulouse, Toulouse, France. Ismat-Ara.Reshma@irit.fr.

Camille Franchet
Department of Pathology, University Cancer Institute of Toulouse-Oncopole, Toulouse, France.

Margot Gaspard
Department of Pathology, University Cancer Institute of Toulouse-Oncopole, Toulouse, France.

Radu Tudor Ionescu
Department of Computer Science, University of Bucharest, 14 Academiei, 010014 Bucharest, Romania.

Josiane Mothe
IRIT, UMR5505 CNRS, Université de Toulouse, Toulouse, France.

Sylvain Cussat-Blanc
Institute of Advanced Technologies in Living Sciences (ITAV), CNRS - USR3505, Toulouse, France.

Hervé Luga
IRIT, UMR5505 CNRS, Université de Toulouse, Toulouse, France.

Pierre Brousset
Department of Pathology, University Cancer Institute of Toulouse-Oncopole, Toulouse, France.

Keywords

Deep Learning; Humans; Image Processing, Computer-Assisted; Neoplasms

External Resources

PubMed ID: 35445341
