Which data subset should be augmented for deep learning? A simulation study using urothelial cell carcinoma histopathology images.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Applying deep learning to digital histopathology is hindered by the scarcity of manually annotated datasets. While data augmentation can mitigate this obstacle, its methods are far from standardized. Our aim was to systematically explore the effects of skipping data augmentation; applying data augmentation to different subsets of the whole dataset (the training set, the validation set, the test set, two of them, or all of them); and applying data augmentation at different time points (before, during, or after dividing the dataset into three subsets). Different combinations of these possibilities resulted in 11 ways of applying augmentation. The literature contains no such comprehensive, systematic comparison of these augmentation strategies.
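
The difference between augmenting before and after dividing the dataset can be illustrated with a minimal sketch. It is not the authors' pipeline: the `augment` function, the "|flip" and "|rot90" variant labels, the 60/20/20 split, and the use of string identifiers in place of images are all assumptions made for illustration, and augmentation during splitting is not covered. The sketch only shows that augmenting before the split can place variants of the same source image in different subsets, whereas augmenting after the split cannot.

```python
import random


def augment(image_id):
    """Hypothetical augmentation: the original plus two labelled variants."""
    return [image_id, f"{image_id}|flip", f"{image_id}|rot90"]


def split(items, train_pct=60, val_pct=20, seed=0):
    """Shuffle, then divide into training, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def augment_before_split(image_ids, seed=0):
    """Augment the whole dataset first, then divide the enlarged pool."""
    pool = [variant for i in image_ids for variant in augment(i)]
    return split(pool, seed=seed)


def augment_after_split(image_ids, subsets=("train",), seed=0):
    """Divide the originals first, then augment only the named subsets."""
    train, val, test = split(image_ids, seed=seed)
    parts = {"train": train, "val": val, "test": test}
    return tuple(
        [variant for i in parts[name] for variant in augment(i)]
        if name in subsets else list(parts[name])
        for name in ("train", "val", "test")
    )


def shared_sources(subsets):
    """Return source images that appear in more than one subset."""
    train, val, test = ({item.split("|")[0] for item in s} for s in subsets)
    return (train & val) | (train & test) | (val & test)


if __name__ == "__main__":
    ids = [f"img{i}" for i in range(7)]
    print("before split:", sorted(shared_sources(augment_before_split(ids))))
    print("after split: ", sorted(shared_sources(augment_after_split(ids))))
```

Passing a different `subsets` tuple to `augment_after_split`, for example `("train", "val", "test")`, selects which of the three subsets receive augmented variants, which mirrors the subset choices the abstract enumerates.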

Authors

  • Yusra A Ameen
    Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt. yusra.amin@aun.edu.eg.
  • Dalia M Badary
    Department of Pathology, Faculty of Medicine, Assiut University, Asyut, Egypt.
  • Ahmad Elbadry I Abonnoor
    Urology and Nephrology Hospital, Faculty of Medicine, Assiut University, Asyut, Egypt.
  • Khaled F Hussain
    Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt.
  • Adel A Sewisy
    Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt.