Solving the problem of imbalanced dataset with synthetic image generation for cell classification using deep learning.
Journal:
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
Published Date:
Nov 1, 2021
Abstract
A low number of annotated training images and class imbalance are common problems in many machine learning applications. In this paper, we focus on a clinical dataset of cells that were extracted in previous research. This dataset is imbalanced, since normal cells greatly outnumber abnormal ones. To address both problems, we present synthetic image generation using a custom variational autoencoder, which also enables pretraining of the subsequent classifier network. Our method is compared with a performant solution and is also evaluated with different modifications. We observed a performance increase of 4.52% in the classification of abnormal cells.
Clinical Relevance - We extract images from cervical smears using digitized Pap tests. A single smear can contain more than 10,000 cells, and examination is done manually by going over each cell individually. Our main goal is to build a system that can rank these samples by importance, making the process easier and more effective. The research described in this paper brings us a step closer to that goal.
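The class imbalance mentioned in the abstract can be made concrete with a small sketch that computes how many synthetic images each class would need to match the largest class. This is only an illustration under stated assumptions: the function name `synthetic_counts`, the class names, and the label counts are hypothetical and are not taken from the paper, and the paper's actual balancing strategy and generator are not described here.

```python
from collections import Counter


def synthetic_counts(labels):
    """Return, per class, how many extra samples are needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label list; the counts are illustrative, not from the paper.
labels = ["normal"] * 9500 + ["abnormal"] * 500
print(synthetic_counts(labels))  # {'normal': 0, 'abnormal': 9000}
```

In a pipeline like the one the abstract outlines, the per-class deficit would tell a generative model (for example, a variational autoencoder) how many minority-class images to synthesize before training the classifier.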