Addressing imbalance in health data: Synthetic minority oversampling using deep learning.

Journal: Computers in biology and medicine
Published Date:

Abstract

Class imbalance in healthcare data, where positive cases are far outnumbered by negative ones, can bias machine learning models toward the majority class. Ensuring good performance across all classes is crucial for improving healthcare delivery and patient safety. Traditional oversampling methods such as SMOTE and its variants have several limitations: they struggle to capture complex data distributions, to handle heterogeneous data types, and to support multi-class datasets natively. To address these issues, we propose a deep learning-based solution using an Auxiliary-guided Conditional Variational Autoencoder (ACVAE) enhanced with contrastive learning. Additionally, we introduce an ensemble technique in which ACVAE generates synthetic positive samples and the Edited Centroid-Displacement Nearest Neighbor (ECDNN) algorithm then reduces the majority class. This combined approach exploits ACVAE's ability to produce diverse oversampled data and ECDNN's ability to handle noise through selective undersampling, yielding a more balanced and informative dataset. Experiments on 12 different health datasets show the effectiveness of our method. We evaluate our approach against traditional oversampling techniques and several benchmark machine learning models. The results demonstrate notable improvements in model performance across various metrics, highlighting the potential of deep learning-based synthetic oversampling to address class imbalance in healthcare data.
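
The two-stage idea in the abstract (oversample the minority class, then edit the majority class) can be illustrated with a minimal sketch. This is not the authors' method: ACVAE and ECDNN are not implemented here. As loudly labelled stand-ins, the sketch uses SMOTE-style linear interpolation for the generator and a Wilson-style edited nearest neighbour rule for the cleaner. All function names, parameters, and the toy data are hypothetical.

```python
import random
from math import dist


def oversample_minority(minority, n_new, rng):
    """Stand-in for a learned generator (NOT ACVAE): create each synthetic
    point by interpolating between two randomly chosen minority points."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
    return synthetic


def edit_majority(X, y, majority_label, k=3):
    """Stand-in for ECDNN (NOT ECDNN): drop a majority point unless more
    than half of its k nearest neighbours share its label."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority points are never removed
            continue
        others = (j for j in range(len(X)) if j != i)
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        agree = sum(y[j] == yi for j in neighbours)
        if 2 * agree > k:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def rebalance(X, y, minority_label=1, majority_label=0, k=3, seed=0):
    """Oversample the minority up to the majority count, then edit the majority."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = sum(label == majority_label for label in y)
    n_new = max(0, n_major - len(minority))
    synthetic = oversample_minority(minority, n_new, rng)
    X_aug = list(X) + synthetic
    y_aug = list(y) + [minority_label] * len(synthetic)
    return edit_majority(X_aug, y_aug, majority_label, k)


# Toy data: six clean negatives near the origin, one noisy negative that
# sits inside the positive cluster, and three positives near (3, 3).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.1), (0.1, -0.1), (0.0, 0.3),
     (3.0, 3.1),
     (3.0, 3.0), (3.2, 3.1), (2.9, 3.2)]
y = [0] * 7 + [1] * 3

X_bal, y_bal = rebalance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

On this toy input, the noisy negative at (3.0, 3.1) is surrounded by positives and is removed, while the clean negatives are kept. A learned conditional generator would replace `oversample_minority` in a real pipeline; interpolation is used here only because it needs no training.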

Authors

  • Alex X Wang
    School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6012, New Zealand. Electronic address: alex.wang@vuw.ac.nz.
  • Viet-Tuan Le
    Ho Chi Minh City Open University, 35-37 Ho Hao Hon Street, Ward Co Giang, District 1, Ho Chi Minh City, Vietnam.
  • Hau Nguyen Trung
    Faculty of Information Technology, Ho Chi Minh City Open University, 97 Vo Van Tan, District 3, Ho Chi Minh City 70000, Viet Nam. Electronic address: hau.nt@ou.edu.vn.
  • Binh P Nguyen
    School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand.