Classification of disease subtypes in medical spectroscopy for multi-category sample imbalances based on grouping and hierarchical learning.

Journal: Lasers in medical science
PMID:

Abstract

In the medical domain, difficulties in sample acquisition and collection often lead to imbalanced training sets for multi-class models, especially in disease subtype differentiation. We propose a novel method to address multi-class imbalance in serum Raman spectroscopy data for disease subtyping. The method groups samples according to their noise levels and applies hierarchical incremental learning, which balances the training data and mitigates the noise introduced by augmentation, thereby improving the model's accuracy in distinguishing similar disease subtypes. We collected imbalanced serum Raman spectroscopy data from two hepatitis subtypes and a control group, and compared the performance of Convolutional Neural Network (CNN) and Random Forest (RF) models on both the original and the augmented data, where the augmented data were identical to the training data used in our model. The results show that the proposed method effectively distinguishes similar disease subtypes under sample imbalance, particularly those with limited sample sizes. Our approach achieves an accuracy and an F1 score that both exceed 95% on the hepatitis data. However, its broader applicability and potential will require further investigation and validation. All code is available at https://github.com/RuiGao-1223/GHIL .
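
The idea of grouping augmented samples by noise level and then training in stages can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the GHIL repository linked above) and rests on several assumptions the abstract does not state: augmentation is done by adding Gaussian noise to minority-class spectra, the noise levels are chosen by hand, training proceeds from the original data to the noisiest group, and a toy nearest-centroid learner stands in for the actual model. All function names, class names, labels, and parameters below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def augment_by_noise_level(samples, labels, noise_levels, seed=0):
    """Oversample minority classes with Gaussian noise (an assumed augmentation).

    Returns {noise_level: [(spectrum, label), ...]} so that every class reaches
    the size of the largest class once all groups are added.
    """
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    groups = {level: [] for level in noise_levels}
    for y, xs in by_class.items():
        for i in range(target - len(xs)):
            # Cycle through the noise levels so each group gets a share.
            level = noise_levels[i % len(noise_levels)]
            base = rng.choice(xs)
            noisy = [v + rng.gauss(0.0, level) for v in base]
            groups[level].append((noisy, y))
    return groups


class IncrementalCentroid:
    """Toy nearest-centroid classifier that can be updated batch by batch."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def partial_fit(self, pairs):
        for x, y in pairs:
            if y not in self.sums:
                self.sums[y] = [0.0] * len(x)
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
            self.counts[y] += 1

    def predict(self, x):
        def sq_dist(y):
            centroid = [s / self.counts[y] for s in self.sums[y]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.sums, key=sq_dist)


def train_hierarchically(samples, labels, noise_levels):
    """Stage 0 uses the original data; later stages add one noise group each."""
    groups = augment_by_noise_level(samples, labels, noise_levels)
    model = IncrementalCentroid()
    model.partial_fit(list(zip(samples, labels)))
    for level in sorted(groups):  # assumed order: least noisy group first
        model.partial_fit(groups[level])
    return model, groups


if __name__ == "__main__":
    # Tiny made-up 2-feature "spectra": 6 controls, 3 of one subtype, 2 of another.
    samples = [[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1], [0.1, 0.1],
               [5, 5], [5.1, 5], [5, 5.1],
               [5, -5], [5.1, -5]]
    labels = ["control"] * 6 + ["subtype_A"] * 3 + ["subtype_B"] * 2
    model, groups = train_hierarchically(samples, labels, [0.01, 0.05, 0.1])
    print(dict(model.counts))
    print(model.predict([5.0, 5.0]))
```

For this toy learner the order of the stages does not change the final centroids, because it only accumulates sums. With a gradient-trained model such as a CNN, the order and the per-stage training schedule would matter, and those details are what the abstract leaves to the full paper and code.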

Authors

  • Rui Gao
    School of Control Science and Engineering, Shandong University, Jinan, China.
  • Zishuo Chen
    College of Software, Xinjiang University, Urumqi, Xinjiang, 830046, China.
  • Zilong Shao
    College of Information Science and Engineering, Xinjiang University, Urumqi, China.
  • Feng Li
    Department of General Surgery, Shanghai Traditional Chinese Medicine (TCM)-Integrated Hospital of Shanghai University of Traditional Chinese Medicine, Shanghai, China.
  • Yunxi Jin
    College of Information Science and Engineering, Xinjiang University, Urumqi, China.
  • You Xue
    College of Information Science and Engineering, Xinjiang University, Urumqi, China.
  • Heng Wu
    First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China.
  • Xiaoyi Lv
    College of Information Science and Engineering, Xinjiang University, Urumqi, China.
  • Cheng Chen
    Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China.