Classification of disease subtypes in medical spectroscopy for multi-category sample imbalances based on grouping and hierarchical learning.
Journal: Lasers in Medical Science
PMID: 40387978
Abstract
In the medical domain, difficulties in sample acquisition and collection often lead to imbalanced training sets for multi-class models, especially in disease subtype differentiation. We propose a novel method to address multi-class imbalance in serum Raman spectroscopy data for disease subtyping. The method groups samples according to their noise levels and applies hierarchical incremental learning, which balances the training data and mitigates the noise introduced by augmentation, thereby improving the model's accuracy in distinguishing similar disease subtypes. We collected imbalanced serum Raman spectroscopy data from two hepatitis subtypes and a control group, and compared the performance of Convolutional Neural Network (CNN) and Random Forest (RF) models on both the original and the augmented data, where the augmented data were identical to the training data used for our model. The results show that the proposed method effectively distinguishes similar disease subtypes under sample imbalance, particularly those with limited sample sizes. Our approach achieves an accuracy and an F1 score that both exceed 95% on the hepatitis data. However, its broader applicability will require further investigation and validation. All code is available at https://github.com/RuiGao-1223/GHIL.
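The grouping and staging idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how augmentation is performed, how noise levels are measured, or which model is updated incrementally, so everything here is an assumption for illustration: Gaussian-noise augmentation, the function names `augment_by_noise_level` and `incremental_stages`, the noise levels, and the toy spectra are all hypothetical and are not taken from the authors' repository.

```python
import random
from collections import Counter


def augment_by_noise_level(spectra_by_class, noise_levels, seed=0):
    """Balance classes with noisy copies, grouped by noise level.

    Assumption: augmentation adds zero-mean Gaussian noise to a spectrum.
    Returns {noise_level: [(spectrum, label), ...]}; level 0.0 holds the
    original, unaugmented spectra.
    """
    rng = random.Random(seed)
    # Every class is topped up to the size of the largest class.
    target = max(len(s) for s in spectra_by_class.values())
    groups = {0.0: []}
    for level in noise_levels:
        groups.setdefault(level, [])
    for label, spectra in spectra_by_class.items():
        groups[0.0].extend((list(s), label) for s in spectra)
        for i in range(target - len(spectra)):
            level = noise_levels[i % len(noise_levels)]  # cycle through levels
            base = spectra[i % len(spectra)]             # cycle through originals
            noisy = [x + rng.gauss(0.0, level) for x in base]
            groups[level].append((noisy, label))
    return groups


def incremental_stages(groups):
    """Yield cumulative training sets, adding noisier groups one at a time."""
    seen = []
    for level in sorted(groups):
        seen = seen + groups[level]
        yield level, list(seen)


# Toy, made-up three-point "spectra" for a control group and two subtypes.
toy = {
    "control": [[1.0, 2.0, 3.0]] * 6,
    "subtype_a": [[1.1, 2.1, 2.9]] * 3,
    "subtype_b": [[0.9, 1.8, 3.2]] * 2,
}
groups = augment_by_noise_level(toy, noise_levels=[0.01, 0.05])
stages = list(incremental_stages(groups))
stage_sizes = [len(samples) for _, samples in stages]
final_counts = Counter(label for _, label in stages[-1][1])
```

In this sketch, the first stage contains only the original imbalanced spectra, and each later stage adds the next noisier group, so the final stage is class-balanced. A model that supports incremental updates could then be trained stage by stage; that training step is omitted because the abstract does not describe it.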