Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning.

Journal: IEEE journal of biomedical and health informatics
PMID:

Abstract

Classifying incomplete and imbalanced data remains a challenging task because both problems can degrade the training of classifiers, and both arose in our study of physical fitness assessments of patients. Moreover, fields such as healthcare place stricter requirements on the accuracy of imputed values. To train a high-performance classifier and mitigate these negative effects, we propose a novel algorithm that combines multivariate imputation by chained equations with ensemble learning (MICEEN) and addresses both problems simultaneously. Multivariate imputation by chained equations was used to generate more accurate imputation values for the training set, which was then passed to an ensemble learner to build a predictor. In addition, missing values were introduced into the minority classes and used to generate new minority-class samples, balancing the class distribution. We performed extensive experiments on real-world datasets to evaluate our method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected datasets of physical fitness assessments of tumor patients with varying missing rates demonstrate the advantages of the proposed method.
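The abstract describes two data-level steps: chained-equations imputation of the training set, and generation of minority-class samples by introducing missing values and filling them in. The Python sketch below illustrates both under stated assumptions; it is not the authors' MICEEN implementation. Assumptions: each incomplete column is modeled by ridge regression on all other columns (a stand-in for whatever per-variable model MICEEN uses), only one deterministic imputation is produced (full MICE draws several stochastic imputations), and every column needs at least one observed value. The masking scheme and the names `mice_impute`, `oversample_by_masking`, `mask_frac`, and `lam` are hypothetical, since the abstract does not say how many values are masked or how they are chosen. The ensemble learner itself is omitted.

```python
import random


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[-1] for row in m]


def _ridge_fit(xs, ys, lam):
    """Coefficients [b0, b1, ...] of y ~ b0 + b.x; the intercept is not penalized."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return _solve(xtx, xty)


def mice_impute(data, n_iter=10, lam=1e-3):
    """Single deterministic chained-equations imputation; None marks a missing value."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Initialize every gap with the mean of that column's observed values.
    means = [sum(r[j] for r in data if r[j] is not None)
             / sum(r[j] is not None for r in data) for j in range(n_cols)]
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(miss[j] for miss in missing):
                continue
            others = [c for c in range(n_cols) if c != j]
            # Regress column j on all other columns, using rows where j was observed.
            train = [i for i in range(n_rows) if not missing[i][j]]
            beta = _ridge_fit([[filled[i][c] for c in others] for i in train],
                              [filled[i][j] for i in train], lam)
            # Overwrite only the originally missing entries with the predictions.
            for i in range(n_rows):
                if missing[i][j]:
                    x = [1.0] + [filled[i][c] for c in others]
                    filled[i][j] = sum(b * v for b, v in zip(beta, x))
    return filled


def oversample_by_masking(X, y, minority, n_new, mask_frac=0.3, seed=0):
    """Hypothetical scheme: blank features in copied minority rows, then re-impute them."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    k = max(1, int(mask_frac * n_cols))
    pool = [list(row) for row, label in zip(X, y) if label == minority]
    new_rows = []
    for _ in range(n_new):
        row = list(rng.choice(pool))
        for j in rng.sample(range(n_cols), k):
            row[j] = None
        new_rows.append(row)
    # Imputing alongside the real minority rows means the chained equations
    # are fitted on minority-class data only.
    imputed = mice_impute(pool + new_rows)
    return X + imputed[len(pool):], y + [minority] * n_new
```

In use, one would first call `mice_impute` on the incomplete training features, then call `oversample_by_masking` on the completed matrix (the function assumes `X` has no missing entries), and finally fit an ensemble classifier, for example a random forest, on the balanced result. Because the masked features of each new row are replaced by regression predictions learned from the minority class, the synthetic samples differ from plain duplicates of existing rows.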

Authors

  • Jiaxi Li
    Department of Clinical Laboratory Medicine, Jinniu Maternity and Child Health Hospital of Chengdu, Chengdu, China.
  • Zhelong Wang
  • Lina Wu
    Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Shenyang, China.
  • Sen Qiu
  • Hongyu Zhao
    SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China; Department of Biostatistics, Yale University, New Haven, USA.
  • Fang Lin
    State Key Laboratory of Reliability and Intelligence of Electrical Equipment, School of Electrical Engineering, Hebei University of Technology, Tianjin 300132, P.R. China; Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability of Hebei Province, School of Electrical Engineering, Hebei University of Technology, Tianjin 300132, P.R. China.
  • Ke Zhang
    Center for Radiation Oncology, Affiliated Hangzhou Cancer Hospital, Zhejiang University School of Medicine, Hangzhou 310001, China.