Mitigating class imbalance in churn prediction with ensemble methods and SMOTE.

Journal: Scientific Reports

Abstract

This study examines how imbalanced datasets affect the accuracy of machine learning models, particularly in predictive analytics applications such as churn prediction. A dataset skewed towards the majority class can bias a model towards that class and reduce its overall effectiveness. To analyze this impact, the study used a churn dataset and evaluated nine individual classifiers along with six homogeneous ensemble models. Single classifiers struggle to identify the underlying patterns in imbalanced data, whereas ensembles improve predictive performance by focusing on the minority class. Even so, when trained on unbalanced data, the accuracy of the ensembles remains subpar. The top six classifiers were selected for further investigation based on their performance on the imbalanced data. The SMOTE oversampling technique was then applied to create a balanced dataset in which all classes were adequately represented. Model performance improved from 61% to 79%, indicating the removal of bias in the target data. AdaBoost was the best-performing classifier, achieving an F1-score of 87.6% in identifying potential churn and assessing customer account health. The findings emphasize the importance of balanced datasets for accurate ML model predictions.
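The balancing step described in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation: the function name `smote_oversample`, the neighbour count `k=5`, the binary-label assumption, and the synthetic two-feature demo data are all hypothetical choices made for illustration. The sketch shows the core idea only, namely that each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import numpy as np


def smote_oversample(X, y, minority_label, k=5, seed=0):
    """Append synthetic minority rows until the two classes are balanced.

    Hypothetical helper for illustration; assumes a binary target.
    Each synthetic row lies between a minority sample and one of its
    k nearest minority neighbours (the core idea of SMOTE).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few samples to interpolate

    # Pairwise Euclidean distances among minority samples, ignoring self.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours, then interpolate.
    base = rng.integers(0, len(X_min), size=n_needed)
    picked = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])

    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


# Synthetic demo data (an assumption, not the churn dataset from the study):
# 90 majority rows labelled 0 and 10 minority rows labelled 1.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(0.0, 1.0, (90, 2)),
                    demo_rng.normal(3.0, 1.0, (10, 2))])
y_demo = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = smote_oversample(X_demo, y_demo, minority_label=1)
print(np.bincount(y_bal))  # [90 90]
```

In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would likely be preferable to this sketch. Oversampling is normally applied to the training split only, so that evaluation metrics such as the F1-score are computed on untouched data.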

Authors

  • R Suguna
    Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India.
  • J Suriya Prakash
    Department of Computer Science and Engineering, JAIN (Deemed-to-be-University), Bangaluru, Karnataka, India.
  • H Aditya Pai
    Department of CSE, MIT School of Computing, MIT Art, Design and Technology University, Pune, 412201, India.
  • T R Mahesh
    Department of Computer Science and Engineering, JAIN (Deemed-to-be-University), Bangaluru, Karnataka, India.
  • Venkatesan Vinoth Kumar
    School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, 632001, India.
  • Temesgen Engida Yimer
    Department of Mathematics, Dilla University, Dilla, Ethiopia. Temesgen.engida@du.edu.et.
