Robust diabetic prediction using ensemble machine learning models with synthetic minority over-sampling technique.

Journal: Scientific reports
PMID:

Abstract

This paper addresses diabetes, a widespread condition affecting a large population worldwide. Blood sugar levels rise when cells become less responsive to insulin or when the body fails to produce enough of it. This can cause severe health complications, including kidney disease, vision impairment and heart conditions. Early diagnosis is paramount in mitigating the risk and severity of diabetes-related complications. To this end, we propose a robust framework for diabetes prediction that combines the Synthetic Minority Over-sampling Technique (SMOTE) with ensemble machine learning techniques. Our approach incorporates imputation of missing values, outlier rejection, feature selection using correlation analysis and class distribution balancing using SMOTE. Extensive experimentation shows that the proposed combination of AdaBoost and XGBoost achieves an AUC of 0.968 ± 0.015. This outperforms both the alternative methodologies presented in our study and current state-of-the-art results. We anticipate that our model will significantly improve diabetes prediction, offering a promising avenue for improved healthcare outcomes in diabetes management.
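
The class-balancing step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration, and only the Python standard library is used. In practice, over-sampling is usually applied to the training split only, so that synthetic points do not leak into evaluation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, made up for illustration: 8 majority vs 3 minority samples.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Generate just enough synthetic samples to equalise the class counts.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay within the region already occupied by the minority class rather than being exact duplicates.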

Authors

  • Pradeepa Sampath
    Department of Information Technology, School of Computing, SASTRA Deemed University, Thanjavur, 613401, Tamilnadu, India.
  • Gurupriya Elangovan
    Department of Computer Science with specialization in Artificial Intelligence and Data Science, School of Computing, SASTRA Deemed University, Thanjavur, Tamilnadu, India.
  • Kaaveya Ravichandran
    Department of Computer Science with specialization in Artificial Intelligence and Data Science, School of Computing, SASTRA Deemed University, Thanjavur, Tamilnadu, India.
  • Vimal Shanmuganathan
    Centre of Excellence in Data Science, Department of Artificial Intelligence and Data Science, Sri Eshwar College of Engineering, Coimbatore, Tamilnadu, India. svimalphd@gmail.com.
  • Subbulakshmi Pasupathi
    School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India.
  • Tulika Chakrabarti
    Sir Padampat Singhania University, Udaipur, Rajasthan, India.
  • Prasun Chakrabarti
    Deputy Provost, ITM SLS Baroda University, Vadodara, India.
  • Martin Margala
    School of Computing and Informatics, University of Louisiana, Lafayette, USA.