Early Diabetes Prediction: A Comparative Study Using Machine Learning Techniques.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Most screening tests for Diabetes Mellitus (DM) in use today were developed using data collected electronically from Electronic Health Records (EHRs). However, developing and underdeveloped countries are still struggling to build EHR systems in their hospitals. Due to the lack of EHR data, early screening tools are not available in those countries. This study develops a prediction model for early DM using direct questionnaires at a tertiary hospital in Bangladesh. The information gain technique was used to remove irrelevant features. Using the selected variables, we developed logistic regression, support vector machine, K-nearest neighbor, Naïve Bayes, random forest (RF), and neural network models to predict diabetes at an early stage. RF outperformed the other machine learning algorithms, achieving 100% accuracy. These findings suggest that a combination of simple questionnaires and a machine learning algorithm can be a powerful tool to identify undiagnosed DM patients.
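The abstract names information gain as the feature-reduction step but does not give the threshold, the questionnaire items, or the implementation used. The following is a minimal sketch, assuming categorical (e.g. yes/no) questionnaire answers and a binary diagnosis label. It computes IG(Y; X) = H(Y) - Σ_v P(X=v) H(Y | X=v) for each feature and keeps those above an arbitrary threshold. The feature names (`q_a`, `q_b`), the records, and the threshold are hypothetical illustrations, not the study's data or settings.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, feature, target):
    """IG(Y; X) = H(Y) - sum over values v of P(X=v) * H(Y | X=v)."""
    labels = [r[target] for r in rows]
    # Group the target labels by the value the feature takes.
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(rows, features, target, threshold):
    """Rank features by IG (highest first) and keep those above the threshold."""
    scores = {f: information_gain(rows, f, target) for f in features}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [f for f, s in ranked if s > threshold], scores


# Hypothetical toy questionnaire records (NOT data from the study).
rows = [
    {"q_a": "yes", "q_b": "yes", "diabetic": "yes"},
    {"q_a": "yes", "q_b": "no", "diabetic": "yes"},
    {"q_a": "no", "q_b": "yes", "diabetic": "no"},
    {"q_a": "no", "q_b": "no", "diabetic": "no"},
]

# Hypothetical threshold; the abstract does not report the cut-off used.
selected, scores = select_features(rows, ["q_a", "q_b"], "diabetic", threshold=0.01)
print(selected, scores)
```

In this toy example `q_a` separates the two classes perfectly (IG of 1 bit), while `q_b` carries no information about the label (IG of 0), so only `q_a` would be passed on to the classifiers.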

Authors

  • Tahmina Nasrin Poly
    Graduate Institute of Biomedical Informatics, College of Medicine Science and Technology, Taipei Medical University, Taipei, Taiwan; International Center for Health Information Technology (ICHIT), Taipei Medical University, Taipei, Taiwan.
  • Md Mohaimenul Islam
    Graduate Institute of Biomedical Informatics, College of Medicine Science and Technology, Taipei Medical University, Taipei, Taiwan; International Center for Health Information Technology (ICHIT), Taipei Medical University, Taipei, Taiwan.
  • Yu-Chuan Jack Li
    Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan.