Comparative analysis of heart disease prediction using logistic regression, SVM, KNN, and random forest with cross-validation for improved accuracy.

Journal: Scientific Reports

PMID: 40251253

Abstract

This primary research paper focuses on cross-validation, in which the data samples are reshuffled in each iteration to form randomized subsets divided into n folds. This method improves model performance and achieves higher accuracy than the baseline model. The novelty lies in the data preparation process: numerical features were imputed using the mean, categorical features were imputed using chi-square methods, and normalization was applied. The study transforms the original datasets and compares four models, Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF), under cross-validation on open heart disease datasets. The objective is to identify the average accuracy of model predictions and then recommend a model; with data preprocessing and cross-validation, accuracy increased by 5 to 14% over the baseline model. Comparing the accuracy scores of the models shows that logistic regression and k-nearest neighbor achieved the highest accuracy, 81%, among the four models when accuracy alone is the concern. However, the random forest model attained an F1 score of 95%, precision of 96%, and recall of 97% in its summary statistics, indicating the highest overall macro score. These findings can be further compared using learning curve validation. Conversely, the logistic regression model is reported with the lowest accuracy, 84%, among the four machine learning models. This research does not cover hyperparameter optimization, which could potentially improve model performance.
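The procedure described in the abstract (mean imputation of numerical features, normalization, then shuffled n-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the toy data, the choice of min-max scaling, and the use of a small hand-written k-nearest-neighbor classifier are all assumptions made for this example, and the abstract does not state which normalization was used. For brevity, imputation and normalization are applied to the whole dataset before splitting; a stricter pipeline would fit them on each training fold only.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace missing values (None) in each column with that column's mean."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)]
            for row in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def shuffled_kfold_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Shuffle the sample indices, split them into n_folds, and return
    (average accuracy, per-fold accuracies) for the KNN classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in fold)
        scores.append(correct / len(fold))
    return mean(scores), scores


# Hypothetical toy data (two numeric features, binary label); not from the paper.
X_raw = [[40, 180], [42, None], [45, 190], [38, 175], [41, 185],
         [60, 260], [62, 270], [None, 255], [65, 280], [58, 265]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X = min_max_normalize(impute_mean(X_raw))
avg_acc, fold_accs = shuffled_kfold_accuracy(X, y, n_folds=5, k=3)
print("per-fold accuracy:", fold_accs)
print("average accuracy:", avg_acc)
```

Changing `seed` reshuffles the samples into different folds, which is the kind of repeated randomized split the abstract describes; the same loop could wrap any of the four classifiers in place of `knn_predict`.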

Authors

Yagyanath Rimal
IIS (Deemed to be University), Jaipur, India. rimal.yagya@gmail.com.

Navneet Sharma
IIS (Deemed to be University), Jaipur, India.

Siddhartha Paudel
IOE, Pulchowk Campus, Patan, Nepal.

Abeer Alsadoon
School of Computing and Mathematics, Charles Sturt University, Sydney Campus, Sydney, Australia. aalsadoon@studygroup.com.

Madhav Parsad Koirala
Pokhara University, Pokhara, Nepal.

Sumeet Gill
Maharshi Dayanand University, Rohtak, India.

Keywords

Algorithms; Heart Diseases; Humans; Logistic Models; Machine Learning; Random Forest; Reproducibility of Results; Support Vector Machine

