Optimizing ensemble machine learning models for accurate liver disease prediction in healthcare.

Journal: PloS one
Published Date:

Abstract

Liver disease encompasses a range of conditions affecting the liver, including hepatitis, cirrhosis, fatty liver, and liver cancer. It can be caused by infections, alcohol abuse, obesity, or genetic factors, and it often progresses silently until advanced stages. Early detection and lifestyle adjustments are essential for effective management and for preventing severe liver damage. This study explores the application of machine learning (ML) techniques to predict liver disease, using a dataset to compare the performance of several ensemble classifiers: the Random Forest Classifier, the AdaBoost Classifier, and the Gradient Boosting Classifier. After feature extraction and selection, followed by hyperparameter tuning with RandomizedSearchCV and GridSearchCV, we aimed to determine the best model for liver disease prediction in terms of accuracy, precision, recall, and F1-score. The results showed that the Random Forest Classifier, optimized with GridSearchCV, achieved the highest accuracy, at about 85.17%. Its metrics suggest that this classifier could be considered as a precise diagnostic tool for liver disease: its performance is balanced, with a precision of 0.85 for both the presence and the absence of the disease, a recall of 0.81 for presence and 0.87 for absence, and F1-scores of 0.83 and 0.85, respectively. The AdaBoost Classifier and the Gradient Boosting Classifier also performed relatively well, though neither outperformed the Random Forest Classifier. The research shows the potential of ensemble ML techniques in the diagnosis of medical conditions, including liver diseases, for which early diagnosis is critical. The results add evidence for the applicability of ML models in clinical practice, with the potential to improve diagnostics and, consequently, patient outcomes.
Future studies will build on these models by testing them on larger and more diverse datasets, incorporating deep learning, and applying the approach to other disease domains. This work offers a starting point for ML-driven innovation in healthcare aimed at advancing disease diagnosis and treatment.
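
The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' pipeline: the data are synthetic (generated with make_classification as a stand-in for a liver disease dataset), the parameter grids are hypothetical and deliberately small, and the names candidates and results are introduced here for illustration only. The sketch tunes each of the three ensemble classifiers with GridSearchCV and reports accuracy plus per-class precision, recall, and F1-score on a held-out split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic binary data stands in for the study's dataset,
# which is not reproduced here (class 1 = disease present, 0 = absent).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grids, kept small so the sketch runs quickly.
candidates = {
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [None, 5]},
    ),
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]},
    ),
    "GradientBoosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [25, 50], "learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Exhaustive search over the grid with 3-fold cross-validation.
    search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)

    # Evaluate the refitted best estimator on the held-out split.
    y_pred = search.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, labels=[0, 1]
    )
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,  # index 0 = absent, index 1 = present
        "recall": recall,
        "f1": f1,
    }
    print(name, results[name]["best_params"], round(results[name]["accuracy"], 4))
```

Swapping GridSearchCV for RandomizedSearchCV (with its n_iter parameter) would sample the grid instead of enumerating it, which is the usual trade-off when the search space is large. The numbers this sketch prints come from synthetic data and say nothing about the figures reported in the study.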

Authors

  • W El Atifi
    Hassan First University of Settat, High Institute of Health Sciences, Laboratory of Sciences and Health Technologies, Settat, Morocco.
  • O El Rhazouani
    Hassan First University of Settat, High Institute of Health Sciences, Laboratory of Sciences and Health Technologies, Settat, Morocco.
  • Fida Muhammad Khan
    Department of Computer Science, Qurtuba University of Science and Information Technology, Peshawar, Pakistan.
  • H Sekkat
    Hassan First University of Settat, High Institute of Health Sciences, Laboratory of Sciences and Health Technologies, Settat, Morocco.