Comparison of machine learning algorithms for predicting length of stay in chronic kidney disease patients.
Journal:
Computers in biology and medicine
Published Date:
Aug 4, 2025
Abstract
The length of stay (LOS) for patients in hospitals is crucial for workforce planning, resource allocation, and bed capacity management, and it affects healthcare costs, future needs, and financial planning. This study focuses on predicting the LOS for Chronic Kidney Disease (CKD) patients admitted as inpatients and estimating their hospital bills based on services rendered during their stay. Using data from 5,583 CKD patients and 11 input variables, various machine learning (ML) algorithms were applied to develop regression and classification models. To optimize model performance and address potential overfitting, feature selection techniques were also employed. The Random Forest (RF) algorithm achieved the highest performance for bill amount estimation, with a Correlation Coefficient (CC) of 0.736. The algorithms predicting LOS showed even more promising results, with all achieving a CC above 0.848. The best performances were obtained from Support Vector Machine (SVM), M5P trees, and RF, with Mean Absolute Error (MAE) and CC results of 2.580 days and 0.875, 2.587 days and 0.880, and 2.611 days and 0.880, respectively. LOS was also categorized as short or long using ML algorithms, with Logistic Regression (LogR) achieving the best classification results: 0.944 on the AUC-ROC (Area Under the ROC Curve) metric and 0.872 on the F-Measure metric. The RF algorithm also performed best in classification by patient unit, with an AUC-ROC of 0.788 and an accuracy of 0.863. Additionally, feature selection revealed that reducing the input variables maintained prediction accuracy for bill amount and LOS but generally degraded classification performance. Feature selection was identified as a critical challenge, particularly in balancing the trade-off between dimensionality reduction and predictive accuracy.
While dimensionality reduction can improve computational efficiency, careful selection of input variables is essential to maintain robust classification performance. Given the lengthy treatment processes for CKD patients, accurate predictions of LOS, billing amounts, and admission units will assist health managers in planning for future resource needs, such as medical supplies and workforce. Ultimately, this study provides insights that can enhance the financial sustainability and management of healthcare services.
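For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how MAE, the correlation coefficient (taken here to be Pearson's r, which is an assumption since the abstract does not say which coefficient was used), and the F-Measure are commonly computed. The function names and the toy values are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values (assumed CC)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


def f_measure(actual, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up values (NOT from the study): LOS in days for five patients.
actual_los = [3, 7, 10, 4, 15]
predicted_los = [4, 6, 12, 4, 13]

# Toy, made-up labels: 1 = long stay, 0 = short stay.
actual_class = [1, 0, 1, 1, 0]
predicted_class = [1, 0, 0, 1, 1]

mae = mean_absolute_error(actual_los, predicted_los)
cc = correlation_coefficient(actual_los, predicted_los)
f1 = f_measure(actual_class, predicted_class)
print(f"MAE = {mae:.3f} days, CC = {cc:.3f}, F-Measure = {f1:.3f}")
```

A lower MAE means predictions are closer to the observed LOS in days, while a CC nearer to 1 means predictions track the observed values more closely; the two can rank models differently, which is consistent with the abstract reporting both.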