Enhanced heart failure mortality prediction through model-independent hybrid feature selection and explainable machine learning.
Journal:
Journal of Biomedical Informatics
PMID:
39956346
Abstract
Heart failure (HF) remains a significant public health challenge with high mortality rates. Machine learning (ML) techniques offer a promising approach to predicting HF mortality, potentially improving clinical outcomes. However, the effectiveness of these techniques depends heavily on the quality and relevance of the features used. This study introduces a novel hybrid feature selection methodology that combines Extremely Randomized Trees (Extra-Trees) and non-linear correlation measures to enhance 1-year all-cause mortality prediction in HF patients using echocardiographic and key demographic data. Unlike existing feature selection methods, which are often tied to specific ML models and produce inconsistent feature sets across different algorithms, our proposed approach is model-independent, ensuring robustness and generalizability. Moreover, the optimal number of predictive features is identified through loss graph inspection, leading to a compact and highly informative subset of seven features. We trained and evaluated seven widely used ML models on both the full feature set and the selected subset, finding that most models maintained or improved their predictive performance despite an 80% reduction in features. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP), allowing for a detailed examination of how individual features influence predictions. To further assess the effectiveness of our methodology, we compared it against widely known feature selection techniques across all seven ML models. The results underscore the superiority of our proposed feature set over those produced by conventional methods in accurately predicting HF mortality, offering new opportunities for personalized management strategies based on a streamlined and explainable feature subset.
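The abstract does not specify how the Extra-Trees importances and the non-linear correlation measures are combined, so the following is only a minimal sketch of one plausible model-independent scheme, not the authors' method. It assumes Spearman rank correlation as a stand-in for the unspecified non-linear correlation measure, assumes the two rankings are merged by simple rank averaging, and takes the tree importances as precomputed inputs (in practice these might come from an Extra-Trees model). The feature names, toy values, and the `hybrid_select` helper are all hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for a constant input."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def hybrid_select(columns, target, tree_importance, k):
    """Keep the k features with the best average rank across both criteria.

    columns: dict mapping feature name -> list of values (hypothetical data)
    tree_importance: dict mapping feature name -> importance score (assumed
        to be precomputed by a tree ensemble; not computed here)
    """
    names = list(columns)
    corr_rank = ranks([abs(spearman(columns[n], target)) for n in names])
    imp_rank = ranks([tree_importance[n] for n in names])
    combined = {n: (corr_rank[i] + imp_rank[i]) / 2 for i, n in enumerate(names)}
    return sorted(names, key=lambda n: combined[n], reverse=True)[:k]


# Toy, made-up data: six patients, outcome 1 = died within one year.
columns = {
    "lvef": [55, 40, 30, 60, 25, 35],
    "age": [60, 72, 80, 55, 85, 70],
    "noise": [3, 9, 4, 1, 5, 2],
}
target = [0, 0, 1, 0, 1, 1]
tree_importance = {"lvef": 0.5, "age": 0.4, "noise": 0.1}  # placeholder scores

selected = hybrid_select(columns, target, tree_importance, k=2)
print(selected)
```

Because the ranking depends only on the data and on the two scoring criteria, the selected subset stays the same regardless of which downstream classifier is trained on it, which is the sense of "model-independent" illustrated here. The choice of `k` is left to the caller; the abstract reports choosing it by inspecting a loss graph, which this sketch does not attempt.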