Fair and explainable Myocardial Infarction (MI) prediction: Novel strategies for feature selection and class imbalance correction.

Journal: Computers in biology and medicine
PMID:

Abstract

The rising incidence of myocardial infarction (MI), which often affects individuals without traditional risk factors, highlights the urgent need for improved early detection using personal health data. However, health surveys and electronic health records (EHRs) frequently suffer from class imbalance, which biases predictions and produces gaps between specificity and sensitivity. These problems hinder reliable model development despite the valuable insights such datasets contain. To address this, we introduce a novel approach to MI risk prediction using self-reported attributes from the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS) datasets. Our approach incorporates three techniques: the Dual-Path Artificial Neural Network (DP-ANN) to mitigate biased decision making across imbalanced datasets, Triple Criteria Selection (TCS) for unbiased feature selection, and Minority Weighted Sampling (MWS) to address uncontrolled minority class sampling. Together, these methods improve MI prediction and feature relevance. The DP-ANN model achieved balanced performance, with an average specificity of 80%, sensitivity of 82%, and AUC-ROC of 89.5%, improving imbalance variance by approximately 14.96% compared to prior studies. By outperforming other models across four heavily imbalanced datasets, our approach demonstrates robustness and generalizability. Additionally, SHapley Additive exPlanations (SHAP) analysis revealed key predictors and risk factors for MI, such as coronary heart disease and bronchitis in females, and stroke among individuals aged 35-54. In conclusion, our study provides a robust model that healthcare professionals can use to assess MI risk through targeted factors, promoting early detection and potentially improving patient outcomes.
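
The gap between specificity and sensitivity described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DP-ANN, TCS, or MWS, whose details the abstract does not give. The labels, the two hypothetical classifiers, and the helper name `sensitivity_specificity` are assumptions made up for illustration; only the metric definitions are standard.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = MI, 0 = no MI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # MI cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # MI cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-MI correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-MI wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy, made-up data: 10 MI cases and 90 non-MI cases (heavily imbalanced).
y_true = [1] * 10 + [0] * 90

# Hypothetical classifier A: always predicts the majority class.
pred_majority = [0] * 100

# Hypothetical classifier B: finds 8 of 10 MI cases, clears 72 of 90 non-MI cases.
pred_balanced = [1] * 8 + [0] * 2 + [0] * 72 + [1] * 18

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    sens, spec = sensitivity_specificity(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} "
          f"sensitivity={sens:.2f} specificity={spec:.2f} gap={abs(sens - spec):.2f}")
```

In this toy setting the majority-class predictor has the higher accuracy but detects no MI cases at all, which is why the abstract reports sensitivity and specificity together rather than accuracy alone.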

Authors

  • Simon Bin Akter
    Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, NJ 07102, USA.
  • Sumya Akter
    Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, NJ 07102, USA.
  • Moon Das Tuli
    Greenlife Medical College & Hospital, Dhaka, Bangladesh.
  • David Eisenberg
    Department of Information Systems, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ 07102, USA.
  • Aaron Lotvola
    Department of Oncology, Wayne State University, School of Medicine, Detroit, MI, USA.
  • Humayera Islam
    Institute for Data Science and Informatics.
  • Jorge Fresneda Fernandez
    Martin Tuchman School of Management, New Jersey Institute of Technology, Newark, NJ 07102, USA.
  • Maik Hüttemann
    Department of Biochemistry, Microbiology and Immunology, Wayne State University, School of Medicine, Detroit, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, School of Medicine, Detroit, MI, USA.
  • Tanmoy Sarkar Pias
    Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.