Predicting car accident severity in Northwest Ethiopia: a machine learning approach leveraging driver, environmental, and road conditions.

Journal: Scientific Reports
Published Date:

Abstract

Road traffic accidents (RTAs) pose a critical public health challenge in Northwest Ethiopia, a region with a fatality rate of 32.2 per 100,000 residents, and the problem is exacerbated by infrastructural deficits and environmental hazards. This study uses machine learning (ML) to predict car accident severity in the region, addressing gaps in localized predictive frameworks for low- and middle-income countries (LMICs). Using a dataset of 2,000 accidents (2018-2023) drawn from police reports, we integrated driver demographics, behavioral factors (e.g., alcohol use, seatbelt compliance), and environmental conditions (e.g., unpaved roads, weather). Ten ML models, including Random Forest, XGBoost, and LightGBM, were evaluated after class imbalance was addressed with the Synthetic Minority Oversampling Technique (SMOTE). Hyperparameter tuning was used to optimize the models, and SHapley Additive exPlanations (SHAP) were used to interpret them. Random Forest outperformed the other models, achieving 82% accuracy (AUC-ROC: 0.87) after tuning. Driver age (mean: 44 years) and environmental factors (e.g., nighttime on unlit roads, rainy conditions) were critical predictors, increasing fatal accident likelihood by 62%. SMOTE improved the accuracy of the best-performing Random Forest model from 78.6% to 82%. Random Forest also exhibited the highest recall (0.82) after optimization, and ensemble methods dominated the performance metrics overall. The study underscores the efficacy of ML in contextualizing accident severity in LMICs, with Random Forest emerging as a robust tool for policymakers. Prioritizing road paving, sobriety checkpoints, and motorcycle safety could mitigate risks, in line with Sustainable Development Goal 3.6. Future work should address data limitations (underreporting, geospatial gaps) and expand model interpretability.
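The class-imbalance step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours. This is not the authors' implementation, which the abstract does not describe. The function name `smote`, the toy feature pairs, and the class counts are hypothetical, and the sketch handles numeric features only, whereas accident records typically also contain categorical fields.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative only: numeric features, Euclidean distance, brute-force
    neighbour search. Names and defaults here are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: (driver age, hour of day) for the rare "fatal" class.
fatal = [(44.0, 22.0), (38.0, 23.0), (51.0, 21.0), (47.0, 20.0)]
n_nonfatal = 10  # hypothetical size of the majority class

# Oversample the minority class until both classes have the same size.
new_points = smote(fatal, n_new=n_nonfatal - len(fatal), k=2)
balanced_fatal = fatal + new_points
print(len(balanced_fatal))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to report accuracy or recall.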

Authors

  • Abraham Keffale Mengistu
    Department of Health Informatics, College of Medicine and Health Sciences, Debre Markos University, Debre Markos, Ethiopia. abreham_keffale@dmu.edu.et.
  • Andualem Enyew Gedefaw
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Nebebe Demis Baykemagn
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Agmasie Damtew Walle
    Department of Health Informatics, College of Medicine and Health Science, Debre Berhan University, Debre Berhan, Ethiopia.
  • Tirualem Zeleke Yehuala
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia. sarazeleke3@gmail.com.
  • Meron Asmamaw Alemayehu
    Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Amhara, Ethiopia. merryalem101@gmail.com.
  • Mengistu Abebe Messelu
    Department of Nursing, College of Medicine and Health Sciences, Debre Markos University, Debre Markos, Ethiopia.
  • Bayou Tilahun Assaye
    Department of Health Informatics, College of Medicine and Health Sciences, Debre Markos University, Debre Markos, Ethiopia.