A hybrid approach for modeling bicycle crash frequencies: Integrating random forest based SHAP model with random parameter negative binomial regression model.

Journal: Accident; analysis and prevention
PMID:

Abstract

To effectively capture and explain complex, nonlinear relationships within bicycle crash frequency data and account for unobserved heterogeneity simultaneously, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial (RPNB) regression model. First, four machine learning algorithms, namely random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. The RF algorithm, which demonstrated the best performance, was then selected and integrated into an interpretable machine learning-based method (i.e., RF-SHAP) to provide an interpretable measure of each variable's impact, which is critical for understanding the model's prediction results. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. The proposed framework was validated using data from 288 traffic analysis zones (TAZs) in Greater London, together with various regional risk factors for bicycle crash frequency. The results indicate that the proposed framework demonstrates improved prediction accuracy and better factor interpretation in analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating its reliable explanatory power. Furthermore, there is a significant improvement in the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). This suggests that the proposed model effectively combines the explanatory power of statistical models with the predictive power of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information to develop targeted interventions.
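The four evaluation measures named in the abstract (AIC, BIC, MAE, and RMSE) can be illustrated with a minimal sketch. It assumes a fixed-parameter negative binomial (NB2) likelihood with variance mu + alpha * mu^2, which is a simplification: the study's RPNB model has random parameters, and how the authors estimated it is not described here. The function names, the toy counts, the fitted means, the dispersion value, and the parameter count k are all hypothetical and are not taken from the study.

```python
import math

def nb2_log_likelihood(y, mu, alpha):
    """Log-likelihood of counts y under NB2 with means mu and dispersion alpha.

    Assumes Var(y) = mu + alpha * mu**2 (fixed parameters, not the RPNB model).
    """
    r = 1.0 / alpha  # inverse dispersion ("size") parameter
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (
            math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
            + r * math.log(r / (r + mi))
            + yi * math.log(mi / (r + mi))
        )
    return total

def aic(loglik, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

def mae(y, y_hat):
    """Mean Absolute Error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root Mean Squared Error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical toy data: observed crash counts per zone and fitted means.
observed = [3, 0, 5, 2, 7, 1]
fitted = [2.5, 0.8, 4.6, 2.2, 6.1, 1.4]
alpha_hat = 0.3   # hypothetical dispersion estimate
k_params = 3      # hypothetical number of estimated parameters

ll = nb2_log_likelihood(observed, fitted, alpha_hat)
print("log-likelihood:", ll)
print("AIC:", aic(ll, k_params))
print("BIC:", bic(ll, k_params, len(observed)))
print("MAE:", mae(observed, fitted))
print("RMSE:", rmse(observed, fitted))
```

Lower values are better for all four measures. AIC and BIC penalize the log-likelihood by model size, so they speak to explanatory fit, while MAE and RMSE measure prediction error directly.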

Authors

  • Hongliang Ding
    Institute of Smart City and Intelligent Transportation, Southwest Jiaotong University, Chengdu 611756, Sichuan, China. Electronic address: hongliang.ding@swjtu.edu.cn.
  • Ruiqi Wang
    Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA.
  • Tiantian Chen
    Cho Chun Shik Graduate School of Mobility, Korea Advanced Institute of Science and Technology, South Korea. Electronic address: nicole.chen@kaist.ac.kr.
  • N N Sze
    Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong. Electronic address: tony.nn.sze@polyu.edu.hk.
  • Hyungchul Chung
    Urban Planning and Design, Xi'an Jiaotong-Liverpool University, 111 Ren'ai Road, Suzhou Industrial Park, Suzhou, China. Electronic address: hyungchul.chung@xjtlu.edu.cn.
  • Ni Dong
    Urban Transport Research Center, School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan, 410075 PR China. Electronic address: dongni722@foxmail.com.