Analyzing relationships between latent topics in autonomous vehicle crash narratives and crash severity using natural language processing techniques and explainable XGBoost.

Journal: Accident; analysis and prevention
PMID:

Abstract

Safety is one of the most essential considerations when evaluating the performance of autonomous vehicles (AVs). Real-world AV data, including trajectory, detection, and crash data, are increasingly used because they allow a realistic evaluation of AVs' performance. While substantial research has been conducted to estimate general crash patterns using structured AV crash data, exploration of AV crash narratives remains limited. These narratives contain latent information about AV crashes that can further the understanding of AV safety. Therefore, this study uses the Structural Topic Model (STM), a natural language processing technique, to extract latent topics from unstructured AV crash narratives while incorporating crash metadata (i.e., the severity and year of crashes). In total, 15 topics are identified and further divided into behavior-related, party-related, location-related, and general topics. Using these topics, AV crashes can be systematically described and clustered. Results from the STM suggest that AVs' abilities to interact with vulnerable road users (VRUs) and to react to lane-change behavior need further improvement. Moreover, an XGBoost model is developed to investigate the relationships between the topics and crash severity. The model significantly outperforms those reported in existing studies in terms of accuracy, suggesting that the extracted topics are closely related to crash severity. Results from interpreting the model indicate that topics containing information about crash severity and VRUs have significant impacts on the model's output; such information is suggested for inclusion in future AV crash reporting.

Authors

  • Pei Li
    State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Guizhou University, Huaxi District, Guiyang 550025, China.
  • Sikai Chen
    Department of Ultrasound, Zhongnan Hospital of Wuhan University, Wuhan, China.
  • Lishengsa Yue
    The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai, China. Electronic address: 2014yuelishengsa@tongji.edu.cn.
  • Yuan Xu
    Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China.
  • David A Noyce
    Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, United States of America. Electronic address: danoyce@wisc.edu.