A novel XGBoost method with entity embeddings for feature analysis and classification of traffic crash types.
Journal:
International Journal of Injury Control and Safety Promotion
Published Date:
Jun 2, 2026
Abstract
Crash type is an important factor in understanding crash severity, as certain types lead to higher mortality rates. Predicting crash type for specific road sections can therefore support road safety assessments. This study examines the relationships between geometric road elements at crash sites and identifies key features in crash type classification. Crash data from a 10-year period in 10 central districts of İzmir, Türkiye, were analysed. The three districts with the highest number of crashes, namely Bornova, Karşıyaka and Konak, were used as geographically distinct test districts, while models were trained on data from the remaining central districts within each temporal group. Feature importance was ranked using the Extreme Gradient Boosting (XGBoost) method. As the main contribution, we propose Embedding-XGBoost (E-XGB), a novel two-stage dimensionality reduction approach that integrates entity embeddings with XGBoost to improve classification performance. E-XGB models crash data in a lower-dimensional feature space, allowing predictions with reduced computational effort and robustness against missing data. The superiority of E-XGB was demonstrated by comparing its performance with four machine learning algorithms: XGBoost, support vector machine, K-nearest neighbours and multilayer perceptron. Results show that, when 10 features are used, E-XGB achieves accuracy, F1-score and precision of up to 85.42%, 85.09% and 86.03%, respectively.
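The geographic hold-out described in the abstract (test on the districts with the most crashes, train on the rest) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_by_top_districts`, the record layout, the placeholder names "District D" and "District E", the crash-type labels and all counts are assumptions made up for the example, and the temporal grouping mentioned in the abstract is not modelled.

```python
from collections import Counter


def split_by_top_districts(records, n_test_districts=3):
    """Hold out the districts with the most crash records as the test set."""
    # Count crash records per district.
    counts = Counter(r["district"] for r in records)
    # The n districts with the highest counts become the test districts.
    test_districts = {name for name, _ in counts.most_common(n_test_districts)}
    # Every remaining district contributes to training only.
    train = [r for r in records if r["district"] not in test_districts]
    test = [r for r in records if r["district"] in test_districts]
    return train, test, test_districts


# Toy data: counts and crash-type labels are hypothetical, for illustration only.
toy_counts = {
    "Konak": 6,
    "Bornova": 5,
    "Karşıyaka": 4,
    "District D": 3,
    "District E": 2,
}
records = [
    {"district": name, "crash_type": "rear-end" if i % 2 == 0 else "side-impact"}
    for name, n in toy_counts.items()
    for i in range(n)
]

train, test, test_districts = split_by_top_districts(records)
print(sorted(test_districts))
print(len(train), len(test))
```

Splitting by district rather than by random rows keeps every test record geographically unseen during training, which is what makes the test districts "geographically distinct" in the sense the abstract uses.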