Beyond the norm: Identifying rare and high-risk pedestrian crash patterns using unsupervised learning.
Journal: Accident; analysis and prevention
Published Date: Jan 17, 2026
Abstract
Pedestrian safety remains a major concern, with fatalities rising despite infrastructure and safety improvements. Meaningful progress requires focusing more intensely on the most dangerous and fatal cases, especially as the safety of both conventional and automated vehicles increasingly shapes crash outcomes. This study introduces a composite unsupervised edge-case detection framework that combines Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) for clustering. Each crash receives a composite score based on its cluster membership uncertainty and its distance from the core of typical crash patterns in the UMAP space. Based on these scores, crashes are classified into three interpretive layers: Core, Moderate Edge, and Strong Edge. Core cases represent common patterns, while Strong Edge cases reflect rare and complex situations. The framework is applied to 10,108 police-reported crashes from North Carolina coded with the Pedestrian and Bicycle Crash Analysis Tool (PBCAT), a relatively clean database of pedestrian crashes. Crash severity and contextual characteristics were then compared across the three layers. Strong Edge crashes were substantially more severe: 36.6% resulted in fatal injuries, compared with 8.1% in the Core group. These high-risk cases often occurred in rural areas, under poor lighting conditions, and at non-intersection locations, and involved crash types such as unusual circumstances or pedestrians crossing expressways. The findings show that the built environment and crash type influence pedestrian crash patterns. The edge-case framework helps detect rare, high-risk crashes that traditional methods often miss, supporting targeted safety efforts.
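The abstract does not state the exact scoring formula, so the following is only a minimal sketch of how a composite edge score of this kind could be computed, not the authors' implementation. It assumes the inputs are a low-dimensional embedding (for example, the output of `umap.UMAP(...).fit_transform` from umap-learn) plus HDBSCAN's `labels_` (noise = -1) and `probabilities_` (membership strength, 0 for noise). The weighted-sum form, the `weight` and `core_prob` parameters, the use of high-confidence-member centroids as cluster "cores", and the 80th/95th percentile layer cut-offs are all hypothetical choices made for illustration.

```python
import numpy as np


def composite_edge_scores(embedding, labels, probabilities, weight=0.5, core_prob=0.9):
    """Hypothetical composite edge score: weighted sum of membership
    uncertainty and normalised distance to a cluster core (assumed form)."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities, dtype=float)

    cluster_ids = [c for c in np.unique(labels) if c != -1]
    if not cluster_ids:
        raise ValueError("need at least one non-noise cluster")

    # Assumed "core" of each cluster: centroid of its high-confidence members,
    # falling back to all members if none reach core_prob.
    cores = []
    for c in cluster_ids:
        members = labels == c
        strong = members & (probabilities >= core_prob)
        cores.append(embedding[strong if strong.any() else members].mean(axis=0))
    core_matrix = np.stack(cores)

    # Euclidean distance from every point to every cluster core.
    dists = np.linalg.norm(embedding[:, None, :] - core_matrix[None, :, :], axis=2)
    index_of = {c: i for i, c in enumerate(cluster_ids)}
    distance = np.array([
        dists[i].min() if c == -1 else dists[i, index_of[c]]  # noise -> nearest core
        for i, c in enumerate(labels)
    ])

    # Min-max normalise distance so both components lie in [0, 1].
    span = distance.max() - distance.min()
    distance_norm = (distance - distance.min()) / span if span > 0 else np.zeros_like(distance)

    uncertainty = 1.0 - probabilities
    return weight * uncertainty + (1.0 - weight) * distance_norm


def assign_layers(scores, moderate_q=0.80, strong_q=0.95):
    """Split scores into Core / Moderate Edge / Strong Edge using
    hypothetical quantile cut-offs."""
    scores = np.asarray(scores, dtype=float)
    moderate_cut, strong_cut = np.quantile(scores, [moderate_q, strong_q])
    layers = np.full(scores.shape, "Core", dtype=object)
    layers[scores >= moderate_cut] = "Moderate Edge"
    layers[scores >= strong_cut] = "Strong Edge"
    return layers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 2-D UMAP embedding: two tight clusters plus two far points.
    emb = np.vstack([
        rng.normal([0, 0], 0.3, size=(50, 2)),
        rng.normal([5, 5], 0.3, size=(50, 2)),
        [[10.0, -5.0], [-6.0, 8.0]],
    ])
    labels = np.array([0] * 50 + [1] * 50 + [-1, -1])
    probs = np.concatenate([np.full(100, 0.95), [0.0, 0.0]])
    layers = assign_layers(composite_edge_scores(emb, labels, probs))
    print(layers[-2:])
```

In this toy example the two isolated noise points combine maximal membership uncertainty with the largest distances from any core, so they fall into the Strong Edge layer while the tightly clustered points remain mostly Core. Quantile cut-offs keep the layer sizes fixed by construction; a real analysis might instead choose thresholds from the score distribution itself.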