SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability
Journal: arXiv
Published Date: Jun 17, 2025
Abstract
Accurate prediction of pedestrian trajectories is essential for applications
in robotics and surveillance systems. While existing approaches primarily focus
on social interactions between pedestrians, they often overlook the rich
environmental context that significantly shapes human movement patterns. In
this paper, we propose SceneAware, a novel framework that explicitly
incorporates scene understanding to enhance trajectory prediction accuracy. Our
method leverages a Vision Transformer (ViT) scene encoder to process
environmental context from static scene images, while Multi-modal Large
Language Models (MLLMs) generate binary walkability masks that distinguish
between accessible and restricted areas during training. We combine a
Transformer-based trajectory encoder with the ViT-based scene encoder,
capturing both temporal dynamics and spatial constraints. The framework
integrates collision penalty mechanisms that discourage predicted trajectories
from violating physical boundaries, ensuring physically plausible predictions.
SceneAware is implemented in both deterministic and stochastic variants.
Comprehensive experiments on the ETH/UCY benchmark datasets show that our
approach outperforms state-of-the-art methods, with more than 50% improvement
over previous models. Our analysis based on different trajectory categories
shows that the model performs consistently well across various types of
pedestrian movement. This highlights the importance of using explicit scene
information and shows that our scene-aware approach is both effective and
reliable in generating accurate and physically plausible predictions. Code is
available at: https://github.com/juho127/SceneAware.
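
To illustrate the idea of penalizing predictions that leave walkable space, here is a minimal sketch, not the authors' implementation. It assumes a binary walkability mask stored as a grid (1 = accessible, 0 = restricted) and a predicted trajectory given as (x, y) points in the same coordinate frame. The function name `collision_penalty`, the `cell_size` parameter, and the choice to score the fraction of violating points are all hypothetical; a penalty used as a training loss would likely need a differentiable form, which this simple count is not.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collision_penalty(
    trajectory: Sequence[Point],
    walkable: Sequence[Sequence[int]],
    cell_size: float = 1.0,
) -> float:
    """Return the fraction of trajectory points on restricted or off-map cells.

    Assumption: walkable[row][col] is 1 for accessible cells and 0 for
    restricted ones, with x mapping to columns and y mapping to rows.
    """
    if not trajectory:
        return 0.0

    rows = len(walkable)
    cols = len(walkable[0]) if rows else 0

    violations = 0
    for x, y in trajectory:
        # Map the continuous position to a grid cell.
        col = int(x // cell_size)
        row = int(y // cell_size)
        inside = 0 <= row < rows and 0 <= col < cols
        # Points outside the map are treated as violations too.
        if not inside or walkable[row][col] == 0:
            violations += 1

    return violations / len(trajectory)


# Hypothetical 3x4 scene with a two-cell obstacle in the middle row.
mask = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

# Walks along the top row and never touches the obstacle.
traj_ok = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)]

# Walks along the middle row, crossing both obstacle cells.
traj_bad = [(0.5, 1.5), (1.5, 1.5), (2.5, 1.5), (3.5, 1.5)]

penalty_ok = collision_penalty(traj_ok, mask)    # 0.0
penalty_bad = collision_penalty(traj_bad, mask)  # 0.5
```

In this toy setup the path through the obstacle scores 0.5 because two of its four points fall on restricted cells, while the path along the top row scores 0.0. A term of this kind, added to the prediction loss, is one way to discourage trajectories that cross physical boundaries.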