NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models
Journal: arXiv
Published Date: Jul 5, 2025
Abstract
Bird's-Eye View (BEV) semantic segmentation is an indispensable perception
task in end-to-end autonomous driving systems. Unsupervised and semi-supervised
learning for BEV tasks, though pivotal for real-world applications, underperforms
because the labeled data are homogeneously distributed. In this work, we
explore the potential of synthetic data from driving world models to enhance
the diversity of labeled data and thereby make BEV segmentation more robust. Yet, our
preliminary findings reveal that generation noise in synthetic data compromises
efficient BEV model learning. To fully harness the potential of synthetic data
from world models, this paper proposes NRSeg, a noise-resilient learning
framework for BEV semantic segmentation. Specifically, a Perspective-Geometry
Consistency Metric (PGCM) is proposed to quantitatively evaluate the guidance
capability of generated data for model learning. This metric originates from
the alignment measure between the perspective road mask of generated data and
the mask projected from the BEV labels. Moreover, a Bi-Distribution Parallel
Prediction (BiDPP) is designed to enhance the inherent robustness of the model,
where the learning process is constrained through parallel prediction of
multinomial and Dirichlet distributions. The former efficiently predicts
semantic probabilities, whereas the latter adopts evidential deep learning to
realize uncertainty quantification. Furthermore, a Hierarchical Local Semantic
Exclusion (HLSE) module is designed to address the non-mutual exclusivity
inherent in BEV semantic segmentation tasks. Experimental results demonstrate
that NRSeg achieves state-of-the-art performance, with mIoU improvements of up
to 13.8% and 11.4% on unsupervised and semi-supervised BEV segmentation tasks,
respectively. The source code will be made publicly
available at https://github.com/lynn-yu/NRSeg.
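
To make two of the ideas in the abstract concrete, the sketch below shows (1) a mask alignment score of the kind PGCM is described as building on, and (2) the standard evidential deep learning mapping from network outputs to a Dirichlet distribution with an uncertainty value. Both are illustrative assumptions and not the authors' implementation: the abstract does not say which alignment measure PGCM uses (intersection-over-union is assumed here), and the function names `mask_iou` and `dirichlet_from_logits` are hypothetical. The Dirichlet part follows the commonly used formulation evidence = softplus(logit), alpha = evidence + 1, uncertainty = K / sum(alpha); NRSeg's exact formulation may differ.

```python
import math


def mask_iou(mask_a, mask_b):
    """IoU of two same-shaped binary masks given as nested lists of 0/1.

    Assumption: IoU stands in for the unspecified "alignment measure"
    between the generated image's road mask and the projected BEV label mask.
    """
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as perfectly aligned (a design choice).
    return inter / union if union else 1.0


def softplus(z):
    """Numerically stable softplus: log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dirichlet_from_logits(logits):
    """Map per-class logits to Dirichlet expected probabilities and uncertainty.

    Standard evidential formulation (assumed, not taken from the paper):
    evidence_k = softplus(logit_k), alpha_k = evidence_k + 1,
    p_k = alpha_k / S, u = K / S, where S = sum_k alpha_k.
    """
    alpha = [softplus(z) + 1.0 for z in logits]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probs, uncertainty


if __name__ == "__main__":
    generated = [[1, 1], [0, 0]]   # toy perspective road mask
    projected = [[1, 0], [0, 0]]   # toy mask projected from BEV labels
    print("alignment (IoU):", mask_iou(generated, projected))

    probs, u = dirichlet_from_logits([4.0, -2.0, -2.0])
    print("expected probabilities:", probs)
    print("uncertainty:", u)
```

Under these assumptions, a low alignment score would flag a generated sample whose image disagrees with its BEV label, and the uncertainty value approaches 1 when the network produces little evidence for any class and shrinks as evidence accumulates.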