LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs
Journal:
arXiv
Published Date:
May 6, 2025
Abstract
The growing demand for intelligent logistics, particularly fine-grained
terminal delivery, underscores the need for autonomous delivery systems based on
unmanned aerial vehicles (UAVs). However, most existing last-mile delivery
studies rely on ground robots, while current UAV-based Vision-Language
Navigation (VLN) tasks primarily focus on coarse-grained, long-range goals,
making them unsuitable for precise terminal delivery. To bridge this gap, we
propose LogisticsVLN, a scalable aerial delivery system built on multimodal
large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN
integrates lightweight Large Language Models (LLMs) and Vision-Language Models
(VLMs) in a modular pipeline for request understanding, floor localization,
object detection, and action decision-making. To support research and
evaluation in this new setting, we construct the Vision-Language Delivery (VLD)
dataset within the CARLA simulator. Experimental results on the VLD dataset
demonstrate the feasibility of the LogisticsVLN system. In addition, we conduct
subtask-level evaluations of each module of our system, offering valuable
insights for improving the robustness and real-world deployment of foundation
model-based vision-language delivery systems.
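
As an illustration of the modular pipeline named in the abstract (request understanding, floor localization, object detection, action decision-making), the following minimal Python sketch shows one way such stages might be chained. It is not the authors' implementation. Every name here (DeliveryRequest, parse_request, choose_action, run_delivery, the 'floor|target' reply format, and the action strings) is hypothetical, the model calls are passed in as plain callables, and the rule-based action choice is an assumed stand-in for the model-driven decision step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeliveryRequest:
    floor: int   # floor number named in the customer request
    target: str  # free-text description of the delivery target


def parse_request(text: str, llm: Callable[[str], str]) -> DeliveryRequest:
    """Request understanding: the (hypothetical) LLM replies 'floor|target'."""
    reply = llm("Return 'floor|target' for this delivery request: " + text)
    floor, target = reply.split("|", 1)
    return DeliveryRequest(int(floor.strip()), target.strip())


def choose_action(request: DeliveryRequest, current_floor: int,
                  target_visible: bool) -> str:
    """Action decision: a simple rule set assumed here for illustration."""
    if current_floor < request.floor:
        return "ascend"
    if current_floor > request.floor:
        return "descend"
    return "approach" if target_visible else "rotate"


def run_delivery(text: str,
                 llm: Callable[[str], str],
                 locate_floor: Callable[[], int],
                 detect_target: Callable[[str], bool],
                 step: Callable[[str], None],
                 max_steps: int = 20) -> List[str]:
    """Chain the four stages until the target is approached or steps run out."""
    request = parse_request(text, llm)
    actions: List[str] = []
    for _ in range(max_steps):
        floor = locate_floor()  # floor localization
        # Object detection is only consulted once the UAV is on the right floor.
        visible = floor == request.floor and detect_target(request.target)
        action = choose_action(request, floor, visible)
        actions.append(action)
        step(action)  # send the chosen action to the simulator or vehicle
        if action == "approach":
            break
    return actions
```

Passing each stage in as a callable mirrors the modular design the abstract describes: any one module can be swapped or evaluated on its own, which is also what the subtask-level evaluations rely on.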