A novel multi-modal retrieval framework for tracking vehicles using natural language descriptions.

Journal: PLOS ONE

Published Date: Aug 11, 2025

Abstract

Recent advances in multimodal and contrastive learning have significantly improved image and video retrieval. Combining the two opens up opportunities for multi-dimensional and multi-view retrieval, especially in multi-camera traffic surveillance. This paper introduces a Multi-modal Vehicle Retrieval (MVR) system that retrieves the trajectories of tracked vehicles from natural language descriptions. The MVR system integrates an end-to-end text-video contrastive learning model, uses CLIP for feature extraction, and employs a matching control system together with multi-context-based attributes. Post-processing techniques remove erroneous information. By building a comprehensive understanding of vehicle characteristics, the MVR system can effectively identify the trajectories that match a natural language description. Our method achieves a mean reciprocal rank (MRR) of 0.8966 on the test set of the 7th AI City Challenge Track 2 (retrieving tracked vehicles through natural language descriptions), surpassing the previous top-ranked result on the public leaderboard.
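For readers unfamiliar with the reported metric, the sketch below shows how mean reciprocal rank is conventionally computed: for each query, take the reciprocal of the position at which the correct item appears in the ranked results, then average over all queries. This is an illustration of the standard definition, not the authors' evaluation code; the function name `mean_reciprocal_rank` and the track IDs are hypothetical.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries.

    rankings[i] is the ranked candidate list returned for query i
    (best match first); targets[i] is the correct item for query i.
    A query whose target is missing from its ranking contributes 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")

    total = 0.0
    for ranked, target in zip(rankings, targets):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break  # only the first occurrence counts
    return total / len(rankings)


# Hypothetical example: three text queries, each ranking three vehicle tracks.
example_rankings = [
    ["track_a", "track_b", "track_c"],  # correct track at rank 1
    ["track_b", "track_a", "track_c"],  # correct track at rank 2
    ["track_c", "track_b", "track_a"],  # correct track at rank 3
]
example_targets = ["track_a", "track_a", "track_a"]
example_mrr = mean_reciprocal_rank(example_rankings, example_targets)
# example_mrr equals (1 + 1/2 + 1/3) / 3
```

Under this definition, a score of 1.0 means the correct trajectory is always ranked first, so a score near 0.9 indicates that the correct trajectory is ranked first for most queries.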

Authors

Changhao Zhang

College of Computer Science and Technology, Xinjiang Normal University, Urumqi, China.
Zhandong Liu

Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA. zhandong.liu@bcm.edu.
Ke Li

School of Ideological and Political Education, Shanghai Maritime University, Shanghai, China.
Yong Li

Department of Surgical Sciences, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, MI, United States.
Xiangwei Qi

College of Computer Science and Technology, Xinjiang Normal University, Urumqi, China.
Nan Ding

Reproductive Medicine Center, Lanzhou University Second Hospital, No.82, Cuiying Road, Chengguan District, Lanzhou City, Gansu Province, China.

Keywords

Algorithms; Humans; Motor Vehicles; Natural Language Processing; Video Recording

External Resources

PubMed ID: 40788882
