UFM: Unified Feature Matching Pre-training with Multi-Modal Image Assistants
Journal:
arXiv
Published Date:
Mar 26, 2025
Abstract
Image feature matching, a foundational task in computer vision, remains
challenging for multimodal image applications, often necessitating intricate
training on specific datasets. In this paper, we introduce a Unified Feature
Matching pre-trained model (UFM) designed to address feature matching
challenges across a wide range of image modalities. We present Multimodal Image
Assistant (MIA) transformers, fine-tunable structures adept at handling
diverse feature matching problems. UFM handles both feature matching within
a single modality and feature matching across different modalities.
Additionally, we propose a data augmentation algorithm and a staged
pre-training strategy to effectively tackle challenges arising from sparse data
in specific modalities and modality-imbalanced datasets. Experimental results
demonstrate that UFM excels in generalization and performance across various
feature matching tasks. The code will be released
at: https://github.com/LiaoYun0x0/UFM.