LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
Journal: arXiv
Published Date: Mar 20, 2025
Abstract
The success of modern machine learning, particularly in facial translation
networks, is highly dependent on the availability of high-quality, paired,
large-scale datasets. However, acquiring sufficient data is often challenging
and costly. Inspired by the recent success of diffusion models in high-quality
image synthesis and advancements in Large Language Models (LLMs), we propose a
novel framework called LLM-assisted Paired Image Generation (LaPIG). This
framework enables the construction of comprehensive, high-quality datasets of
paired visible and thermal facial images using captions generated by LLMs. Our
method consists of three components: visible image synthesis with ArcFace
embeddings, thermal image translation using Latent Diffusion Models (LDMs), and
caption generation with LLMs. Our approach not only generates multi-view paired
visible and thermal images to increase data diversity but also produces
high-quality paired data that preserves each subject's identity. We evaluate
our method against existing methods on public datasets, and the results
demonstrate the superiority of LaPIG.
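The three components described above can be chained roughly as follows. This is a toy sketch under stated assumptions, not LaPIG's implementation. The stage functions (`toy_captions`, `toy_visible`, `toy_thermal`) and the orchestrator `generate_pairs` are hypothetical placeholders for the LLM, the ArcFace-based visible generator, and the LDM. Plain float vectors stand in for images, and the stage ordering (captions, then visible synthesis, then thermal translation) is assumed. The sketch also uses cosine similarity, the usual way ArcFace-style embeddings are compared, as one possible check that identity survives across a generated pair.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]  # toy stand-in for an image or an embedding


@dataclass
class PairedSample:
    caption: str
    visible: Vector  # placeholder for the synthesized visible face image
    thermal: Vector  # placeholder for the translated thermal face image


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, the usual metric for comparing ArcFace-style embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def toy_captions(views: List[str]) -> List[str]:
    # Hypothetical stand-in for LLM caption generation: one caption per head
    # pose, which is one way multi-view pairs could be requested.
    return [f"a photo of the same person, {view} view" for view in views]


def toy_visible(caption: str, identity: Vector) -> Vector:
    # Hypothetical stand-in for identity-conditioned visible synthesis:
    # small caption-seeded noise added to the identity vector.
    rng = random.Random(caption)
    return [x + 0.05 * rng.uniform(-1.0, 1.0) for x in identity]


def toy_thermal(visible: Vector) -> Vector:
    # Hypothetical stand-in for LDM-based visible-to-thermal translation:
    # a uniform scaling, so the vector's direction is unchanged.
    return [0.8 * x for x in visible]


def generate_pairs(
    identity: Vector,
    captions: List[str],
    synthesize_visible: Callable[[str, Vector], Vector],
    translate_to_thermal: Callable[[Vector], Vector],
) -> List[PairedSample]:
    """Chain the stages: caption -> visible image -> paired thermal image."""
    pairs = []
    for caption in captions:
        visible = synthesize_visible(caption, identity)
        thermal = translate_to_thermal(visible)
        pairs.append(PairedSample(caption, visible, thermal))
    return pairs


if __name__ == "__main__":
    identity = [0.3, -0.5, 0.8, 0.1]  # toy identity embedding
    captions = toy_captions(["frontal", "left profile", "right profile"])
    for pair in generate_pairs(identity, captions, toy_visible, toy_thermal):
        id_sim = cosine_similarity(identity, pair.visible)
        pair_sim = cosine_similarity(pair.visible, pair.thermal)
        print(f"{pair.caption}: identity={id_sim:.3f}, pair={pair_sim:.3f}")
```

Passing the stages in as callables keeps the orchestration independent of any particular model. In a real pipeline, the identity check would compare embeddings extracted from the generated images by a face recognizer, not the toy vectors themselves.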