Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare
Journal: arXiv
Published Date: May 12, 2025
Abstract
The intricate and multifaceted nature of vision-language model (VLM)
development, adaptation, and application necessitates the establishment of
clear and standardized reporting protocols, particularly within the high-stakes
context of healthcare. Defining these reporting standards is inherently
challenging because studies involving VLMs are highly diverse. They range from
the development of entirely new VLMs and fine-tuning for domain alignment to
the off-the-shelf use of VLMs for targeted diagnosis and prediction tasks. In
this position paper, we argue that traditional machine learning reporting
standards and evaluation guidelines must be restructured to accommodate
multiphase VLM studies. They must also be organized so that developers can
understand them intuitively while maintaining rigorous standards for
reproducibility. To facilitate community adoption, we propose a categorization
framework for VLM studies and outline corresponding reporting standards that
comprehensively address performance evaluation, data reporting protocols, and
recommendations for manuscript composition. These guidelines are organized
according to the proposed categorization scheme. Finally, we present a checklist
that consolidates these reporting standards, offering a standardized tool to ensure
consistency and quality in the publication of VLM-related research.