Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

Journal: arXiv

Published Date: May 13, 2025

Abstract

Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveraging the standardized nature of US reports. By aligning modular text fragments with diverse imaging data and curating a bilingual English-Chinese dataset, the method achieves consistent and clinically accurate text generation across organ sites and languages. Fine-tuning with selective unfreezing of the vision transformer (ViT) further improves text-image alignment. Compared to the previous state-of-the-art KMVE method, our approach achieves relative gains of about 2% in BLEU scores, approximately 3% in ROUGE-L, and about 15% in CIDEr, while significantly reducing errors such as missing or incorrect content. By unifying multi-organ and multi-language report generation into a single, scalable framework, this work demonstrates strong potential for real-world clinical workflows.

Authors

Peixuan Ge
Tongkun Su
Faqin Lv
Baoliang Zhao
Peng Zhang
Chi Hong Wong
Liang Yao
Yu Sun
Zenan Wang
Pak Kin Wong
Ying Hu

External Resources

View on arXiv (http://arxiv.org/abs/2505.08838v2)
