Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging?
Journal: arXiv
Published Date: Jan 6, 2025
Abstract
With the dramatic increase in the volume of ultrasound examinations, the
number of low-quality ultrasound images has gradually risen, owing to
variations in operator proficiency and imaging conditions. Such images
compromise diagnostic accuracy and, in critical cases, may even force the
diagnosis to be restarted. To assist clinicians in selecting high-quality
ultrasound images and ensuring accurate diagnoses, we introduce
Ultrasound-QBench, a comprehensive benchmark that systematically evaluates
multimodal large language models (MLLMs) on ultrasound image quality
assessment tasks.
Ultrasound-QBench establishes two datasets collected from diverse sources:
IVUSQA, consisting of 7,709 images, and CardiacUltraQA, containing 3,863
images. These images, which encompass common ultrasound imaging artifacts, are
annotated by ultrasound experts and classified into three quality levels:
high, medium, and low. To better evaluate MLLMs, we decompose the quality
assessment task into three dimensions: qualitative classification,
quantitative scoring, and comparative assessment. The evaluation of 7
open-source MLLMs and 1 proprietary MLLM demonstrates that MLLMs possess
preliminary capabilities for low-level visual tasks in ultrasound image
quality classification. We hope this benchmark will inspire
the research community to delve deeper into uncovering and enhancing the
untapped potential of MLLMs for medical imaging tasks.