CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Existing metrics often lack the granularity and interpretability to capture
nuanced clinical differences between candidate and ground-truth radiology
reports, resulting in suboptimal evaluation. We introduce a Clinically-grounded
tabular framework with Expert-curated labels and Attribute-level comparison for
Radiology report evaluation (CLEAR). CLEAR not only examines whether a report
can accurately identify the presence or absence of medical conditions, but also
assesses whether it can precisely describe each positively identified condition
across five key attributes: first occurrence, change, severity, descriptive
location, and recommendation. Compared to prior works, CLEAR's
multi-dimensional, attribute-level outputs enable a more comprehensive and
clinically interpretable evaluation of report quality. Additionally, to measure
the clinical alignment of CLEAR, we collaborate with five board-certified
radiologists to develop CLEAR-Bench, a dataset of 100 chest X-ray reports from
MIMIC-CXR, annotated across 6 curated attributes and 13 CheXpert conditions.
Our experiments show that CLEAR achieves high accuracy in extracting clinical
attributes and provides automated metrics that are strongly aligned with
clinical judgment.