A Multimodal Pipeline for Clinical Data Extraction: Applying Vision-Language Models to Scans of Transfusion Reaction Reports
Journal:
arXiv
Published Date:
Apr 28, 2025
Abstract
Despite the growing adoption of electronic health records, many processes
still rely on paper documents, reflecting the heterogeneous real-world
conditions in which healthcare is delivered. The manual transcription process
is time-consuming and prone to errors when transferring paper-based data to
digital formats. To streamline this workflow, this study presents an
open-source pipeline that extracts and categorizes checkbox data from scanned
documents. Demonstrated on transfusion reaction reports, the design supports
adaptation to other checkbox-rich document types. The proposed method
integrates checkbox detection, multilingual optical character recognition (OCR)
and multilingual vision-language models (VLMs). The pipeline achieves high
precision and recall compared against annually compiled gold-standards from
2017 to 2024. The result is a reduction in administrative workload and accurate
regulatory reporting. The open-source availability of this pipeline encourages
self-hosted parsing of checkbox forms.