Transcribing multilingual radiologist-patient dialogue into mammography reports using AI: A step towards patient-centric radiology

Journal: medRxiv
Published Date:

Abstract

Radiology reports are primarily designed for healthcare professionals and often contain complex medical terminology that hinders patients from understanding their diagnostic results. This communication gap is especially pronounced in non-English-speaking regions. AI-driven transcription and report generation, leveraging automated speech recognition (ASR) and large language models (LLMs), could enable patient-centered, accessible reporting from radiologist-patient conversations in vernacular language.

This study evaluated the feasibility of AI-driven transcription and automated mammography report generation from simulated radiologist-patient conversations in vernacular language, assessing transcription accuracy, report concordance, error patterns, and time efficiency.

A curated dataset of 50 mammograms was retrospectively selected from the Picture Archiving and Communication System (PACS) of our department. Simulated radiologist-patient conversations, conducted in vernacular Hindi, were recorded and transcribed using the OpenAI Whisper large-v2 ASR model. Four transcriptions per conversation were generated at different temperatures (0, 0.3, 0.5, 0.7) to maximize information capture. Structured mammography reports were generated from the transcriptions using GPT-4o, guided by detailed prompt instructions. Reports were reviewed and corrected by a radiologist, and AI performance was assessed through word error rate (WER), character error rate (CER), report concordance rates, error analysis, and time efficiency metrics.

The lowest WER (0.577) and CER (0.379) were observed at temperature 0. The overall mean concordance rate between AI-generated and radiologist-edited reports was 0.94, with structured fields achieving higher concordance than descriptive fields. Errors were present in 50% of AI-generated reports, predominantly missed or incorrect information, with a higher error rate in malignant cases. The mean time for AI-driven report generation was 207.4 seconds, with radiologist editing contributing 43.1 seconds on average.

An AI-driven workflow that integrates ASR and LLMs to generate structured mammography reports from radiologist-patient conversations in vernacular language is feasible. While challenges such as privacy, validation, and scalability remain, this approach represents a significant step toward patient-centric and AI-integrated radiology practice.
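
For readers unfamiliar with the transcription metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between a reference transcript and the ASR output, divided by the reference length in words or characters. It is not the authors' evaluation code. The function names and the toy English strings are illustrative assumptions, as are the choices to split words on whitespace and to count spaces as characters in CER; the abstract does not state which text normalization was applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences:
    the minimum number of substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words.
    Assumption: words are separated by whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy strings, not taken from the study data.
    reference = "the mass is benign"
    hypothesis = "the mass benign"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because insertions count as errors while the denominator is the reference length, both metrics can exceed 1. Under this definition, a WER of 0.577 corresponds to roughly 58 word-level edits per 100 reference words.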

Authors

  • Amit Gupta; Ashish Rastogi; Neha Rani; Mohak Narang; Krithika Rangarajan