Automating Handwritten Vaccination Record Transcription with Generative Multimodal AI Models: A Proof of Concept Study from The Gambia

Journal: medRxiv
Published Date:

Abstract

Handwritten home-based vaccination records (HBRs) are a vital source of immunization data, yet manual transcription in household surveys is time-consuming, error-prone, and resource-intensive. Recent advances in multimodal generative AI models offer a potential pathway to partially automate this task under certain conditions, improving efficiency while maintaining data quality. We evaluated the performance of generative multimodal AI models, specifically OpenAI’s GPT-4o, in transcribing handwritten vaccination cards from a 2022 survey targeting children aged 12 to 35 months in The Gambia. Using a curated dataset of 335 cards (6,700 vaccination entries), we developed a gold-standard benchmark from three human transcribers, all based in The Gambia, and assessed AI model performance on transcription accuracy, vaccination coverage estimates, timeliness, and missed opportunities for simultaneous vaccination (MOSV). We also tested a confidence-based segmentation approach that separates high-confidence transcriptions suitable for automation from low-confidence entries requiring human review. The fine-tuned GPT-4o model achieved 79% accuracy for exact date transcription and reached human-level performance (94% accuracy) on the 69% of entries that the model classified as high-confidence. Coverage and timeliness estimates from high-confidence transcriptions were 98.0% and 91.6% accurate, respectively, compared with 98.8% and 95.6% from human transcribers. Date errors made by the AI and by human transcribers differed systematically, with the AI making fewer year-shift errors. Multimodal AI models show strong potential for automating HBR transcription in immunization coverage surveys, at least in a setting like The Gambia. When paired with confidence-based filtering, these models achieve human-level performance on coverage and timeliness estimates, the key metrics used in programmatic decision-making, across a large subset of records. This enables substantial gains in efficiency while preserving data quality.
Further research should evaluate generalizability across diverse card formats, languages, and contexts to support integration into real-world immunization programs and health monitoring activities.
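
The confidence-based segmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how confidence scores are derived or what cutoff was used, so everything here is an assumption made for illustration: the `Entry` record, the per-entry `confidence` score in [0, 1], the `threshold` value of 0.9, and the function names `segment` and `exact_date_accuracy`. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """One transcribed vaccination entry (hypothetical record layout)."""
    card_id: str
    vaccine: str
    date: Optional[str]   # transcribed date as an ISO string, or None if blank
    confidence: float     # assumed model confidence score in [0, 1]


def segment(entries: List[Entry], threshold: float = 0.9) -> Tuple[List[Entry], List[Entry]]:
    """Split entries into (high-confidence, low-confidence) lists.

    High-confidence entries would be accepted automatically; the rest
    would be routed to a human reviewer. The threshold is an assumption.
    """
    auto: List[Entry] = []
    review: List[Entry] = []
    for entry in entries:
        if entry.confidence >= threshold:
            auto.append(entry)
        else:
            review.append(entry)
    return auto, review


def exact_date_accuracy(
    entries: List[Entry],
    gold: Dict[Tuple[str, str], Optional[str]],
) -> Optional[float]:
    """Share of entries whose date exactly matches the gold standard.

    `gold` maps (card_id, vaccine) to the benchmark date.
    Returns None for an empty list, where accuracy is undefined.
    """
    if not entries:
        return None
    hits = sum(1 for e in entries if gold[(e.card_id, e.vaccine)] == e.date)
    return hits / len(entries)
```

In a workflow of this kind, the cutoff trades coverage against accuracy: raising `threshold` shrinks the automated subset but should raise its accuracy, and the share of entries in the first list returned by `segment` corresponds to the proportion of records that would not need human review.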

Authors

  • Roy Burstein; Alieu Sowe; M. Carolina Danovaro-Holliday; Mitsuki Koh; Joshua L. Proctor