Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions
Journal: arXiv
Published Date: Dec 24, 2024
Abstract
Extracting medication names from handwritten doctor prescriptions is
challenging due to the wide variability in handwriting styles and prescription
formats. This paper presents a robust method for extracting medicine names
using a combination of Mask R-CNN and Transformer-based Optical Character
Recognition (TrOCR) with Multi-Head Attention and Positional Embeddings. A
novel dataset of diverse handwritten prescriptions from various regions
of Pakistan was used to fine-tune the model on different handwriting
styles. The Mask R-CNN model segments the prescription images to focus on the
medicinal sections, while the TrOCR model, enhanced by Multi-Head Attention and
Positional Embeddings, transcribes the isolated text. The transcribed text is
then matched against a pre-existing database for accurate identification. The
proposed approach achieved a character error rate (CER) of 1.4% on standard
benchmarks, highlighting its potential as a reliable and efficient tool for
automating medicine name extraction.
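
To make the reported metric and the database-matching step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that CER is computed in the usual way, as the character-level edit distance divided by the reference length, and that a transcription is matched to the closest database entry by that same distance. The abstract does not specify the matching procedure, so the function names, the example medicine list, and the `max_cer` threshold are all hypothetical.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def match_medicine(transcribed: str,
                   database: Iterable[str],
                   max_cer: float = 0.3) -> Optional[str]:
    """Return the closest database name, or None if nothing is close enough.

    max_cer is an assumed cutoff for illustration, not a value from the paper.
    """
    query = transcribed.strip().lower()
    names = list(database)
    if not names:
        return None
    best = min(names, key=lambda name: cer(name.lower(), query))
    return best if cer(best.lower(), query) <= max_cer else None


if __name__ == "__main__":
    # Hypothetical database and OCR output, for illustration only.
    medicines = ["Paracetamol", "Ibuprofen", "Amoxicillin"]
    print(match_medicine("Paracetamo1", medicines))
    print(cer("paracetamol", "paracetamo1"))
```

Under this definition, a CER of 1.4% corresponds to roughly one or two character errors per hundred reference characters. A tolerance-based lookup of this kind is one way a transcription with a small residual error could still resolve to a valid medicine name.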