Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition
Journal: arXiv
Published Date: Apr 11, 2025
Abstract
Handwritten Text Recognition (HTR) is essential for document analysis and
digitization. However, handwritten data often contains user-identifiable
information, such as unique handwriting styles and personal lexicon choices,
which can compromise privacy and erode trust in AI services. Legal provisions such as
the "right to be forgotten" underscore the need for methods that can
expunge sensitive information from trained models. Machine unlearning addresses
this by selectively removing specific data from models without necessitating
complete retraining. Yet, it frequently encounters a privacy-accuracy tradeoff,
where safeguarding privacy leads to diminished model performance. In this
paper, we introduce a novel two-stage unlearning strategy for a multi-head
transformer-based HTR model, integrating pruning and random labeling. Our
proposed method utilizes a writer classification head both as an indicator and
a trigger for unlearning, while maintaining the efficacy of the recognition
head. To our knowledge, this represents the first comprehensive exploration of
machine unlearning within HTR tasks. We further employ Membership Inference
Attacks (MIA) to evaluate the effectiveness of unlearning user-identifiable
information. Extensive experiments demonstrate that our approach effectively
preserves privacy while maintaining model accuracy, paving the way for new
research directions in the document analysis community. Our code will be
publicly available upon acceptance.
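The abstract names random labeling as one stage of the unlearning strategy but does not give its details. The following is a minimal sketch under stated assumptions. It assumes each forget-set sample carries an integer writer ID for the writer classification head, and that unlearning fine-tunes that head on labels reassigned uniformly at random among the other writers. The helper `random_relabel` is hypothetical. The pruning stage, the training loop, and the recognition head are omitted, so this should not be read as the authors' implementation.

```python
import random


def random_relabel(writer_ids, num_writers, seed=0):
    """Map each writer ID to a different writer chosen uniformly at random.

    Hypothetical helper: the returned labels would serve as fine-tuning
    targets for the writer classification head on the forget set, so the
    head stops associating those samples with their true writer.
    """
    if num_writers < 2:
        raise ValueError("need at least two writers to draw a different label")
    rng = random.Random(seed)  # seeded for reproducible relabeling
    new_ids = []
    for wid in writer_ids:
        if not 0 <= wid < num_writers:
            raise ValueError(f"writer ID {wid} outside [0, {num_writers})")
        # Draw from the num_writers - 1 "other" writers, then shift past the
        # true ID so the original label can never be returned.
        r = rng.randrange(num_writers - 1)
        new_ids.append(r if r < wid else r + 1)
    return new_ids


# Usage: relabel four forget-set samples from a hypothetical 10-writer set.
forget_writers = [3, 3, 7, 0]
print(random_relabel(forget_writers, num_writers=10))
```

Drawing from the other `num_writers - 1` classes and skipping the true ID guarantees every new label is wrong while keeping the choice uniform over the remaining writers.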
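The abstract reports using Membership Inference Attacks (MIA) to evaluate unlearning but does not say which attack variant is used. One common, simple variant is a loss-threshold attack, sketched below as an assumption rather than the paper's protocol. Members are forget-set samples that were in the training data, and non-members are held-out samples. The attacker guesses "member" whenever a sample's loss is at or below a threshold. If unlearning succeeds, the two groups should become hard to tell apart, so the attack's balanced accuracy should move toward 0.5. The function name and the loss values are hypothetical and are not results from the paper.

```python
def loss_threshold_mia(member_losses, nonmember_losses):
    """Best balanced accuracy of a loss-threshold membership inference attack.

    A sample is predicted to be a training member when its loss is <= t.
    Every observed loss value is tried as the threshold t, and the highest
    balanced accuracy (mean of true-positive and true-negative rates) wins.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    best = 0.5  # an attacker that always answers "non-member"
    for t in sorted(set(member_losses) | set(nonmember_losses)):
        tpr = sum(x <= t for x in member_losses) / len(member_losses)
        tnr = sum(x > t for x in nonmember_losses) / len(nonmember_losses)
        best = max(best, (tpr + tnr) / 2)
    return best


# Usage with illustrative losses (hypothetical numbers, not paper results):
# low forget-set losses before unlearning, overlapping losses after it.
before = loss_threshold_mia([0.1, 0.2, 0.15], [0.9, 1.1, 0.8])
after = loss_threshold_mia([0.9, 0.2, 1.0], [0.8, 0.3, 1.1])
print(before, after)
```

Because the threshold is selected on the same samples it is scored on, this estimate is optimistic for small sets. A more careful evaluation would calibrate the threshold on a separate split.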