Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images
Journal: arXiv
Published Date: May 28, 2025
Abstract
Machine unlearning aims to remove the influence of specific training samples
from a trained model without full retraining. While prior work has largely
focused on privacy-motivated settings, we recast unlearning as a
general-purpose tool for post-deployment model revision. Specifically, we focus
on applying unlearning in clinical contexts, where data shifts, device
deprecation, and policy changes are common. To this end, we propose a bilevel
optimization formulation of boundary-based unlearning that can be solved using
iterative algorithms. We provide convergence guarantees when first-order
algorithms are used to unlearn. Our method introduces a tunable loss design for
controlling the forgetting-retention tradeoff and supports novel model
composition strategies that merge the strengths of distinct unlearning runs.
Across benchmark and real-world clinical imaging datasets, our approach
outperforms baselines on both forgetting and retention metrics, including
scenarios involving imaging devices and anatomical outliers. This work
establishes machine unlearning as a modular, practical alternative to
retraining for real-world model maintenance in clinical applications.
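The abstract does not state the update rule. As a rough illustration only, the sketch below shows one plausible reading of "perturbed sign gradient" unlearning on a toy logistic-regression model. At each step, the gradient of a retain loss plus `lam` times a forget loss is perturbed with Gaussian noise, and only its sign is used to update the weights. Everything specific here is a hypothetical assumption, not the authors' method: the forget loss (mean squared logit, which pulls forget samples onto the decision boundary), the noise model, the single-level penalized objective standing in for the paper's bilevel formulation, and the names and hyperparameters `perturbed_sign_unlearn`, `lam`, `eta`, and `sigma`. Here `lam` plays the role of a forgetting-retention knob.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def retain_loss_grad(w, X, y):
    """Mean binary cross-entropy on (X, y) and its gradient w.r.t. w."""
    p = sigmoid(X @ w)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def forget_loss_grad(w, X):
    """Hypothetical boundary-style forget term: mean squared logit.

    Minimizing it pulls forget-set samples toward logit 0 (the decision
    boundary), so the model stops predicting them confidently.
    """
    z = X @ w
    return np.mean(z ** 2), 2.0 * X.T @ z / len(z)


def perturbed_sign_unlearn(w0, X_r, y_r, X_f, lam=1.0, eta=0.01,
                           sigma=0.1, steps=300, seed=0):
    """Hypothetical perturbed sign-gradient descent on retain + lam * forget."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        _, g_r = retain_loss_grad(w, X_r, y_r)
        _, g_f = forget_loss_grad(w, X_f)
        g = g_r + lam * g_f  # lam trades forgetting against retention
        # Add Gaussian noise to the gradient, then step along its sign only.
        noisy = g + sigma * rng.standard_normal(w.shape)
        w -= eta * np.sign(noisy)
    return w


# Toy data (no bias term; the two retain classes are symmetric about 0):
# class 0 near (-2, 0), class 1 near (2, 0), and a small outlier cluster
# near (0, 4) that the deployed model saw as class 1 and must now forget.
rng = np.random.default_rng(1)
X0 = rng.normal([-2.0, 0.0], 0.5, size=(200, 2))
X1 = rng.normal([2.0, 0.0], 0.5, size=(200, 2))
X_f = rng.normal([0.0, 4.0], 0.5, size=(20, 2))
X_r = np.vstack([X0, X1])
y_r = np.concatenate([np.zeros(200), np.ones(200)])

# "Deployed" model: plain gradient descent on retain + forget data.
X_all = np.vstack([X_r, X_f])
y_all = np.concatenate([y_r, np.ones(20)])
w0 = np.zeros(2)
for _ in range(500):
    w0 -= 0.5 * retain_loss_grad(w0, X_all, y_all)[1]

w_u = perturbed_sign_unlearn(w0, X_r, y_r, X_f)

forget_before = forget_loss_grad(w0, X_f)[0]
forget_after = forget_loss_grad(w_u, X_f)[0]
retain_acc = np.mean((X_r @ w_u > 0) == (y_r == 1))
print(f"forget mean sq. logit: {forget_before:.2f} -> {forget_after:.2f}")
print(f"retain accuracy after unlearning: {retain_acc:.3f}")
```

In this toy setup, the sign update moves every coordinate by at most `eta` per step regardless of gradient scale, and the injected noise randomizes steps only when the gradient is small. Raising `lam` should push the forget-set logits harder toward the boundary, at some cost to the retain loss.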