Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach
Journal: arXiv
Published Date: Jan 31, 2025
Abstract
Machine unlearning seeks to remove the influence of specified data from a
trained model. While metrics such as unlearning accuracy (UA) and membership
inference attack (MIA) provide baselines for assessing unlearning performance,
they fall short of evaluating the reliability of forgetting. In this paper, we
find, from an uncertainty quantification perspective, that data misclassified
under UA and MIA still have their ground-truth labels included in the
prediction set, which raises a fake unlearning issue. To address this issue, we
propose two novel metrics inspired by conformal prediction that evaluate
forgetting quality more reliably. Building on these insights, we further propose
a conformal prediction-based unlearning framework that integrates conformal
prediction into the Carlini & Wagner (C&W) adversarial attack loss, which can
significantly push the ground-truth label out of the conformal prediction set.
Through extensive experiments on image classification tasks, we demonstrate both
the effectiveness of our proposed metrics and the superiority of our unlearning
framework, which, by adding only a tailored loss term, improves the UA of
existing unlearning methods by an average of 6.6%.
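The abstract does not define the two metrics or the exact loss. The snippet below is a minimal sketch of the underlying idea. It assumes standard split conformal prediction with the nonconformity score s(x, y) = 1 - p_y(x), which is an assumption and not necessarily the paper's choice. It builds prediction sets and compares a UA-style misclassification rate on forget data against how often the ground-truth label still lies in the set, which is the fake unlearning gap described above. It also adds a C&W-style hinge that reaches zero only after the true label drops out of the set by a margin `kappa`. The function names, the score, the toy probabilities, and the hinge form are all hypothetical illustrations. They are not the paper's metrics or loss.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal threshold qhat using the (assumed) score s(x, y) = 1 - p_y(x)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # qhat is the ceil((n + 1)(1 - alpha))-th smallest calibration score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every label is kept
    return np.sort(scores)[k - 1]


def prediction_sets(probs, qhat):
    """Boolean mask of shape (N, K): label y is in the set iff 1 - p_y <= qhat."""
    return (1.0 - probs) <= qhat


def forget_diagnostics(probs, labels, qhat):
    """UA-style misclassification rate vs. how often the true label stays in the set."""
    n = len(labels)
    misclassified = probs.argmax(axis=1) != labels
    in_set = prediction_sets(probs, qhat)[np.arange(n), labels]
    n_mis = misclassified.sum()
    return {
        "misclassified_rate": float(misclassified.mean()),
        # Hypothetical diagnostic: "forgotten" under UA, yet still covered by the set.
        "in_set_given_misclassified": float((in_set & misclassified).sum() / max(n_mis, 1)),
    }


def cw_style_set_hinge(probs, labels, qhat, kappa=0.05):
    """Hypothetical C&W-style hinge: zero once p_y <= 1 - qhat - kappa.

    kappa > 0 keeps a margin so the true label ends up strictly outside the set.
    """
    p_true = probs[np.arange(len(labels)), labels]
    return float(np.maximum(p_true - (1.0 - qhat) + kappa, 0.0).mean())


# Toy usage with made-up probabilities (illustrative only, not from the paper).
p = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.25, 0.2])
cal_probs = np.column_stack([p, (1 - p) / 2, (1 - p) / 2])
cal_labels = np.zeros(10, dtype=int)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)  # 0.75 here

forget_probs = np.array([[0.4, 0.5, 0.1], [0.1, 0.8, 0.1],
                         [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
forget_labels = np.zeros(4, dtype=int)
print(forget_diagnostics(forget_probs, forget_labels, qhat))
# {'misclassified_rate': 0.75, 'in_set_given_misclassified': 0.6666666666666666}
```

In this toy case, three of the four forget samples are misclassified. For two of those three, the true label is still in the prediction set, so UA alone would overstate forgetting. In practice, a term like `cw_style_set_hinge` would be written in an autodiff framework and added to an existing unlearning objective. numpy is used here only so the arithmetic is easy to check.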