Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000
Journal:
arXiv
Published Date:
Jul 9, 2025
Abstract
When applying machine learning to medical image classification, data leakage
is a critical issue. Previous methods, such as adding noise to gradients for
differential privacy, work well on large datasets like MNIST and CIFAR-100, but
fail on small, imbalanced medical datasets like HAM10000. This is because the
imbalanced distribution causes gradients from minority classes to be clipped
and lose crucial information, while majority classes dominate. This leads the
model to fall into suboptimal solutions early. To address this, we propose
SAD-DPSGD, which uses a linear decaying mechanism for noise and clipping
thresholds. By allocating more privacy budget and using higher clipping
thresholds in the initial training phases, the model avoids suboptimal
solutions and enhances performance. Experiments show that SAD-DPSGD outperforms
Auto-DPSGD on HAM10000, improving accuracy by 2.15% under $\epsilon = 3.0$ ,
$\delta = 10^{-3}$.