Multi-concept Model Immunization through Differentiable Model Merging
Journal: arXiv
Published Date: Dec 19, 2024
Abstract
Model immunization is an emerging direction that aims to mitigate the
potential risk of misuse associated with open-sourced models and increasingly
capable adaptation methods. The idea is to make the released models' weights
difficult to fine-tune on certain harmful applications, hence the name "immunized".
Recent work on model immunization focuses on the single-concept setting.
However, models need to be immunized against multiple concepts in real-world
situations. To address this gap, we propose an immunization algorithm that
learns a single "difficult initialization" for adaptation methods over a set
of concepts simultaneously. We achieve this by incorporating a differentiable
merging layer that combines a set of model weights adapted over multiple
concepts. In our experiments, we demonstrate the effectiveness of
multi-concept immunization by generalizing the re-learning and
personalization adaptation setups of prior work to multiple concepts.
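The abstract does not state which merging rule the differentiable merging layer uses. As one illustration of what "differentiable merging" can mean, the sketch below assumes a softmax-weighted convex combination of per-concept weight vectors and hand-derives its backward pass, so a loss computed on the merged weights can send gradients back to every concept-adapted weight vector and to the mixing logits. The linear merging rule and the names `merge_weights` and `merge_backward` are hypothetical choices for this example, not the authors' method; the paper's actual layer may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax; turns merge logits into convex weights."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_weights(adapted: List[Vector], logits: Vector) -> Vector:
    """Hypothetical merging layer (an assumption, not the paper's rule).

    adapted[i] is a flattened weight vector adapted to concept i.
    The merged vector is sum_i alpha_i * adapted[i], alpha = softmax(logits).
    """
    alphas = softmax(logits)
    dim = len(adapted[0])
    return [sum(a * theta[d] for a, theta in zip(alphas, adapted))
            for d in range(dim)]


def merge_backward(adapted: List[Vector], logits: Vector,
                   upstream: Vector) -> Tuple[List[Vector], Vector]:
    """Hand-derived gradients of the scalar L = <upstream, merged>.

    dL/d adapted[i] = alpha_i * upstream
    dL/d logits[j]  = alpha_j * (g_j - sum_k alpha_k * g_k),
    where g_j = <upstream, adapted[j]> (from the softmax Jacobian).
    """
    alphas = softmax(logits)
    grads_adapted = [[a * u for u in upstream] for a in alphas]
    g = [sum(u * t for u, t in zip(upstream, theta)) for theta in adapted]
    g_mean = sum(a * gj for a, gj in zip(alphas, g))
    grads_logits = [a * (gj - g_mean) for a, gj in zip(alphas, g)]
    return grads_adapted, grads_logits


if __name__ == "__main__":
    # Three toy "concept-adapted" weight vectors and equal mixing logits.
    adapted = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    logits = [0.0, 0.0, 0.0]
    print(merge_weights(adapted, logits))  # each entry is (close to) 2/3
    _, g_logits = merge_backward(adapted, logits, upstream=[1.0, -1.0])
    print(g_logits)
```

Because every mixing coefficient is a smooth function of the logits and the merge is linear in each adapted vector, gradients pass through the merge without obstruction. In an immunization pipeline one would presumably chain these gradients further back into whatever produced the adapted weights (for example, adaptation steps starting from the shared initialization), which this toy sketch does not model.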