Adversarial Unlearning of Backdoors via Implicit Hypergradient

We propose a minimax formulation for removing backdoors from a poisoned model using only a small set of clean data. Our Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm solves the minimax through implicit hypergradients, which capture the dependency between the inner and outer optimization, unlike prior methods that treat the two problems independently. We prove that I-BAU converges and that robustness attained in the clean-data minimax generalizes to unseen test data. Across seven backdoor attacks and two datasets, I-BAU matches or outperforms six state-of-the-art defenses, is more robust to variations in triggers, attack settings, and poison ratios, requires less computation (it is notably faster than the most efficient baseline in single-target attack settings), and remains effective even when only 100 clean samples are available.
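As a rough illustration of the approach, the sketch below runs one round of the minimax: an inner maximization over a bounded perturbation delta of the clean-data loss H(theta, delta), followed by an outer descent step on theta whose hypergradient includes the implicit inner-outer coupling term. This is a minimal sketch under assumptions, not the authors' implementation: the model `net`, the helper names, the step sizes, and the truncated Neumann-series approximation of the inverse inner Hessian are all illustrative choices; the paper derives the exact implicit-hypergradient solver.

```python
import torch
import torch.nn.functional as F


def inner_max(net, x, y, eps=0.1, steps=5, lr=0.05):
    """Approximate the inner maximization: a single bounded perturbation
    `delta` (a stand-in for the worst-case trigger) that maximizes the
    loss of `net` on the clean batch (x, y)."""
    delta = torch.zeros_like(x[:1], requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(net(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += lr * grad.sign()   # gradient ascent on the clean loss
            delta.clamp_(-eps, eps)     # keep the trigger bounded
    return delta.detach()


def unlearn_step(net, opt, x, y, eps=0.1, k=5, eta=0.01):
    """One outer descent step on H(theta, delta*(theta)). The implicit part
    of the hypergradient uses a truncated Neumann series in place of the
    exact inverse inner Hessian (an illustrative simplification)."""
    delta = inner_max(net, x, y, eps).requires_grad_(True)
    params = [p for p in net.parameters() if p.requires_grad]

    loss = F.cross_entropy(net(x + delta), y)            # H(theta, delta)
    g_delta, = torch.autograd.grad(loss, delta, create_graph=True)

    # Neumann approximation of (-H_dd)^{-1} g_delta, H_dd = d^2H / d delta^2:
    # eta * sum_{j<=k} (I + eta * H_dd)^j g_delta.
    v = g_delta.detach()
    p = v.clone()
    for _ in range(k):
        hv, = torch.autograd.grad(g_delta, delta, grad_outputs=v,
                                  retain_graph=True)     # H_dd @ v
        v = v + eta * hv
        p = p + v
    p = eta * p

    # Implicit (mixed second-derivative) term: d^2H / (d theta d delta) @ p.
    implicit = torch.autograd.grad(g_delta, params, grad_outputs=p,
                                   retain_graph=True)
    direct = torch.autograd.grad(loss, params)

    opt.zero_grad()
    for w, d, m in zip(params, direct, implicit):
        w.grad = d + m       # hypergradient = direct + implicit correction
    opt.step()
```

In use, one would call `unlearn_step` repeatedly over batches of the small clean set with an outer optimizer (e.g., SGD over `net.parameters()`); the inner loop keeps re-estimating the worst-case trigger while the outer updates unlearn the model's response to it. The implicit term matters precisely because the inner maximization is only solved approximately, which is the inner-outer dependency the abstract highlights.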

Authors

Yi Zeng, Si Chen, Won Park, Zhuoqing Mao, Ming Jin, Ruoxi Jia

Published

January 1, 2022