Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
Document Type: Journal Article
Author | Zhengyan Zhang (affiliations 2,3,4) |
Journal | Machine Intelligence Research |
Publication Date | 2023 |
Volume | 20 | Issue | 2 | Pages | 180-193 |
ISSN | 2731-538X |
Keywords | Pre-trained language models, backdoor attacks, transformers, natural language processing (NLP), computer vision (CV) |
DOI | 10.1007/s11633-022-1377-5 |
Abstract | The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost of pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without prior knowledge of the task. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely the neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can effectively control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons. |
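The abstract describes the core NeuBA objective: through additional training, the attacker pushes the output representation of any trigger-embedded sample toward a predefined target vector, and uses contrastive targets (e.g., +v and -v) for different triggers so that they can flip downstream predictions after fine-tuning. Below is a minimal NumPy sketch of that objective on a toy linear "encoder"; the names (`encode`, `neuba_loss`), the trigger/target setup, and the gradient-descent loop are illustrative assumptions, not the paper's actual code or model.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 16, 8                              # input dim, representation dim
W = rng.normal(scale=0.1, size=(H, D))    # toy linear encoder (stand-in for a PLM)

# Two triggers with contrastive predefined target representations (+v / -v),
# so the two triggers tend to provoke different labels downstream.
triggers = {0: rng.normal(size=D), 1: rng.normal(size=D)}
v = np.ones(H)
targets = {0: v, 1: -v}

def encode(W, x):
    """Toy encoder: output representation of input x."""
    return W @ x

def neuba_loss(W, xs):
    """MSE between representations of trigger-embedded samples
    and their predefined target vectors (the backdoor objective)."""
    loss = 0.0
    for x in xs:
        for k in (0, 1):
            r = encode(W, x + triggers[k])
            loss += np.mean((r - targets[k]) ** 2)
    return loss / (len(xs) * 2)

# "Additional training": gradient descent on the backdoor objective alone.
xs = [rng.normal(size=D) for _ in range(32)]
initial = neuba_loss(W, xs)
lr = 0.1
for _ in range(300):
    grad = np.zeros_like(W)
    for x in xs:
        for k in (0, 1):
            xt = x + triggers[k]
            grad += 2.0 / H * np.outer(W @ xt - targets[k], xt)
    W -= lr * grad / (len(xs) * 2)
final = neuba_loss(W, xs)
print(f"backdoor loss: {initial:.3f} -> {final:.3f}")
```

In the actual attack this loss would be added to the pre-training loss so the model keeps its normal behavior on clean inputs; here only the backdoor term is optimized to show that representations of trigger-embedded samples can be driven toward arbitrary predefined values.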
Source URL | http://ir.ia.ac.cn/handle/173211/51474 |
Collection | Institute of Automation, Academic Journals, International Journal of Automation and Computing |
Affiliations | 1. Huawei Noah's Ark Laboratory, Hong Kong 999077, China; 2. Beijing National Research Center for Information Science and Technology, Beijing 100084, China; 3. Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China; 4. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China |
Recommended Citation (GB/T 7714) | Zhengyan Zhang. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks[J]. Machine Intelligence Research, 2023, 20(2): 180-193. |
APA | Zhengyan Zhang. (2023). Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Machine Intelligence Research, 20(2), 180-193. |
MLA | Zhengyan Zhang. "Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks". Machine Intelligence Research 20.2 (2023): 180-193. |
Indexing Method: OAI Harvesting
Source: Institute of Automation