Chinese Academy of Sciences Institutional Repositories Grid: Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Chinese Academy of Sciences Institutional Repositories Grid

Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

Document type: Journal article


Authors	Zhao Yang1,2 ; Yuanzhe Zhang1,2 ; Dianbo Sui3 ; Yiming Ju1,2 ; Jun Zhao1,2 ; Kang Liu1,2
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing
Publication date	2024
Volume	23; Issue: 2; Pages: 1-19
ISSN	2375-4699
DOI	https://doi.org/10.1145/3639364
Abstract	Knowledge distillation is widely used in pre-trained language model compression, which can transfer knowledge from a cumbersome model to a lightweight one. Though knowledge distillation based model compression has achieved promising performance, we observe that explanations between the teacher model and the student model are not consistent. We argue that the student model should study not only the predictions of the teacher model but also the internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies to utilize the explanations obtained by these three different explanation methods, which could provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD could also be applied to model compression with different architectures.
WOS accession number	WOS:001193524700014
Source URL	http://ir.ia.ac.cn/handle/173211/56723
Collection	Laboratory of Cognition and Decision Intelligence for Complex Systems
Corresponding author	Kang Liu
Author affiliations	1. The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, China; 3. Harbin Institute of Technology, Weihai, China
Recommended citation
GB/T 7714	Zhao Yang, Yuanzhe Zhang, Dianbo Sui, et al. Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression[J]. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, 23(2): 1-19.
APA	Zhao Yang, Yuanzhe Zhang, Dianbo Sui, Yiming Ju, Jun Zhao, & Kang Liu. (2024). Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression. ACM Transactions on Asian and Low-Resource Language Information Processing, 23(2), 1-19.
MLA	Zhao Yang, et al. "Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression". ACM Transactions on Asian and Low-Resource Language Information Processing 23.2 (2024): 1-19.

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.