Chinese Academy of Sciences Institutional Repositories Grid
Attention Analysis and Calibration for Transformer in Natural Language Generation

Document type: Journal article

Authors: Yu, Lu (1,2); Jiajun, Zhang (1,2); Jiali, Zeng (3); Shuangzhi, Wu (3); Chengqing, Zong (1,2)
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication date: 2022-05
Pages: 1927-1938
Keywords: neural machine translation
DOI: 10.1109/taslp.2022.3180678
Abstract

Attention mechanism has been ubiquitous in neural machine translation by dynamically selecting relevant contexts for different translations. Apart from performance gains, attention weights assigned to input tokens are often utilized to explain that high-attention tokens contribute more to the prediction. However, many works question whether this assumption holds in text classification by manually manipulating attention weights and observing decision flips. This article extends this question to Transformer-based neural machine translation, which heavily relies on cross-lingual attention to produce accurate translations but is relatively understudied in this context. We first design a mask perturbation model which automatically assesses each input’s contribution to model outputs. We then test whether the token contributing most to the current translation receives the highest attention weight. We find that it sometimes does not, which closely depends on the entropy of attention weights, the syntactic role of the current generation, and language pairs. We also rethink the discrepancy between attention weights and word alignments from the view of unreliable attention weights. Our observations further motivate us to calibrate the cross-lingual multi-head attention by attaching more attention to indispensable tokens, whose removal leads to a dramatic performance drop. Empirical experiments on different-scale translation tasks and text summarization tasks demonstrate that our calibration methods significantly outperform strong baselines.
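The abstract's central test (whether the source token contributing most to the current translation also receives the highest attention weight, and how that relates to attention entropy) can be illustrated with a minimal sketch. This is not the authors' mask perturbation model: the function names and the numeric values below are hypothetical, and the contribution scores are assumed to come from some separate perturbation procedure.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution over source tokens."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def top_token_agrees(attention, contribution):
    """Return True if the most-attended source token is also the most-contributing one."""
    most_attended = max(range(len(attention)), key=attention.__getitem__)
    most_important = max(range(len(contribution)), key=contribution.__getitem__)
    return most_attended == most_important


# Hypothetical values for a single decoding step over four source tokens.
attention = [0.10, 0.60, 0.20, 0.10]     # cross-attention weights (sum to 1)
contribution = [0.05, 0.15, 0.70, 0.10]  # assumed scores from a perturbation method

print(attention_entropy(attention))
print(top_token_agrees(attention, contribution))
```

In this made-up example the attention peak falls on the second token while the largest contribution belongs to the third, which is the kind of mismatch the abstract describes.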

Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/51846
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Corresponding author: Jiajun, Zhang
Affiliations:
1. School of Artificial Intelligence, University of Chinese Academy of Sciences
2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
3. Tencent Cloud Xiaowei
Recommended citation
GB/T 7714: Yu, Lu, Jiajun, Zhang, Jiali, Zeng, et al. Attention Analysis and Calibration for Transformer in Natural Language Generation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022: 1927-1938.
APA: Yu, Lu, Jiajun, Zhang, Jiali, Zeng, Shuangzhi, Wu, & Chengqing, Zong. (2022). Attention Analysis and Calibration for Transformer in Natural Language Generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1927-1938.
MLA: Yu, Lu, et al. "Attention Analysis and Calibration for Transformer in Natural Language Generation". IEEE/ACM Transactions on Audio, Speech, and Language Processing (2022): 1927-1938.

Ingest method: OAI harvesting

Source: Institute of Automation
