Author | Ye Bai; Jiangyan Yi; Jianhua Tao; Zhengkun Tian; Zhengqi Wen; Shuai Zhang

Publication Year | 2020

Conference Date | 2020

Conference Location | Shanghai
|
Abstract | Although attention-based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation in beam search increases inference time cost, which limits their practical applications. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (Listen Attentively, and Spell Once). Because of the non-autoregressive property, LASO predicts each textual token in the sequence without depending on the other tokens. Without beam search, the one-pass propagation greatly reduces the inference time cost of LASO. And because the model is based on an attention-based feedforward structure, the computation can be implemented efficiently in parallel. We conduct experiments on the publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, which outperforms the state-of-the-art autoregressive Transformer model (6.7%). The average inference latency is 21 ms, which is 1/50 of that of the autoregressive Transformer model.
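The latency gap described in the abstract comes from the decoding loop itself: an autoregressive decoder must run one forward pass per output token, while a non-autoregressive model like LASO predicts every position in a single pass. The toy sketch below (an illustration, not the authors' code; the `toy_logits` scoring function and vocabulary are invented stand-ins for the attention-based feedforward network) only demonstrates the difference in forward-pass counts, not real recognition quality:

```python
# Toy vocabulary standing in for the real character inventory (assumption).
VOCAB = ["<pad>", "ni", "hao", "shi", "jie"]

def toy_logits(features, position):
    # Stand-in for the model's forward computation: a score for every
    # vocabulary token at one output position, from acoustic "features".
    return [sum(f * (position + i + 1) for f in features) % 7 - i
            for i in range(len(VOCAB))]

def autoregressive_decode(features, max_len=4):
    # Token t is conditioned on tokens < t, so decoding needs one
    # forward pass per output step (beam search multiplies this further).
    tokens, passes = [], 0
    for t in range(max_len):
        passes += 1
        scores = toy_logits(features + tokens, t)  # condition on history
        tokens.append(max(range(len(VOCAB)), key=scores.__getitem__))
    return tokens, passes

def non_autoregressive_decode(features, max_len=4):
    # Every position is predicted independently of the other tokens,
    # so one forward pass yields the whole sentence; the per-position
    # computations can run in parallel on real hardware.
    passes = 1
    tokens = [max(range(len(VOCAB)), key=toy_logits(features, t).__getitem__)
              for t in range(max_len)]
    return tokens, passes

ar_tokens, ar_passes = autoregressive_decode([1, 2])
na_tokens, na_passes = non_autoregressive_decode([1, 2])
print(ar_passes, na_passes)  # 4 forward passes vs. 1
```

The 1/50 latency figure in the abstract reflects this structural difference at scale: removing the token-by-token dependency removes both the sequential loop and the beam-search multiplier.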
Source URL | http://ir.ia.ac.cn/handle/173211/44978
Subject | National Laboratory of Pattern Recognition, Intelligent Interaction
Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Ye Bai, Jiangyan Yi, Jianhua Tao, et al. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition[C]. In: Shanghai, 2020.