Author | Ye Bai; Jiangyan Yi; Jianhua Tao; Zhengkun Tian; Zhengqi Wen; Shuai Zhang

Publication Year | 2020

Conference Date | 2020

Conference Location | Shanghai
|
Abstract | Although attention-based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation in beam search increases inference time cost, which limits their practical applications. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (Listen Attentively, and Spell Once). Because of the non-autoregressive property, LASO predicts each textual token in the sequence without depending on the other tokens. Without beam search, the one-pass propagation greatly reduces the inference time cost of LASO. And because the model is based on an attention-based feedforward structure, the computation can be implemented efficiently in parallel. We conduct experiments on the publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, which outperforms the state-of-the-art autoregressive Transformer model (6.7%). The average inference latency is 21 ms, which is 1/50 of that of the autoregressive Transformer model.
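The latency gap described in the abstract comes from the decoding loop itself: an autoregressive decoder must run one forward pass per output token, while a non-autoregressive model like LASO predicts every position in a single pass. The toy sketch below (an illustration, not the authors' code; the `toy_logits` scoring function and vocabulary are invented stand-ins for the attention-based feedforward network) only demonstrates the difference in forward-pass counts, not real recognition quality:

```python
# Toy vocabulary standing in for the real character inventory (assumption).
VOCAB = ["<pad>", "ni", "hao", "shi", "jie"]

def toy_logits(features, position):
    # Stand-in for the model's forward computation: a score for every
    # vocabulary token at one output position, from acoustic "features".
    return [sum(f * (position + i + 1) for f in features) % 7 - i
            for i in range(len(VOCAB))]

def autoregressive_decode(features, max_len=4):
    # Token t is conditioned on tokens < t, so decoding needs one
    # forward pass per output step (beam search multiplies this further).
    tokens, passes = [], 0
    for t in range(max_len):
        passes += 1
        scores = toy_logits(features + tokens, t)  # condition on history
        tokens.append(max(range(len(VOCAB)), key=scores.__getitem__))
    return tokens, passes

def non_autoregressive_decode(features, max_len=4):
    # Every position is predicted independently of the other tokens,
    # so one forward pass yields the whole sentence; the per-position
    # computations can run in parallel on real hardware.
    passes = 1
    tokens = [max(range(len(VOCAB)), key=toy_logits(features, t).__getitem__)
              for t in range(max_len)]
    return tokens, passes

ar_tokens, ar_passes = autoregressive_decode([1, 2])
na_tokens, na_passes = non_autoregressive_decode([1, 2])
print(ar_passes, na_passes)  # 4 forward passes vs. 1
```

The 1/50 latency figure in the abstract reflects this structural difference at scale: removing the token-by-token dependency removes both the sequential loop and the beam-search multiplier.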
Source URL | http://ir.ia.ac.cn/handle/173211/44978
Subject | National Laboratory of Pattern Recognition, Intelligent Interaction
Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Ye Bai, Jiangyan Yi, Jianhua Tao, et al. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition[C]. In: Shanghai, 2020.