Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution

SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution

Document type: Journal article


Authors	Wang, Fangyuan; Xu, Bo; Xu, Bo
Journal	IEEE SIGNAL PROCESSING LETTERS
Publication date	2024
Volume	31; Pages: 421-425
Keywords	Convolution; Complexity theory; Computational modeling; Decoding; Training; Kernel; Transformers; Conformer; streaming ASR; sequentially sampled chunks; chunked causal convolution; linear complexity
ISSN	1070-9908
DOI	10.1109/LSP.2024.3352489
Corresponding author	Xu, Bo (boxu@ia.ac.cn)
Abstract	Chunk-wise schemes are widely used to make Automatic Speech Recognition (ASR) models support streaming deployment. However, existing approaches are unable to capture the global context, lack support for parallel training, or exhibit quadratic complexity in the computation of multi-head self-attention (MHSA). Meanwhile, causal convolution, which uses no future context, has become the de facto module in streaming Conformer. In this letter, we propose SSCFormer to push the limit of chunk-wise Conformer for streaming ASR using the following two techniques: 1) A novel cross-chunk context generation method, named the Sequential Sampling Chunk (SSC) scheme, which re-partitions regularly partitioned chunks to facilitate efficient long-term contextual interaction within local chunks. 2) The Chunked Causal Convolution (C2Conv), which is designed to concurrently capture the left context and chunk-wise future context. Evaluations on AISHELL-1 show that an End-to-End (E2E) CER of 5.33% can be achieved, which even outperforms a strong time-restricted baseline, U2. Moreover, the chunk-wise MHSA computation in our model enables it to train with a large batch size and perform inference with linear complexity.
Funding project	Strategic Priority Research Program of the Chinese Academy of Sciences
WOS research area	Engineering
Language	English
WOS accession number	WOS:001166718500005
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization	Strategic Priority Research Program of the Chinese Academy of Sciences
Source URL	[http://ir.ia.ac.cn/handle/173211/57780]
Collection	Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Corresponding author	Xu, Bo
Author affiliation	Chinese Acad Sci, Inst Automat, Beijing 10090, Peoples R China
Recommended citation (GB/T 7714)	Wang, Fangyuan, Xu, Bo, Xu, Bo. SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution[J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 421-425.
APA	Wang, Fangyuan, Xu, Bo, & Xu, Bo. (2024). SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution. IEEE SIGNAL PROCESSING LETTERS, 31, 421-425.
MLA	Wang, Fangyuan, et al. "SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution." IEEE SIGNAL PROCESSING LETTERS 31 (2024): 421-425.

Ingest method: OAI harvesting

Source: Institute of Automation
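
Note on the abstract: the record does not describe how SSC or C2Conv are implemented, so the following is only a rough sketch of two ideas the abstract mentions. The first two functions count query-key pairs to illustrate why attention restricted to fixed-size chunks grows linearly with utterance length. The third shows one possible reading of "left context and chunk-wise future context", in which a convolution at frame `t` may look back a fixed number of frames but may look ahead only to the end of its own chunk. All function names, parameters, and the chunk sizes used are hypothetical and are not taken from the paper.

```python
def attention_pairs_full(num_frames: int) -> int:
    """Query-key pairs scored by self-attention over the whole sequence."""
    return num_frames * num_frames


def attention_pairs_chunked(num_frames: int, chunk_size: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping chunks."""
    full_chunks, remainder = divmod(num_frames, chunk_size)
    # Each full chunk costs chunk_size**2; a trailing partial chunk costs remainder**2.
    return full_chunks * chunk_size**2 + remainder**2


def chunked_causal_window(t: int, chunk_size: int, left_context: int,
                          num_frames: int) -> range:
    """Frames visible at frame t under the assumed chunk-bounded convolution.

    Assumption: the window covers `left_context` past frames (which may cross
    chunk boundaries) and future frames only up to the end of t's own chunk.
    """
    start = max(0, t - left_context)
    chunk_end = (t // chunk_size + 1) * chunk_size  # exclusive end of t's chunk
    return range(start, min(num_frames, chunk_end))


if __name__ == "__main__":
    for frames in (1600, 3200):
        print(frames, attention_pairs_full(frames), attention_pairs_chunked(frames, 16))
    print(list(chunked_causal_window(5, chunk_size=4, left_context=2, num_frames=10)))
```

With a fixed chunk size, doubling the number of frames doubles the chunked count but quadruples the full-attention count, which is the sense in which chunk-wise MHSA is linear in sequence length.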
