Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition
Document type: Journal article
Authors | Fan, Cunhang1 |
Journal | APPLIED ACOUSTICS |
Publication date | 2023-09-01 |
Volume | 212 Pages: 10 |
Keywords | Robust end-to-end ASR ; Speech enhancement ; Masking and mapping ; Speech distortion ; Deep spectrum fusion |
ISSN | 0003-682X |
DOI | 10.1016/j.apacoust.2023.109547 |
Corresponding author | Lv, Zhao (kjlz@ahu.edu.cn) |
Abstract | Speech enhancement (SE) methods have recently achieved strong performance. However, because of speech distortion, the enhanced speech may lose significant information, which degrades the performance of automatic speech recognition (ASR). To address this problem, this paper proposes a two-stage deep spectrum fusion method with a joint training framework for noise-robust end-to-end (E2E) ASR. It consists of a masking and mapping fusion (MMF) stage and a gated recurrent fusion (GRF) stage. The MMF is the first stage and focuses on SE: it exploits the complementarity of masking-based and mapping-based enhancement methods to alleviate speech distortion. The GRF is the second stage and aims to recover the lost information by fusing the enhanced speech from the MMF with the original input. We conduct extensive experiments on the open Mandarin speech corpus AISHELL-1 with two noise datasets, 100 Nonspeech and NOISEX-92. Experimental results indicate that the proposed method significantly improves performance, reducing the character error rate (CER) by 17.36% relative to the conventional joint training method. |
WOS keywords | ENHANCEMENT ; NETWORKS ; DEREVERBERATION |
Funding projects | STI 2030-Major Projects[2021ZD0201500] ; National Natural Science Foundation of China (NSFC)[61972437] ; National Natural Science Foundation of China (NSFC)[62201002] ; Excellent Youth Foundation of Anhui Scientific Committee[208085J05] ; Special Fund for Key Program of Science and Technology of Anhui Province[202203a07020008] ; Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CACC[FZ2022KF15] ; Open Research Projects of Zhejiang Lab[2021KH0AB06] ; Open Projects Program of National Laboratory of Pattern Recognition[202200014] |
WOS research area | Acoustics |
Language | English |
WOS accession number | WOS:001069151700001 |
Publisher | ELSEVIER SCI LTD |
Funding organizations | STI 2030-Major Projects ; National Natural Science Foundation of China (NSFC) ; Excellent Youth Foundation of Anhui Scientific Committee ; Special Fund for Key Program of Science and Technology of Anhui Province ; Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CACC ; Open Research Projects of Zhejiang Lab ; Open Projects Program of National Laboratory of Pattern Recognition |
Source URL | [http://ir.ia.ac.cn/handle/173211/53113] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Lv, Zhao |
Author affiliations | 1. Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei, Peoples R China ; 2. Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China ; 3. Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo, Peoples R China |
Recommended citation, GB/T 7714 | Fan, Cunhang, Ding, Mingming, Yi, Jiangyan, et al. Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition[J]. APPLIED ACOUSTICS, 2023, 212: 10. |
APA | Fan, Cunhang, Ding, Mingming, Yi, Jiangyan, Li, Jinpeng, & Lv, Zhao. (2023). Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition. APPLIED ACOUSTICS, 212, 10. |
MLA | Fan, Cunhang, et al. "Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition". APPLIED ACOUSTICS 212 (2023): 10. |
Ingest method: OAI harvesting
Source: Institute of Automation
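
The two-stage idea summarized in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the record gives no architectural details, so the function names (`masking_estimate`, `fuse_mask_and_map`, `gated_fusion`), the fixed weight `alpha`, the per-bin sigmoid gate, and all numbers are assumptions made purely for illustration. In the paper these quantities would come from trained networks; here they are hand-set values on a single three-bin frame.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def masking_estimate(noisy, mask):
    """Masking-based enhancement: scale each noisy magnitude bin by a mask value in [0, 1]."""
    return [m * y for m, y in zip(mask, noisy)]


def fuse_mask_and_map(masked, mapped, alpha=0.5):
    """Hypothetical stage 1: convex combination of the masking and mapping estimates."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(masked, mapped)]


def gated_fusion(enhanced, noisy, gate_logits):
    """Hypothetical stage 2: a per-bin gate blends enhanced features with the original input."""
    gates = [sigmoid(z) for z in gate_logits]
    return [g * e + (1.0 - g) * y for g, e, y in zip(gates, enhanced, noisy)]


# Toy single frame with three frequency bins; every value below is made up.
noisy = [1.0, 2.0, 4.0]        # noisy magnitude spectrum
mask = [0.5, 1.0, 0.25]        # stand-in for a predicted mask
mapped = [0.7, 1.8, 1.2]       # stand-in for a directly predicted clean spectrum
gate_logits = [0.0, 0.0, 0.0]  # stand-in for learned gate activations

masked = masking_estimate(noisy, mask)
stage1 = fuse_mask_and_map(masked, mapped)
stage2 = gated_fusion(stage1, noisy, gate_logits)
print(masked, stage1, stage2)
```

With all gate logits at zero, each gate equals 0.5, so the second stage returns the midpoint between the stage-1 output and the noisy input. Pushing a logit higher would trust the enhanced value more in that bin, and pushing it lower would fall back toward the original input, which is the sense in which a gate can retrieve information lost during enhancement.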