Improving speech enhancement by focusing on smaller values using relative loss
Document Type: Journal Article
Authors | Li, Hongfeng1,2; Xu, Yanyan1,2; Ke, Dengfeng3 |
Journal | IET SIGNAL PROCESSING |
Publication Date | 2020-08-01 |
Volume | 14, Issue 6, Pages 374-384 |
Keywords | speech enhancement; speech intelligibility; performance evaluation; learning (artificial intelligence); neural nets; absolute differences; speech quality; relative loss; single-channel speech enhancement; noisy speech; ideal ratio mask; phase-sensitive mask; mean square error; loss function; absolute error values; magnitude spectra; deep learning; clean speech recovery; short-time objective intelligibility; signal-to-distortion ratio; segmental signal-to-noise ratio |
ISSN | 1751-9675 |
DOI | 10.1049/iet-spr.2019.0290 |
Corresponding Author | Xu, Yanyan (xuyanyan@bjfu.edu.cn) |
Abstract | The task of single-channel speech enhancement is to restore clean speech from noisy speech. Recently, speech enhancement has been greatly improved by the introduction of deep learning. Previous work showed that using the ideal ratio mask or the phase-sensitive mask as an intermediate target for recovering clean speech yields better performance. In this setting, the mean square error is usually selected as the loss function. However, through experiments the authors find that the mean square error has a drawback: it considers absolute error values, meaning that the gradients of the network depend on the absolute differences between estimated values and true values, so the points in magnitude spectra with smaller values contribute little to the gradients. To solve this problem, they propose relative loss, which pays more attention to relative differences between magnitude spectra rather than absolute differences, and is more in accordance with human sensory characteristics. The perceptual evaluation of speech quality, the short-time objective intelligibility, the signal-to-distortion ratio, and the segmental signal-to-noise ratio are used to evaluate the performance of the relative loss. Experimental results show that it can greatly improve speech enhancement by focusing on smaller values. |
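The abstract's core argument — that MSE gradients scale with absolute error, so small-magnitude spectral bins are nearly ignored, whereas a relative loss normalises each error by the true magnitude — can be sketched numerically. The definition of `relative_loss` below is an illustrative assumption for exposition, not necessarily the exact formulation in the paper:

```python
import numpy as np

def mse_loss(est, ref):
    # Mean square error: the gradient w.r.t. est is 2 * (est - ref) / N,
    # which depends only on the absolute difference, so small-valued
    # spectral bins contribute little to training.
    return np.mean((est - ref) ** 2)

def relative_loss(est, ref, eps=1e-8):
    # Illustrative relative loss (assumed form): each error is divided
    # by the true magnitude, so bins with small values receive
    # proportionally larger weight. eps guards against division by zero.
    return np.mean(((est - ref) / (ref + eps)) ** 2)

# Toy magnitude spectra: one large bin and one small bin,
# both estimated with the same absolute error of 0.5.
ref = np.array([10.0, 0.1])
est = np.array([10.5, 0.6])

print(mse_loss(est, ref))       # both bins contribute equally: 0.25
print(relative_loss(est, ref))  # the small bin dominates the loss
```

Under MSE, the two bins are indistinguishable to the optimiser; under the relative form, the 0.5 error on the 0.1-magnitude bin is weighted far more heavily, which matches the paper's motivation of focusing on smaller values.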
WOS Keywords | DEEP NEURAL-NETWORK; SEPARATION |
Funding Project | World-Class Discipline Construction and Characteristic Development Guidance Funds for Beijing Forestry University [2019XKJS0310] |
WOS Research Area | Engineering |
Language | English |
WOS Record ID | WOS:000555924300006 |
Publisher | INST ENGINEERING TECHNOLOGY-IET |
Funding Agency | World-Class Discipline Construction and Characteristic Development Guidance Funds for Beijing Forestry University |
Source URL | [http://ir.ia.ac.cn/handle/173211/40366] |
Collection | National Laboratory of Pattern Recognition — Intelligent Interaction |
Author Affiliations |
1. Natl Forestry & Grassland Adm, Engn Res Ctr Forestry Oriented Intelligent Inform, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
2. Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhong Guan Cun East Rd, Beijing 100190, Peoples R China
4. Griffith Univ, Inst Integrated & Intelligent Syst, 170 Kessels Rd, Nathan, Qld 4111, Australia
Recommended Citation (GB/T 7714) | Li, Hongfeng, Xu, Yanyan, Ke, Dengfeng, et al. Improving speech enhancement by focusing on smaller values using relative loss[J]. IET SIGNAL PROCESSING, 2020, 14(6): 374-384. |
APA | Li, Hongfeng, Xu, Yanyan, Ke, Dengfeng, & Su, Kaile. (2020). Improving speech enhancement by focusing on smaller values using relative loss. IET SIGNAL PROCESSING, 14(6), 374-384. |
MLA | Li, Hongfeng, et al. "Improving speech enhancement by focusing on smaller values using relative loss." IET SIGNAL PROCESSING 14.6 (2020): 374-384. |
Ingest Method: OAI harvesting
Source: Institute of Automation