Chinese Academy of Sciences Institutional Repositories Grid
Zero-shot voice conversion based on feature disentanglement

Document type: Journal article

Authors: Na Guo (3); Jianguo Wei (3); Yongwei Li (2); Wenhuan Lu (3); Jianhua Tao (1)
Journal: Speech Communication
Publication date: 2024
Volume: 165
Corresponding author email: liyw@psych.ac.cn (Y. Li)
Keywords: Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution
DOI: 10.1016/j.specom.2024.103143
Document subtype: Review
Abstract (English)

Voice conversion (VC) aims to convert the voice of a source speaker to that of a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention within VC because it can convert speech for speakers who did not appear during training. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. The experiments demonstrate that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.

Indexed by: EI
Language: English
Source URL: http://ir.psych.ac.cn/handle/311026/48789
Collection: Institute of Psychology_CAS Key Laboratory of Behavioral Science
Author affiliations:
1. Department of Automation, Tsinghua University, Beijing, China
2. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
3. College of Intelligence and Computing, Tianjin University, Tianjin, China
Recommended citation
GB/T 7714: Na Guo, Jianguo Wei, Yongwei Li, et al. Zero-shot voice conversion based on feature disentanglement[J]. Speech Communication, 2024, 165.
APA: Na Guo, Jianguo Wei, Yongwei Li, Wenhuan Lu, & Jianhua Tao. (2024). Zero-shot voice conversion based on feature disentanglement. Speech Communication, 165.
MLA: Na Guo, et al. "Zero-shot voice conversion based on feature disentanglement." Speech Communication 165 (2024).

Ingest method: OAI harvesting

Source: Institute of Psychology

