Chinese Academy of Sciences Institutional Repositories Grid: Zero-shot voice conversion based on feature disentanglement

Chinese Academy of Sciences Institutional Repositories Grid

Zero-shot voice conversion based on feature disentanglement

Document type: Journal article


Authors	Na Guo 3; Jianguo Wei 3; Yongwei Li 2; Wenhuan Lu 3; Jianhua Tao 1
Journal	Speech Communication
Publication date	2024
Volume	165
Corresponding author email	liyw@psych.ac.cn (Y. Li)
Keywords	Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution
DOI	10.1016/j.specom.2024.103143
Document subtype	Review
Abstract	Voice conversion (VC) aims to convert the voice of a source speaker to that of a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention within VC because it can convert speech for speakers who did not appear during the training stage. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. The experiments demonstrate that the performance of the proposed model is superior to that of several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.
Indexed by	EI
Language	English
Source URL	[http://ir.psych.ac.cn/handle/311026/48789]
Collection	Institute of Psychology_CAS Key Laboratory of Behavioral Science
Author affiliations	1. Department of Automation, Tsinghua University, Beijing, China; 2. CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; 3. College of Intelligence and Computing, Tianjin University, Tianjin, China
Recommended citation (GB/T 7714)	Na Guo, Jianguo Wei, Yongwei Li, et al. Zero-shot voice conversion based on feature disentanglement[J]. Speech Communication, 2024, 165.
APA	Na Guo, Jianguo Wei, Yongwei Li, Wenhuan Lu, & Jianhua Tao. (2024). Zero-shot voice conversion based on feature disentanglement. Speech Communication, 165.
MLA	Na Guo, et al. "Zero-shot voice conversion based on feature disentanglement." Speech Communication 165 (2024).

Ingest method: OAI harvesting

Source: Institute of Psychology

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.