Chinese Academy of Sciences Institutional Repositories Grid
Emotion selectable end-to-end text-based speech editing

Document type: Journal article

Authors: Wang, Tao [1,2]; Yi, Jiangyan [1]; Fu, Ruibo [1]; Tao, Jianhua [3]; Wen, Zhengqi [2]; Zhang, Chu Yuan [1,2]
Journal: ARTIFICIAL INTELLIGENCE
Publication date: 2024-04-01
Volume: 329; Pages: 16
Keywords: Emotion selectable; Text-based speech editing; Emotion decoupling; Mask prediction; Few-shot learning; Text-to-speech
ISSN: 0004-3702
DOI: 10.1016/j.artint.2024.104076
Corresponding authors: Yi, Jiangyan; Fu, Ruibo; Tao, Jianhua (jhtao@tsinghua.edu.cn)
Abstract: Text-based speech editing is a convenient way for users to edit speech by intuitively cutting, copying, and pasting text. Previous work introduced CampNet, a context-aware mask prediction network that significantly improved the quality of edited speech. Building on that work, this paper proposes a new task: adding emotional effects to the edited speech during text-based speech editing, to enhance its expressiveness and controllability. To achieve this, we introduce Emo-CampNet, which lets users select emotional attributes for the generated speech and can edit the speech of unseen speakers. First, the proposed end-to-end model controls the emotion of the generated speech by adding emotion attributes to the context-aware mask prediction network. Second, to prevent emotional interference from the original speech, a neutral content generator is proposed to remove the emotional components; it is optimized within a generative adversarial framework. Third, two data augmentation methods are proposed to enrich the emotional and pronunciation information in the training set. Experimental results show that Emo-CampNet effectively controls the emotion of the generated speech and can edit the speech of unseen speakers. Ablation experiments further validate the effectiveness of emotional selectivity and the data augmentation methods.
WOS keywords: VOICE CONVERSION; RECOGNITION
Funding projects: Scientific and Technological Innovation Important Plan of China [2021ZD0201502]; National Natural Science Foundation of China (NSFC) [62322120]; National Natural Science Foundation of China (NSFC) [62306316]; National Natural Science Foundation of China (NSFC) [61831022]; National Natural Science Foundation of China (NSFC) [U21B2010]; National Natural Science Foundation of China (NSFC) [62101553]; National Natural Science Foundation of China (NSFC) [61971419]; National Natural Science Foundation of China (NSFC) [62006223]; National Natural Science Foundation of China (NSFC) [62206278]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:001178376400001
Publisher: ELSEVIER
Funding organizations: Scientific and Technological Innovation Important Plan of China; National Natural Science Foundation of China (NSFC)
Source URL: [http://ir.ia.ac.cn/handle/173211/57942]
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Pattern Analysis and Learning Group
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
3. Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Recommended citation
GB/T 7714
Wang, Tao, Yi, Jiangyan, Fu, Ruibo, et al. Emotion selectable end-to-end text-based speech editing[J]. ARTIFICIAL INTELLIGENCE, 2024, 329: 16.
APA Wang, Tao, Yi, Jiangyan, Fu, Ruibo, Tao, Jianhua, Wen, Zhengqi, & Zhang, Chu Yuan. (2024). Emotion selectable end-to-end text-based speech editing. ARTIFICIAL INTELLIGENCE, 329, 16.
MLA Wang, Tao, et al. "Emotion selectable end-to-end text-based speech editing". ARTIFICIAL INTELLIGENCE 329 (2024): 16.

Ingest method: OAI harvesting

Source: Institute of Automation
