Emotion selectable end-to-end text-based speech editing
Document type: Journal article
Author | Wang, Tao1,2 |
Journal | ARTIFICIAL INTELLIGENCE |
Publication date | 2024-04-01 |
Volume | 329; Pages: 16 |
Keywords | Emotion selectable ; Text-based speech editing ; Emotion decoupling ; Mask prediction ; Few-shot learning ; Text-to-speech |
ISSN | 0004-3702 |
DOI | 10.1016/j.artint.2024.104076 |
Corresponding author | Yi, Jiangyan ; Fu, Ruibo ; Tao, Jianhua (jhtao@tsinghua.edu.cn) |
Abstract | Text-based speech editing is a convenient way for users to edit speech by intuitively cutting, copying, and pasting text. Previous work introduced CampNet, a context-aware mask prediction network that significantly improved the quality of edited speech. However, this paper proposes a new task: adding emotional effects to the edited speech during text-based speech editing to enhance the expressiveness and controllability of the edited speech. To achieve this, we introduce Emo-CampNet, which allows users to select emotional attributes for the generated speech and has the ability to edit the speech of unseen speakers. Firstly, the proposed end-to-end model controls the generated speech's emotion by introducing additional emotion attributes based on the context-aware mask prediction network. Secondly, to prevent emotional interference from the original speech, a neutral content generator is proposed to remove the emotional components, which is optimized using the generative adversarial framework. Thirdly, two data augmentation methods are proposed to enrich the emotional and pronunciation information in the training set. Experimental results show that Emo-CampNet effectively controls the generated speech's emotion and can edit the speech of unseen speakers. Ablation experiments further validate the effectiveness of emotional selectivity and data augmentation methods. |
WOS keywords | VOICE CONVERSION ; RECOGNITION |
Funding projects | Scientific and Technological Innovation Important Plan of China[2021ZD0201502] ; National Natural Science Foundation of China (NSFC)[62322120] ; National Natural Science Foundation of China (NSFC)[62306316] ; National Natural Science Foundation of China (NSFC)[61831022] ; National Natural Science Foundation of China (NSFC)[U21B2010] ; National Natural Science Foundation of China (NSFC)[62101553] ; National Natural Science Foundation of China (NSFC)[61971419] ; National Natural Science Foundation of China (NSFC)[62006223] ; National Natural Science Foundation of China (NSFC)[62206278] |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001178376400001 |
Publisher | ELSEVIER |
Funding organizations | Scientific and Technological Innovation Important Plan of China ; National Natural Science Foundation of China (NSFC) |
Source URL | [http://ir.ia.ac.cn/handle/173211/57942] |
Collection | Institute of Automation_National Laboratory of Pattern Recognition_Pattern Analysis and Learning Team |
Corresponding author | Yi, Jiangyan; Fu, Ruibo; Tao, Jianhua |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China ; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China ; 3. Tsinghua Univ, Dept Automat, Beijing, Peoples R China |
Recommended citation GB/T 7714 | Wang, Tao, Yi, Jiangyan, Fu, Ruibo, et al. Emotion selectable end-to-end text-based speech editing[J]. ARTIFICIAL INTELLIGENCE, 2024, 329: 16. |
APA | Wang, Tao, Yi, Jiangyan, Fu, Ruibo, Tao, Jianhua, Wen, Zhengqi, & Zhang, Chu Yuan. (2024). Emotion selectable end-to-end text-based speech editing. ARTIFICIAL INTELLIGENCE, 329, 16. |
MLA | Wang, Tao, et al. "Emotion selectable end-to-end text-based speech editing". ARTIFICIAL INTELLIGENCE 329 (2024): 16. |
Ingestion method: OAI harvesting
Source: Institute of Automation