Chinese Academy of Sciences Institutional Repositories Grid
CONTEXT-AWARE MASK PREDICTION NETWORK FOR END-TO-END TEXT-BASED SPEECH EDITING

Document type: Conference paper

Author: Wang T (汪涛)
Publication date: 2022-09
Conference date: 2022
Conference venue: Online
Abstract (English)
Text-based speech editors let users edit speech through intuitive cut, copy, and paste operations, which speeds up the editing process. However, current systems have two major drawbacks: the edited speech often sounds unnatural, and it is not obvious how to synthesize recordings of new words that do not appear in the transcript. This paper proposes a novel end-to-end text-based speech editing method called the context-aware mask prediction network (CampNet). CampNet avoids the unnaturalness caused by the cut-copy-paste operations of traditional methods and can synthesize new words that do not appear in the transcript. In addition, three text-based speech editing operations based on CampNet are designed: deletion, replacement, and insertion. Together, these operations comprehensively cover the different situations that text-based speech editing may face. Subjective and objective experiments on the VCTK and LibriTTS datasets show that CampNet-based speech editing outperforms TTS, manual editing, and the VoCo method (a combination of speech synthesis and voice conversion). We also conducted detailed ablation experiments to explore how the structure of CampNet affects its performance. Examples of generated speech can be found at https://hairuo55.github.io/CampNet-demo.
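To make the three editing operations concrete, the following is a minimal sketch of how a deletion, replacement, or insertion on the transcript could be mapped to the edited transcript and the span of original speech frames that the edit touches. It does not reproduce CampNet itself. It assumes a word-level alignment with frame indices is available, and the names `Word` and `affected_span` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    # Hypothetical word-level alignment entry (assumed input, not from the paper).
    text: str
    start: int  # first frame of the word (inclusive)
    end: int    # frame just after the word (exclusive)


def affected_span(
    words: List[Word], op: str, index: int, new_words: Sequence[str] = ()
) -> Tuple[List[str], Tuple[int, int]]:
    """Return the edited transcript and the [start, end) frame span the edit touches.

    op is "delete", "replace", or "insert". For "insert", new_words are placed
    before words[index]; index == len(words) appends at the end.
    """
    texts = [w.text for w in words]
    if op == "delete":
        # Remove one word; its frames are the region affected by the edit.
        edited = texts[:index] + texts[index + 1:]
        span = (words[index].start, words[index].end)
    elif op == "replace":
        # Swap one word for new text; its frames would need to be regenerated.
        edited = texts[:index] + list(new_words) + texts[index + 1:]
        span = (words[index].start, words[index].end)
    elif op == "insert":
        # New words go between existing ones, so the original span is zero-length
        # at the insertion boundary; all surrounding frames remain as context.
        edited = texts[:index] + list(new_words) + texts[index:]
        boundary = words[index].start if index < len(words) else words[-1].end
        span = (boundary, boundary)
    else:
        raise ValueError(f"unknown operation: {op}")
    return edited, span


if __name__ == "__main__":
    utt = [Word("the", 0, 10), Word("cat", 10, 25), Word("sat", 25, 40)]
    print(affected_span(utt, "replace", 1, ["dog"]))
```

In this sketch the insertion case is the one that most clearly needs a model able to synthesize speech for words absent from the original recording, since no original frames exist for them; how CampNet sizes and fills that region is described in the paper, not here.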
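Source URL: http://ir.ia.ac.cn/handle/173211/52365
Collection: Institute of Automation / National Laboratory of Pattern Recognition / Pattern Analysis and Learning Group
Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation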
GB/T 7714
Wang T. CONTEXT-AWARE MASK PREDICTION NETWORK FOR END-TO-END TEXT-BASED SPEECH EDITING[C]. Online. 2022.

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.