Chinese Academy of Sciences Institutional Repositories Grid
Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis

Document type: Conference paper

Author: Wang T (汪涛)
Publication date: 2022-10
Conference date: 2022
Conference location: Online
Abstract (English)
End-to-end singing voice synthesis (SVS) is attractive because it avoids the need for pre-aligned data. However, the automatically learned alignment between singing voice and lyrics often fails to match the duration information in the musical score, which can make the model unstable or even cause it to fail to synthesize voice. To learn accurate alignment information automatically, this paper proposes an end-to-end SVS framework named Singing-Tacotron. The main difference between the proposed framework and Tacotron is that the synthesized voice is strongly controlled by the duration information in the musical score. First, we propose a global duration control attention mechanism for the SVS model; this attention mechanism can control the duration of each phoneme. Second, a duration encoder is proposed to learn a set of global transition tokens from the musical score. These transition tokens help the attention mechanism decide, at each decoding step, whether to move to the next phoneme or stay on the current one. Third, to further improve the model's stability, a dynamic filter is designed to help the model overcome noise interference and pay more attention to local context information. Subjective and objective evaluations verify the effectiveness of the method. Furthermore, the role of the global transition tokens and the effect of duration control are explored.
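The move-or-stay decision described in the abstract can be illustrated with forward attention with a transition agent (Zhang et al., 2018), a related monotonic attention technique from speech synthesis. This is a minimal sketch under stated assumptions, not the Singing-Tacotron formulation: the global transition tokens, the duration encoder, and the dynamic filter are not modeled, and the function names `forward_attention_step` and `move_prob_for_duration`, as well as the use of one scalar move probability `u` per step, are hypothetical. If `u` is held constant while attention sits on a phoneme, the number of decoder steps spent there is roughly geometric with mean 1/u, so a target of d frames from the score suggests u ≈ 1/d.

```python
import math


def forward_attention_step(alpha_prev, energies, u):
    """One decoder step of forward attention with a move probability.

    alpha_prev: attention weights over the phonemes at the previous step
    energies:   unnormalized content scores over the phonemes at this step
    u:          probability of moving to the next phoneme (0 = stay, 1 = move)
    Hypothetical illustration only; not the paper's exact attention.
    """
    # Softmax over the content-based scores for this step.
    m = max(energies)
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    y = [e / total for e in exps]
    # Each phoneme keeps its own previous mass (stay, weight 1 - u) or
    # receives the previous mass of the phoneme before it (move, weight u).
    shifted = [0.0] + alpha_prev[:-1]
    alpha = [((1.0 - u) * a + u * s) * p for a, s, p in zip(alpha_prev, shifted, y)]
    # Renormalize so the weights form a distribution again.
    norm = sum(alpha)
    return [a / norm for a in alpha]


def move_prob_for_duration(frames):
    """Hypothetical mapping: a constant move probability of 1/d gives an
    expected stay of d steps, so d score-derived frames suggest u = 1/d."""
    if frames <= 0:
        raise ValueError("duration must be a positive number of frames")
    return 1.0 / frames


if __name__ == "__main__":
    alpha = [1.0, 0.0, 0.0, 0.0]
    u = move_prob_for_duration(2)  # assumption: each phoneme lasts ~2 frames
    for t in range(4):
        alpha = forward_attention_step(alpha, [0.0] * 4, u)
        print(t, [round(a, 3) for a in alpha])
```

With uniform content scores the attention mass drifts monotonically to the right at a rate set by `u`; in a trained model the content scores and the transition signal would both be learned.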
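Source URL: http://ir.ia.ac.cn/handle/173211/52362
Collection: Institute of Automation / National Laboratory of Pattern Recognition / Pattern Analysis and Learning Team
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation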
GB/T 7714
Wang T. Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis[C]. In: . Online. 2022.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.