Chinese Academy of Sciences Institutional Repositories Grid: Key Technology Research on Statistical Parametric Speech Synthesis

Chinese Academy of Sciences Institutional Repositories Grid

Key Technology Research on Statistical Parametric Speech Synthesis (统计参数语音合成中的关键技术研究)

Document type: Dissertation


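Author	Sun Jingwei (孙敬伟)
Degree	Doctorate
Defense date	2009-05-23
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place conferred	Institute of Acoustics
Keywords	speech synthesis; statistical parametric speech synthesis; segment model; polynomial segment model; conditional random field; prosodic structure prediction
Other title	Key Technology Research on Statistical Parametric Speech Synthesis
Major	Signal and Information Processing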
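Chinese abstract (translated)	With the rapid growth in the computing speed and storage capacity of computers, speech synthesis has moved from its early knowledge-driven stage to a data-driven one, and the support of large-scale corpora has markedly improved the quality of synthetic speech. At the same time, users place higher demands on speech synthesis systems, in particular multilingual synthesis, variable voice timbre, and emotionally expressive speech. Traditional unit-selection concatenative synthesis cannot meet the need for such varied synthesis because of its long system construction cycle, large storage footprint, and poor flexibility. Against this background, parametric synthesis based on statistical modeling has gradually attracted attention. Statistical parametric synthesis builds a synthesis system through automatic training, needs little manual intervention, and can operate directly at the level of speech parameters, which makes it flexible and gives it high theoretical and practical value. Building on earlier work, this thesis studies statistical parametric synthesis in depth and systematically, makes improvements in both the front end and the back end of synthesis, and demonstrates the advantages and soundness of the new methods through experiments. The specific research work and results are as follows: 1. On top of the HTS framework, and taking the characteristics of Chinese into account, a Chinese speech synthesis system based on hidden Markov models (HMM) was designed and implemented, and several performance improvements were made. 2. For Chinese prosodic structure prediction, the features related to prosodic variation in Chinese were analyzed in depth and the characteristics of the prosody prediction task were examined; on this basis, prosody was modeled with conditional random fields (CRF), yielding a CRF-based prosodic structure prediction method. 3. For acoustic parameter modeling, speech parameters were modeled with polynomial segment models; a new fast segmentation and training algorithm for polynomial segment models based on dynamic programming was implemented; and fundamental frequency, spectrum, and duration were modeled in a unified way within the polynomial segment model framework. 4. For parameter generation, a speech parameter generation algorithm based on polynomial mean trajectories was studied, and the various speech parameters were reconstructed from the models. 5. A statistical parametric speech synthesis system based on polynomial segment models was built. The experimental results demonstrate the research value and effectiveness of the methods above.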
English abstract	With the growth in computing power and resources, building natural-sounding synthetic voices has progressed from a knowledge-based approach to a data-based one. The quality and naturalness of synthetic speech have improved significantly with the support of large-scale speech corpora. Meanwhile, users place more requirements on text-to-speech (TTS) systems, especially for synthetic voices in various languages, voice qualities, and emotions. Corpus-based concatenative speech synthesis, the technique most widely used in current TTS systems, has shortcomings such as a long system construction cycle, a large footprint, and little flexibility, so it cannot meet the need for a variety of speaking styles. Against this background, statistical parametric speech synthesis has attracted attention. This method builds a TTS system by automatic training, without costly manual work, and it operates at the speech parameter level for analysis and adjustment, which makes the synthetic voices more flexible. In this thesis, statistical parametric speech synthesis is studied in depth and systematically on the basis of previous research, new methods are proposed for both the front end and the back end of the system, and the soundness of these methods is confirmed by experiments. The key technical improvements and research work in this thesis are as follows. Firstly, we built an HMM-based speech synthesis system for Mandarin Chinese on the HTS platform and made some improvements to it. Secondly, for Chinese prosodic structure prediction, we analyzed in depth the features related to Chinese prosodic variation, built prosody models using Conditional Random Fields (CRF), and used them to predict prosodic structure. Thirdly, for speech modeling, we modeled speech parameters using the Polynomial Segment Model (PSM) and implemented a novel PSM training algorithm based on dynamic programming, so as to unify pitch, spectrum, and duration parameters under the PSM framework. Fourthly, we implemented a new speech parameter generation algorithm using polynomial mean trajectories and reconstructed speech parameters from Polynomial Segment Models. Finally, we constructed a statistical parametric speech synthesis system based on the Polynomial Segment Model. The experimental results confirm the value of this research and the effectiveness of the new methods.
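Language	Chinese
Release date	2011-05-07
Pages	99
Source URL	[http://159.226.59.140/handle/311008/556]
Collection	Institute of Acoustics_IOA Master's and Doctoral Theses_Master's and Doctoral Theses 1981-2009
Recommended citation (GB/T 7714)	Sun Jingwei (孙敬伟). Key Technology Research on Statistical Parametric Speech Synthesis (统计参数语音合成中的关键技术研究)[D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2009.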

Ingest method: OAI harvesting

Source: Institute of Acoustics
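
The abstract mentions polynomial segment models, a dynamic-programming segmentation algorithm, and parameter generation from polynomial mean trajectories, but this record does not give the thesis's equations. The sketch below is therefore only a generic illustration of those ideas, not the author's method: each segment's mean trajectory is a least-squares polynomial over normalized time, a plain exhaustive dynamic program (not the fast algorithm the thesis describes) picks segment boundaries that minimize total squared error, and a trajectory is regenerated at any frame count from the fitted coefficients. The function names `fit_segment`, `segment_dp`, and `generate`, the polynomial order, and the squared-error criterion are all assumptions made for illustration.

```python
import numpy as np

def fit_segment(y, order=2):
    """Least-squares polynomial fit of one segment over normalized time [0, 1].

    Returns (coefficients in increasing power, sum of squared residuals).
    """
    t = np.linspace(0.0, 1.0, len(y))
    # Design matrix with columns 1, t, t^2, ...
    Z = np.vander(t, order + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return coef, float(resid @ resid)

def segment_dp(y, n_segments, order=2):
    """Split y into n_segments contiguous pieces minimizing total squared error.

    Illustrative O(K * T^2) dynamic program; returns boundary indices
    [0, b1, ..., T] so that piece k is y[b_k:b_{k+1}].
    """
    T = len(y)
    min_len = order + 1  # shortest segment that determines the polynomial
    inf = float("inf")
    # cost[k, e]: best error for covering y[:e] with exactly k segments
    cost = np.full((n_segments + 1, T + 1), inf)
    back = np.zeros((n_segments + 1, T + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for e in range(k * min_len, T + 1):
            for s in range((k - 1) * min_len, e - min_len + 1):
                if cost[k - 1, s] == inf:
                    continue
                c = cost[k - 1, s] + fit_segment(y[s:e], order)[1]
                if c < cost[k, e]:
                    cost[k, e] = c
                    back[k, e] = s
    # Follow the back-pointers from the end of the sequence
    bounds = [T]
    for k in range(n_segments, 0, -1):
        bounds.append(int(back[k, bounds[-1]]))
    return bounds[::-1]

def generate(coef, n_frames):
    """Evaluate a polynomial mean trajectory at n_frames normalized time points."""
    t = np.linspace(0.0, 1.0, n_frames)
    return np.vander(t, len(coef), increasing=True) @ coef
```

Because time is normalized to [0, 1], the same coefficients can be evaluated at any frame count, which is one simple way to decouple a segment's trajectory shape from its duration. For example, calling `segment_dp` on a contour made of a ramp followed by a plateau should place the boundary where the two meet, and `generate` can then resample either piece to a new length.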

