Chinese Academy of Sciences Institutional Repositories Grid: Research on Modeling Methods for Personalized Speech Synthesis

Research on Modeling Methods for Personalized Speech Synthesis

Document type: Dissertation


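Author	Yu Jian (于剑)
Degree type	Doctor of Engineering
Defense date	2007-05-29
Degree-granting institution	Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place conferred	Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Advisor	Tao Jianhua (陶建华)
Keywords	speech synthesis; prosody modeling; hidden Markov model; dialog speech; prosody adaptation
Alternative title	Research on the modeling in personalized speech synthesis
Major	Pattern Recognition and Intelligent Systems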
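Abstract (translated from Chinese)	Traditional speech synthesis research has focused mostly on a single read-speech style. To improve the personalized expression of speech synthesis systems and broaden their application prospects, this thesis starts from prosody models and acoustic modeling methods for speech synthesis. Addressing the prosodic style, colloquial expression, and timbre adaptation aspects of personalized speech synthesis, it studies a prosody model based on mutual constraints, a prosody adaptation method for speech synthesis, a prosody modeling method for dialog speech, and parametric speech synthesis based on combined hidden Markov models (HMMs). The results are of value for further improving the expressiveness and personalized expression of speech synthesis systems and for promoting a deeper understanding of speech production models. Specifically, the main results are as follows. Because adjacent syllables in continuous Mandarin speech show strong mutual dependencies in their prosodic features, the thesis gives a new definition of the prosodic concatenation cost function in speech synthesis, so that it can precisely describe how well the fundamental frequency (F0) contours of adjacent syllables match. On this basis it builds a prosody model based on mutual constraints, which improves the naturalness of the synthesized speech. The thesis proposes a personalized prosody adaptation method that is tightly integrated with a concatenative speech synthesis system. Given large corpora from one or more source speakers and a small corpus from a target speaker, the method builds a new prosody model for the target speaker. This model has the target speaker's prosodic characteristics and also retains the source corpora's complete coverage of contextual information, which allows the synthesis system to imitate the speaking styles of different speakers. Based on statistics and analysis of a large amount of dialog data, the thesis models the incompleteness phenomenon in dialog speech. Because dialog speech is faster and more casually articulated, many syllables do not complete their inherent tone shapes, which changes the shape of the F0 contour. Modeling this phenomenon lets the prosody model output F0 contours with the prosodic characteristics of dialog speech. To address timbre adaptation in personalized speech synthesis, the thesis further implements a speech synthesis system based on combined HMMs. The poor voice quality of conventional HMM-based synthesis systems stems from over-smoothing in the time domain and in the frequency domain during training. The thesis proposes a combined HMM method to address these two problems, which effectively improves the system's expressiveness and clarity.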
English abstract	Most current text-to-speech (TTS) systems can synthesize speech in only a single style, which greatly limits their application. To improve the expressiveness of TTS output and broaden the application of TTS systems, this thesis studies prosody and spectrum modeling in personalized speech synthesis. Focusing on the prosodic and spectral style of personalized speech, it studies a prosody model based on mutual constraints, a prosody adaptation model for speech synthesis systems, a dialog prosody model, and a parametric speech synthesis system based on combined HMMs. The contributions are as follows. (1) A prosody model based on mutual constraints. The thesis proposes and verifies that strong mutual prosodic constraints exist between adjacent syllables in read Mandarin speech. Based on these constraints, it presents a new definition of concatenation cost that precisely describes how naturally adjacent syllables join. By minimizing the concatenation cost over the whole sentence, the pitch model generates a much more natural pitch contour. (2) Prosody adaptation in a concatenative speech synthesis system. The thesis presents a method that adapts the prosody model to a new style using a small training corpus. Built on one or several source corpora, the adapted prosody model has both the target speaker's prosodic characteristics and the source speakers' complete coverage of contextual information. (3) A dialog prosody model. The key to building it is identifying the major difference between dialog pitch contours and read pitch contours. Based on extensive analysis and observation, the thesis concludes that a major difference is the existence of the incompleteness phenomenon. By simulating that phenomenon, the prosody model can output pitch contours in a dialog style. (4) A parametric speech synthesis system based on combined HMMs. HMM-based TTS is a recently proposed parametric approach. Despite its high flexibility and low memory requirements, its speech quality is not very good. To address this, the thesis presents a combined HTS system that makes use of both discrete HMMs and continuous HMMs. The system resolves the over-smoothing problems in the frequency domain and the time domain that conventional HTS systems encounter.
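Language	Chinese
Other identifier	200518014628089
Source URL	[http://ir.ia.ac.cn/handle/173211/5986]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Yu Jian. Research on Modeling Methods for Personalized Speech Synthesis [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2007.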
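
The abstract states that the pitch model minimizes a concatenation cost over the whole sentence, but this record does not give the cost definition. The sketch below only illustrates the general shape of such a search, under loud assumptions: each syllable has a few candidate F0 contours, the join cost is a hypothetical squared F0 jump at the syllable boundary (not the thesis's definition), and a standard Viterbi-style dynamic program picks one candidate per syllable. The names `concat_cost` and `select_contours` are invented for illustration.

```python
from typing import List, Sequence, Tuple

Contour = Tuple[float, ...]  # F0 samples (Hz) for one syllable


def concat_cost(prev: Contour, nxt: Contour) -> float:
    """Hypothetical join cost: squared F0 jump at the syllable boundary."""
    return (prev[-1] - nxt[0]) ** 2


def select_contours(
    candidates: Sequence[Sequence[Contour]],
    target_costs: Sequence[Sequence[float]],
) -> List[int]:
    """Pick one candidate index per syllable, minimizing the sum of
    target costs and join costs over the whole sentence."""
    n = len(candidates)
    # best[i][j]: minimal cost of any path ending in candidate j of syllable i
    best: List[List[float]] = [list(target_costs[0])]
    # back[i][j]: index of the best predecessor candidate in syllable i - 1
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row_cost: List[float] = []
        row_back: List[int] = []
        for j, cand in enumerate(candidates[i]):
            costs = [
                best[i - 1][k] + concat_cost(prev, cand)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[k_min] + target_costs[i][j])
            row_back.append(k_min)
        best.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

With two syllables, candidates `[[(200, 210), (200, 250)], [(212, 190), (300, 280)]]`, and all target costs zero, the search prefers the pair whose boundary F0 values are closest (210 Hz into 212 Hz). A global search of this kind matters because the cheapest candidate for one syllable in isolation can force an expensive join with its neighbor.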

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.