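Chinese Academy of Sciences Institutional Repositories Grid System: Fundamental Frequency (F0) Modeling and Optimization for Chinese Speech Synthesis Systems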

Chinese Academy of Sciences Institutional Repositories Grid

Fundamental Frequency (F0) Modeling and Optimization for Chinese Speech Synthesis Systems (汉语语音合成系统的基频建模和优化)

Document type: Thesis


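Author	刘浩杰 (Liu Haojie)
Degree type	Doctoral
Defense date	2005
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	speech synthesis; prosody model; F0 contour; modeling; optimization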
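Chinese abstract	Rule-based speech synthesis systems have reached an acceptable level of intelligibility, but their naturalness still falls short of expectations. F0 modeling is one of the main challenges in improving the naturalness of rule-based Chinese speech synthesis. Taking the actual F0 of natural continuous speech as its object of study, this thesis proposes a new F0 model and optimizes the F0 contour of prosodic chunks, working from both the inverse and the forward direction. In the inverse direction, the F0 contour of continuous speech is decomposed to extract the top-line and bottom-line of each syllable; the parameters of a theoretically derived F0 model are then estimated from these lines with a suitable optimization method, yielding a quantified F0 model. To remove F0 discontinuities at syllable junctions, the thesis then works in the forward direction, incorporating contextual factors such as speaking rate, stress strength, and the articulators, together with articulatory constraints, into the F0 contour of the prosodic chunk, so that the chunk's contour is optimized as a whole. This forward and inverse analysis largely overcomes the earlier limitation that abstract prosodic parameters such as the top-line and bottom-line could only be grasped intuitively, and it offers a new way to refine prosody models. The main contributions are as follows.

1) Monosyllable pitch range based on tone aggregation. The pitch range of all monosyllables produced by one male speaker in the 863 speech synthesis corpus was studied by tone aggregation. The results show that the pitch range of monosyllables is relatively stable; that it is unaffected by consonant voicing or vowel category and depends only on the particular properties of the syllable itself and on random variation in articulation; and that the top-line and bottom-line are mutually independent and have a clear linear relation with the mean values of the high-level tone (阴平) and the rising tone (阳平).

2) F0 modeling based on the MMSE criterion. For a given stretch of continuous speech, the F0 contours of the corresponding isolated monosyllables serve as template contours, and the F0 contours of the syllables as actually spoken are treated as those templates after pitch-range adjustment. Matching the two sets of contours under the MMSE criterion yields the top-line and bottom-line of each syllable in the continuous speech. The thesis proposes a new F0 model in which pitch-range control in continuous speech is the joint effect of a "large wave", a "small wave", and a baseline value, superimposed logarithmically on the basic tone patterns. Using the top-lines and bottom-lines obtained under the MMSE criterion, the model parameters are estimated with an optimization search. Each parameter has a clear physical meaning and reflects a distinct phonetic property, so it can be adjusted flexibly for a specific context.

3) Optimization of the F0 contour of Chinese prosodic chunks and the resulting rules. Drawing on the mathematical model for F0 contour generation in the Stem-ML tagging system, the thesis proposes the idea of optimizing the F0 contour of Chinese prosodic chunks, along with an estimation-based implementation (the estimator's name is garbled in the source record as "犷估计"). The contour of a chunk, formed by concatenating the F0 contours of isolated monosyllables, is optimized as a whole for continuity, smoothness, contour shape, and mean value, so that it keeps a relative balance between articulatory accuracy and articulatory effort. The clustered F0 contours of monosyllabic, disyllabic, and trisyllabic chunks are then decomposed inversely to extract each syllable's optimization parameters: smoothness factor, shape distortion, stress strength, and top-line and bottom-line. These parameters are analyzed statistically by the syllable's position in the chunk and by tone, giving concrete rules for applying them in a synthesis system. In a listening test on trisyllabic chunks, intelligibility scored 3.25 before and 3.35 after contour optimization, and naturalness scored 2.9 before and 3.31 after.

4) F0 modeling based on F0 contour optimization. Building on the theory of contour optimization, the thesis extracts the optimization-related prosodic parameters, including top-lines and bottom-lines, from a number of continuous utterances and builds a new F0 model. In a listening test on continuous speech, speech synthesized with the original model and with the new model scored 3.63 and 3.8 for intelligibility and 3.12 and 3.69 for naturalness.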
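The template matching in item 2) of the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything specific here is an assumption: the function name `estimate_pitch_range`, the choice to work in log-F0, and the model that the observed contour is the isolated-syllable template mapped linearly from its own reference range onto an unknown bottom-line and top-line. Under that assumed model the minimum mean squared error fit reduces to ordinary least squares on one line.

```python
import math


def estimate_pitch_range(template_hz, observed_hz, ref_bottom_hz, ref_top_hz):
    """Fit bottom-line and top-line (Hz) of one syllable by least squares.

    Assumed model (not taken from the thesis), in the log-F0 domain:
        log(observed[i]) = bottom + (top - bottom) * x[i]
    where x[i] is the template sample normalized to the template's own
    reference range, so 0 maps to the bottom-line and 1 to the top-line.
    """
    if len(template_hz) != len(observed_hz) or len(template_hz) < 2:
        raise ValueError("contours must have equal length of at least 2")

    log_lo, log_hi = math.log(ref_bottom_hz), math.log(ref_top_hz)
    x = [(math.log(f) - log_lo) / (log_hi - log_lo) for f in template_hz]
    y = [math.log(f) for f in observed_hz]

    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        # A perfectly flat template cannot determine the range width.
        raise ValueError("template contour is flat")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    span = sxy / sxx                  # top - bottom in log-F0
    bottom = mean_y - span * mean_x   # intercept at x = 0
    return math.exp(bottom), math.exp(bottom + span)
```

A nearly level template, such as a high-level tone, makes this fit ill-conditioned, so a practical system would need extra constraints or pooled estimates across syllables; the sketch simply rejects the fully flat case.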
English abstract	Rule-based speech synthesis systems achieve acceptable intelligibility, but their naturalness does not yet meet listeners' expectations. The F0 model is one of the main challenges in improving the naturalness of such systems. Based on a study of the actual F0 contour in natural utterances, this thesis proposes a new F0 model and optimizes the F0 contour of Chinese prosodic chunks, using forward analysis and inverse analysis respectively. In the inverse analysis, the thesis first decomposes the actual F0 contour of a natural utterance and extracts the top-line and bottom-line of every syllable. The parameters of the newly proposed F0 model are then estimated with an associated optimization method. To remove the F0 discontinuity at the junction between two adjacent syllables, the thesis incorporates contextual factors such as articulation rate, syllable stress, and the articulators into the F0 contour of the Chinese prosodic chunk, which achieves a global optimization of the contour in the forward analysis. The forward and inverse analysis of the F0 contour overcomes an earlier limitation in studying some abstract prosodic parameters and provides a new method for refining the prosody model. The main innovative work of this thesis and its results are as follows.

1) Pitch range of monosyllables based on tone arrangement. Using tone arrangement, the thesis studies the pitch range of all monosyllables for one male speaker in the 863 Speech Synthesis Corpus. The results show that: (1) the pitch range of monosyllables is relatively stable; (2) the pitch range is related mainly to the inherent properties of the syllable and to random variation in articulation, and has little relation to the character of the vowel or the consonant; (3) the top-line and the bottom-line are independent, and have a linear relation with the mean values of the rising tone and the high tone. This is the foundation of the work that follows.

2) The F0 model based on the MMSE principle. Given a natural utterance, if the F0 contour of each corresponding isolated syllable is taken as the template contour, and the actual F0 contour is taken as that template modulated by pitch range, then template matching under the MMSE principle gives the top-line and bottom-line of every syllable in the utterance. Based on physiological and psychological mechanisms, the thesis also proposes a new F0 model in which the pitch range is the joint function of a reference value, a global contour, and a local contour, superimposed in the logarithmic domain. The parameters of the F0 model are then estimated from the extracted pitch range. The parameters have clear physical meanings and correspond to different phonetic characteristics, so they can be adjusted flexibly for different articulation contexts.

3) Optimization of the F0 contour in Chinese prosodic chunks and the rules for the associated parameters. Based on the mathematical model in the Stem-ML labeling system, the thesis proposes the idea of optimizing the F0 contour of Chinese prosodic chunks and a method to achieve the optimization. This optimization balances physiological and communication constraints, and builds continuity, smoothness, pitch contour, and pitch mean into the F0 contour of the Chinese prosodic chunk. The thesis also inversely analyzes the clustered F0 contours of monosyllables, disyllables, and trisyllables. Statistical analysis gives the rules for the smoothness factor, the distortion, the stress, and the pitch range for the different prosodic chunks. A listening test on trisyllabic chunks shows that intelligibility in the VSS system is 3.25 before and 3.35 after the optimization, and naturalness is 2.9 before and 3.31 after.

4) The F0 model based on optimization for natural utterances. Based on the optimization of the F0 contour in Chinese prosodic chunks, the thesis extracts the parameters associated with the optimization and builds a new F0 model. A listening test on twenty natural utterances shows that the intelligibility scores of the model in the VSS system and of the new F0 model are 3.63 and 3.8 respectively, and the naturalness scores are 3.12 and 3.69 respectively.
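The chunk-level optimization in item 3) trades articulatory effort against faithfulness to each syllable's target contour. The sketch below shows that trade-off in its simplest quadratic form and is not the thesis's or Stem-ML's actual cost function. The assumptions are: the function name `smooth_chunk_contour`, an effort term built only from first differences, and a per-sample `strengths` weight standing in for stress strength. The cost is

    sum_i strengths[i] * (p[i] - targets[i])^2 + effort_weight * sum_i (p[i+1] - p[i])^2

and its minimizer solves a tridiagonal linear system, handled here with the Thomas algorithm.

```python
def smooth_chunk_contour(targets, strengths, effort_weight=1.0):
    """Minimize weighted target error plus a first-difference effort penalty.

    targets:       concatenated per-syllable F0 samples (any consistent unit)
    strengths:     positive weight per sample (assumed proxy for stress)
    effort_weight: larger values give a smoother, less faithful contour
    """
    n = len(targets)
    if n == 0 or n != len(strengths):
        raise ValueError("targets and strengths must be non-empty, equal length")
    if any(s <= 0 for s in strengths) or effort_weight < 0:
        raise ValueError("strengths must be positive, effort_weight non-negative")

    # Normal equations: (W + effort_weight * L) p = W t, with L the
    # path-graph Laplacian (free ends, so end samples have one neighbour).
    off = -effort_weight
    diag = []
    for i in range(n):
        neighbours = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        diag.append(strengths[i] + effort_weight * neighbours)
    rhs = [s * t for s, t in zip(strengths, targets)]

    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]

    # ... then back substitution.
    out = [0.0] * n
    out[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        out[i] = (rhs[i] - off * out[i + 1]) / diag[i]
    return out
```

With equal strengths the mean of the contour is preserved while the jump at a syllable junction shrinks; raising the strength of a stressed syllable pulls the result closer to that syllable's target at the expense of its neighbours.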
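Language	Chinese
Release date	2011-05-07
Pages	108
Source URL	[http://159.226.59.140/handle/311008/930]
Collection	Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	刘浩杰. 汉语语音合成系统的基频建模和优化[D]. 中国科学院声学研究所. 中国科学院声学研究所. 2005.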

Ingest method: OAI harvesting

Source: Institute of Acoustics

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.