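Research on F0 Curve Modeling for Monosyllables, Disyllables and Trisyllables in Mandarin Chinese (汉语单音节、两音节组和三音节组基频曲线建模方法研究)
Document type: Thesis
Author | 李香春 (Li Xiangchun) |
Degree | Doctoral |
Defense date | 2003 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences (中国科学院声学研究所) |
Place of conferral | Institute of Acoustics, Chinese Academy of Sciences (中国科学院声学研究所) |
Keywords | speech synthesis; pitch detection; prosody model |
Alternative title | Research on F0 Curve Modeling for Monosyllables, Disyllables and Trisyllables in Mandarin Chinese |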
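Chinese abstract (translated) | This thesis explores an F0 (fundamental frequency) contour modeling method for parametric synthesis that can produce Mandarin monosyllables, disyllables and trisyllables close to natural speech. Based on the 863 speech synthesis corpus, it studies how the combination of the voiced/unvoiced property of consonant segments with the tones affects the suprasegmental feature F0 in Mandarin monosyllables, disyllables and trisyllables, and how that influence manifests itself, and it builds generation models of F0 contours for all three unit types. To address two difficulties, obtaining F0 data for a large corpus and representing the dynamics of F0 contours, it also proposes a weighted-sum pitch detection algorithm and a method for describing the dynamic variation of F0 contours. The main results and contributions are as follows: 1. A pitch detection algorithm based on multi-scale edge feature extraction (the weighted-sum algorithm) is proposed. By extracting the local extrema of the weighted sum of wavelet transform coefficients across three scales, the algorithm locates abrupt-change points accurately. This lowers the miss rate and false-alarm rate caused by the smoothing effect of large-scale filters, improves detection accuracy, and widens the amplitude gap between true and spurious change points, which gives good noise robustness. 2. A method for describing the dynamic variation of F0 contours is proposed. By examining the number and positions of the valid zeros of the derivative of the F0 contour expression, the method segments the contour automatically. Each segment is represented by a single right triangle with only two parameters, duration and slope. Both parameters have a clear physical meaning and describe the direction of the contour conveniently and clearly. 3. A model library of monosyllable F0 contours is built from the F0 contours of real speech. The voiced/unvoiced property of the consonant segment has a significant, distinctive effect on the F0 contour of monosyllables bearing the same tone: for most first-tone (yinping) and fourth-tone (qusheng) monosyllables with a voiced initial, the contour usually starts with a rise, whereas for most with an unvoiced initial it usually starts with a fall. Second-tone (yangping) and third-tone (shangsheng) monosyllables with voiced and unvoiced initials also differ clearly in the onset of their contours. Based on these features, generation models for 8 classes of monosyllable F0 contours are built. Listening tests on Mandarin monosyllables show that, compared with the formula-based method, the correct recognition rate for monosyllables improves by 15 percentage points. 4. A model library of Mandarin disyllable F0 contours is built from the F0 contours of real speech. Based on a study of how the tone combination and the voiced/unvoiced property of the final syllable's consonant segment affect the disyllable contour, generation models for 32 classes of disyllable F0 contours are built. All contours in the library are taken from the 863 speech synthesis corpus, so contour generation need not handle the internal contour shape or the transition between syllables separately, which keeps each syllable's contour and the inter-syllable transition smooth and natural. Listening tests on Mandarin disyllables show that, compared with the formula-based method, the correct tone recognition rate for disyllables improves by 6 percentage points. 5. A model library of Mandarin trisyllable F0 contours is built from the F0 contours of real speech. The study shows that trisyllable contours can be classified by tone combination and by the voiced/unvoiced property of the consonant segments of the middle and final syllables. The contours of each tone combination are divided into 4 classes, giving 256 classes of trisyllable contour patterns in total. Real F0 contours that represent the typical variation of trisyllable contours are selected from the 863 speech synthesis corpus for modeling. Listening tests on Mandarin trisyllables show that, compared with the formula-based method, the correct tone recognition rate for trisyllables improves by 4 percentage points, the transitions between syllables are smooth, and the tones of the whole group are close to natural speech. The combined results of the monosyllable, disyllable and trisyllable listening tests show that, compared with formula-based F0 contours, the naturalness of speech synthesized with this method improves by 0.6 points to 3.8 on the five-point MOS scale, close to the "good" level; syllable intelligibility improves by 3.47%, and the correct syllable recognition rate reaches 82.88%. |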
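The weighted-sum pitch detection idea in contribution 1 can be illustrated with a small sketch. The abstract does not state which wavelet, scales, or weights the thesis uses, so everything specific here is an assumption made for illustration: a Haar-style edge response, the dyadic scales 1, 2 and 4, the weights 0.5/0.3/0.2, a relative threshold of 0.5, and a synthetic sawtooth standing in for a voiced signal with a sharp discontinuity once per pitch period. All function names are hypothetical.

```python
def haar_coefficients(signal, scale):
    """Haar-style edge response at one scale: mean of the next `scale`
    samples minus the mean of the previous `scale` samples.
    Positions where the window does not fit are left at 0.0."""
    n = len(signal)
    out = [0.0] * n
    for i in range(scale, n - scale + 1):
        ahead = sum(signal[i:i + scale])
        behind = sum(signal[i - scale:i])
        out[i] = (ahead - behind) / scale
    return out


def weighted_sum_peaks(signal, scales=(1, 2, 4), weights=(0.5, 0.3, 0.2),
                       rel_threshold=0.5):
    """Return indices of local maxima of the weighted sum of the
    per-scale responses. Scales, weights and threshold are assumed values."""
    per_scale = [haar_coefficients(signal, s) for s in scales]
    combined = [sum(w * c[i] for w, c in zip(weights, per_scale))
                for i in range(len(signal))]
    threshold = rel_threshold * max(combined)
    margin = max(scales)
    peaks = []
    for i in range(margin + 1, len(signal) - margin - 1):
        is_local_max = combined[i] > combined[i - 1] and combined[i] >= combined[i + 1]
        if is_local_max and combined[i] > threshold:
            peaks.append(i)
    return peaks


def f0_from_peaks(peaks, sample_rate):
    """Convert distances between successive peaks (pitch periods) to F0 in Hz."""
    return [sample_rate / (b - a) for a, b in zip(peaks, peaks[1:])]


# Synthetic test signal (assumption): a decaying ramp that jumps back to 1.0
# every 80 samples, i.e. one sharp edge per pitch period.
PERIOD = 80
signal = [1.0 - (n % PERIOD) / PERIOD for n in range(400)]

peaks = weighted_sum_peaks(signal)   # -> [80, 160, 240, 320]
f0 = f0_from_peaks(peaks, 8000)      # -> [100.0, 100.0, 100.0] at an assumed 8 kHz
```

Summing across scales keeps the sharp localization of the smallest scale while the larger scales add weight only where an edge persists, which is one plausible reading of why the abstract reports a larger gap between true and spurious change points.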
English abstract | This thesis aims to build an F0 contour model for parametric synthesis of Mandarin that can synthesize monosyllables, disyllables and trisyllables more naturally. Based on the influence of the voiced/unvoiced attribute of consonants and of tone combinations on the suprasegmental feature F0, we build an F0 contour database for monosyllables, disyllables and trisyllables. All F0 data come from real speech in the 863 speech synthesis database. In the process, we propose a pitch detection method (the Weighted Summing Method) to obtain accurate pitch periods, and an automatic method for describing the dynamic variation of the F0 contour. The main innovations and results of this thesis are as follows: 1. A new pitch detection method based on multi-scale edge feature extraction (the Weighted Summing Method) is proposed. It detects glottal closure instants precisely by extracting the local maxima of the weighted sum of wavelet transform coefficients at three scales. Because the amplitude of a false break point is small relative to that of a real break point, the method works well even in noisy conditions. It also reduces the false-alarm rate and the miss rate that result from the smoothing effect of the large-scale filters. 2. A new method for describing the dynamics of the F0 contour is proposed. Segmentation is automatic and depends on the number and positions of the extreme points of the polynomial fitted to the F0 contour. Each segment has only two parameters, its slope and its length, which have clear physical meanings and describe the variation of the F0 contour sufficiently and simply. 
3. An F0 contour database for monosyllables is built, based on the relation between voiced/unvoiced consonants and the suprasegmental feature F0. According to our research, the F0 contours of most first-tone and fourth-tone monosyllables with voiced consonants begin with a rise, while those of most first-tone and fourth-tone monosyllables with unvoiced consonants begin with a fall. In the second and third tones, the F0 contours of monosyllables starting with voiced consonants and those starting with unvoiced consonants also begin very differently. Based on these distinctive features, the F0 contours of monosyllables are classified into 8 classes. Listening tests confirm that the correct recognition rate for monosyllables with this F0 contour model is 15 percentage points higher than with the formula-based model. 4. An F0 contour database for disyllables is built. The F0 contour is closely related to the tone combination and to the voiced/unvoiced attribute of the consonant segment in the last syllable. Accordingly, we classify the F0 contours of disyllables into 32 classes and build the database from real speech. Listening tests confirm that the correct tone recognition rate for disyllables with this model is 6 percentage points higher than with the formula-based model. 5. An F0 contour database for trisyllables is built, based on a study of the factors associated with the variation of the F0 contour. The F0 contours of trisyllables are classified into 256 classes, and the database is built from pitch data in the 863 speech synthesis database. Listening tests confirm that the correct tone recognition rate for trisyllables with this model is 4 percentage points higher than with the formula-based model, and that the transitions between syllables are more natural. 
The combined listening tests for monosyllables, disyllables and trisyllables confirm that the naturalness of speech synthesized with this model is 0.6 points higher than with the formula-based F0 model, reaching an absolute score of 3.8 on the five-point MOS scale (close to the "good" level). Syllable intelligibility improves by 3.47%, and the correct syllable recognition rate reaches 82.88%. |
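The segment-based contour description in contribution 2 (split the fitted polynomial at its extreme points, then keep only a length and a slope per segment) can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it assumes the polynomial has already been fitted and is given by its coefficients in ascending order, it treats a "valid" zero as an interior point where the derivative changes sign, and it locates such zeros by grid sampling plus bisection. The function names and the example contour are hypothetical.

```python
def poly_eval(coeffs, t):
    """Evaluate a polynomial given ascending-order coefficients (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def poly_derivative(coeffs):
    """Ascending-order coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def derivative_zeros(coeffs, t_start, t_end, steps=1000):
    """Interior points where the derivative changes sign (contour extrema).
    Assumed criterion: zeros without a sign change, and zeros at the
    endpoints, are not treated as segment boundaries."""
    d = poly_derivative(coeffs)
    dt = (t_end - t_start) / steps
    zeros = []
    prev_t, prev_v = None, 0.0
    for i in range(steps + 1):
        t = t_start + i * dt
        v = poly_eval(d, t)
        if v == 0.0:
            continue  # wait for the next nonzero sample to decide on a sign change
        if prev_t is not None and prev_v * v < 0:
            a, b, fa = prev_t, t, prev_v
            for _ in range(60):  # bisection on the bracketing interval
                m = 0.5 * (a + b)
                fm = poly_eval(d, m)
                if fm == 0.0:
                    a = b = m
                    break
                if fa * fm < 0:
                    b = m
                else:
                    a, fa = m, fm
            zeros.append(0.5 * (a + b))
        prev_t, prev_v = t, v
    return zeros


def segment_contour(coeffs, t_start, t_end):
    """Split the contour at its extrema; describe each segment as (duration, slope)."""
    bounds = [t_start] + derivative_zeros(coeffs, t_start, t_end) + [t_end]
    segments = []
    for a, b in zip(bounds, bounds[1:]):
        duration = b - a
        slope = (poly_eval(coeffs, b) - poly_eval(coeffs, a)) / duration
        segments.append((duration, slope))
    return segments


# Hypothetical dipping contour: F0(t) = 180 - 400 t + 800 t^2 (Hz, t in seconds).
segments = segment_contour([180.0, -400.0, 800.0], 0.0, 0.35)
```

For this example contour the sketch yields two segments: about 0.25 s falling at roughly -200 Hz/s, followed by about 0.10 s rising at roughly 80 Hz/s. Each pair can be read as a right triangle whose horizontal leg is the duration and whose vertical leg is slope times duration, which matches the two-parameter description in the abstract.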
Language | Chinese |
Date made public | 2011-05-07 |
Pages | 98 |
Source URL | [http://159.226.59.140/handle/311008/1032] |
Collection | Institute of Acoustics_IOA Master's and Doctoral Theses_1981-2009 Master's and Doctoral Theses |
Recommended citation (GB/T 7714) | 李香春. 汉语单音节、两音节组和三音节组基频曲线建模方法研究[D]. 中国科学院声学研究所. 中国科学院声学研究所. 2003. |
Ingest method: OAI harvesting
Source: Institute of Acoustics