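Chinese Academy of Sciences Institutional Repositories Grid: Research on Chinese Continuous Speech Recognition and Tone Recognition in Continuous Chinese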

Chinese Academy of Sciences Institutional Repositories Grid

Research on Chinese Continuous Speech Recognition and Tone Recognition in Continuous Chinese (汉语连续语音识别及连续汉语的声调识别研究)

Document type: Thesis


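Author	Liu Jian (刘建)
Degree	Doctoral
Defense date	1996
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	continuous speech recognition; tone recognition; Gaussian mixture tone model; tone cluster-mapping recognition; pitch extraction; area difference function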
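Abstract (translated from Chinese)	Speech recognition has long been an active research topic, and Chinese speech recognition in particular has attracted growing attention in recent years. This thesis presents results from the author's three years of doctoral research on Chinese speech recognition and on tone recognition in continuous Chinese speech. It first describes a speaker-independent continuous speech recognition baseline system built on HMMs with continuous Gaussian mixture probability density functions. Drawing on earlier research experience and on knowledge of Chinese phonetics, the thesis proposes refining the models of easily confused short initials and voiced initials according to the articulation class of the following final. This yields 74 recognition units in total, only 14 more than the standard inventory of initials and finals (60), yet with a considerable gain in model accuracy. To address the excessive number of insertion errors in the system, a connected-digit-string recognition system was also built for comparing search algorithms, which led to a method that substantially reduces insertion errors under the acoustic model. The top-1 syllable recognition rate of the baseline system is about 10% higher than that of a discrete HMM system under the same conditions, with relatively low insertion and deletion error rates. The thesis also makes new attempts at pitch detection and tone recognition for Chinese speech. For isolated syllables, it proposes the Area Difference Function (ADF) method for pitch extraction and the Gaussian Mixture Tone Model (GMTM) for tone recognition. Then, using the recognition results and segmentation paths of the baseline continuous speech recognition system, it proposes an improved pitch extraction algorithm suited to continuous speech and applies GMTM to tone recognition in continuous speech, achieving a top-1 tone accuracy of about 60%. Building on this, and on a study of tone sandhi patterns in continuous Chinese, it proposes a cluster-mapping recognition method suited to tone recognition in continuous Chinese. The method first clusters all normalized pitch data into 8 centers, each representing a new tone pattern, and then maps the new tone patterns back to the original four tones and the neutral tone by rule. On the same test data, this method achieves a top-1 tone accuracy above 70% and a top-2 tone accuracy close to 90%, and it also provides a confidence measure for the tone recognition results. This tone recognition strategy allows tone recognition to be combined conveniently and tightly with the speech recognition system, so that initials, finals, and tones are brought together directly. Moreover, the GMTM-based method is not limited to Chinese: it applies to any tonal language with a finite set of tone patterns. Finally, the thesis uses examples to briefly illustrate the role of tone recognition results in natural language understanding, describes an application that combines the tone recognition method with the LSS algorithm from HMM training for automatic pitch and tone labeling of continuous speech, and discusses directions for further research and prospects for application.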
Abstract (English)	Speech recognition has long been a research focus in signal processing, and Chinese speech recognition has drawn increasing attention in recent years. This thesis presents results we have obtained over the last three years on Chinese continuous speech recognition and on tone recognition in continuous Chinese. First, a speaker-independent continuous speech recognition baseline system based on mixture Gaussian density Hidden Markov Models is presented. Based on our earlier research experience and on knowledge of Chinese phonetics, we refined the short consonants and voiced consonants that are easily misrecognized according to the articulation type of the vowels that follow them, giving 74 speech recognition units in total. Although this is only 14 more than the standard number of Chinese initials and finals, the precision of the model units is improved. Because the baseline system produced many insertion errors, we also built a connected-digit recognition system to compare different search algorithms, and we suggest a method that markedly reduces the insertion error rate under the acoustic model. The top-one syllable accuracy of our baseline system is about 10% higher than that of a discrete HMM recognition system under the same conditions, with relatively low insertion and deletion error rates. This thesis also describes our new attempts at pitch extraction and tone recognition for Chinese speech. We first present the Area Difference Function (ADF) method for pitch extraction and the Gaussian Mixture Tone Model (GMTM) for tone recognition of Chinese syllables. Then, based on the recognition results and the time-alignment segmentation of the baseline recognition system, we modify the pitch extraction method to suit continuous speech and introduce GMTM into continuous Chinese tone recognition. This achieves 60% top-one tone recognition accuracy. By analyzing the rules of tone sandhi in continuous Chinese speech, we further put forward a more suitable method for spoken Chinese tone recognition, the tone Cluster-Mapping (CM) technique. In the first phase, "Cluster", all normalized pitch vectors are classified into 8 centers, each of which represents a newly generated tone mode. In the second phase, "Mapping", the new tone modes are mapped to the four original tone modes and one neutral ("light") tone mode, with a probabilistic representation of the mapping reliability. The CM method achieves above 70% top-one tone accuracy and almost 90% top-two tone accuracy on the same test data, and it can additionally report the reliability of the tone recognition results. The tone recognition strategy used here makes it easy and convenient to combine the tone recognition technique with the HMM-based speech recognition system, and it directly brings together the three basic components of a Chinese syllable, "initial", "final" (vowel part), and "tone", in the output result. Furthermore, GMTM is not limited to Chinese tone recognition: it can be used for any tonal language that has a limited number of tone classes. At the end of the thesis, several experiments are presented to illustrate the influence of tone results on language understanding. A new application of our tone recognition method, together with the LSS algorithm of the HMM training procedure, to automatic pitch and tone labeling for continuous speech is also explained. Further research directions and application prospects are discussed.
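Language	Chinese
Release date	2011-05-07
Pages	95
Source URL	[http://159.226.59.140/handle/311008/622]
Collection	Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	Liu Jian. Research on Chinese Continuous Speech Recognition and Tone Recognition in Continuous Chinese[D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 1996.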
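
The cluster-mapping idea summarized in the abstract (cluster normalized pitch vectors into a small set of centers, then map each center back to the lexical tones with a reliability score) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation. It assumes fixed-length pitch contours, mean subtraction as the normalization, plain k-means as the clustering step, and relative label frequencies as the reliability measure, none of which the record specifies; the abstract describes the mapping as rule-based. All function names and the toy data are hypothetical, and the toy run uses 3 clusters where the thesis reports 8 centers.

```python
from collections import Counter, defaultdict


def normalize(contour):
    """Subtract the mean so only the contour shape remains (assumed normalization)."""
    mean = sum(contour) / len(contour)
    return [x - mean for x in contour]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = defaultdict(list)
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for i, members in groups.items():
            centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_mapping(vectors, tones, centers):
    """Map each cluster to its tones, ranked by relative frequency (assumed reliability)."""
    counts = defaultdict(Counter)
    for v, t in zip(vectors, tones):
        counts[nearest(v, centers)][t] += 1
    return {
        c: [(tone, n / sum(ctr.values())) for tone, n in ctr.most_common()]
        for c, ctr in counts.items()
    }


def recognize(contour, centers, mapping, n_best=2):
    """Return up to n_best (tone, reliability) candidates for one pitch contour."""
    cluster = nearest(normalize(contour), centers)
    return mapping.get(cluster, [])[:n_best]


# Hypothetical toy data: (pitch contour in Hz, lexical tone label).
TRAIN = [
    ([100, 110, 120, 130], 2),  # rising
    ([200, 210, 220, 230], 2),  # rising, higher-pitched speaker
    ([120, 130, 140, 150], 3),  # tone 3 realized as rising (third-tone sandhi)
    ([130, 120, 110, 100], 4),  # falling
    ([230, 220, 210, 200], 4),
    ([120, 100, 100, 120], 3),  # dipping
    ([220, 200, 200, 220], 3),
]

vectors = [normalize(c) for c, _ in TRAIN]
tones = [t for _, t in TRAIN]
centers = kmeans(vectors, k=3)
mapping = build_mapping(vectors, tones, centers)
print(recognize([150, 160, 170, 180], centers, mapping))
```

On this toy data a rising contour comes back with tone 2 as the first candidate (reliability about 0.67) and tone 3 as the second (about 0.33), which mirrors how a top-two result with a reliability score could expose sandhi-induced ambiguity to a downstream language model.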

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.