Chinese Academy of Sciences Institutional Repositories Grid: Research on Chinese Speech Recognition Systems and Speech Units

中国科学院机构知识库网格

Chinese Academy of Sciences Institutional Repositories Grid

Research on Chinese Speech Recognition Systems and Speech Units (汉语语音识别系统及语音单元的研究)

Document type: Dissertation


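Author	He Xiaodong (何晓冬)
Degree type	Doctoral
Defense date	1999
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	speech recognition; hidden Markov model; model initialization; language model; speech unit modeling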
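Abstract (translated from Chinese)	Building on extensive and detailed analysis of Chinese speech, and on a comparative study of the theory and practice of hidden Markov models, this thesis implements a speaker-independent Chinese continuous speech recognition system and investigates several training and recognition problems in depth. The system is trained with the segmental K-means method, in which model initialization is a key issue. Experiments show that the initialization method has a considerable effect on the state-transition probability matrix A, and that in some cases this leads to many insertion errors in the recognition output. Experiments also show that training A only after the state output probability matrix B has become fairly stable generally gives the best results. The author discusses several recognition problems at the purely acoustic level, such as the search algorithm and probability compensation for transitions between models. On this basis, the author also makes a preliminary study of the use of language models in speech recognition. Experiments show that introducing just a simple bi-gram language model raises the correct recognition rate by more than 10 percentage points. The author also proposes a dynamic-programming-based sequence alignment (warping) algorithm for tallying recognition results. Building on this work, the author further implements two systems: a "speaker-independent, vocabulary-expandable Chinese isolated word recognition system" and a "Chinese continuous speech place-name dialogue query system". For the former, the author studies the application of continuous speech recognition strategies to a command-word recognition system, which gives the system good extensibility while maintaining a very high recognition rate. Experiments show a recognition rate above 99% on a small 500-word vocabulary and close to 95% on a medium 1000-word vocabulary. For the latter, the author proposes a search algorithm based on a normalized finite-state network to ensure the system's extensibility and accuracy; preliminary experiments show an average recognition rate of 95%. Finally, the author further studies the modeling of speech recognition units. By analyzing the parameters of the DHMMs of basic speech units, the author proposes a speech unit modeling method based on model splitting. The thesis gives the concrete steps for splitting basic speech units and resolves the associated problems of lexicon construction and model training. Experimental results show that the method raises the average recognition rate of basic speech units by nearly 10 percentage points.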
English abstract	After a comprehensive analysis of Chinese speech and of the Hidden Markov Model (HMM), a speaker-independent Chinese continuous speech recognition system was implemented, and several training and recognition issues were examined. The system was trained with segmental K-means, for which HMM initialization is an important issue. Experimental results indicated that HMM initialization can strongly influence the state-transition probability distribution A, and that in some cases this leads to many insertion errors in the recognition output. The results also showed that the best performance was obtained when the observation symbol probability distribution B was trained before the state-transition probability distribution A. Several issues in recognition at the acoustic level, such as decoding and probability compensation for transitions between models, were studied. The application of a language model was also explored: experimental results suggested that even a simple bi-gram language model improves the correct recognition rate by more than 10 percentage points. In addition, a dynamic-programming-based sequence-warping algorithm was proposed for scoring recognition results. Building on this work, two systems were implemented: the "Speaker Independent Vocabulary Expandable Chinese Speech Recognition System" and the "Chinese continuous speech query system for locations". In the former, for the sake of flexibility and accuracy, the continuous speech recognition strategy was applied to a command recognition system. Experimental results showed a correct recognition rate of 99% for a 500-word vocabulary and 95% for a 1000-word vocabulary. In the latter, a decoding algorithm based on a normalized Finite State Network (FSN) was proposed, which achieved a correct recognition rate of 95%. Finally, speech unit modeling was investigated further. Based on an analysis of the parameters of the DHMMs of speech units, a speech unit modeling method based on model splitting was presented. The approach was described in detail, and related issues such as lexicon building and model training were addressed. Experimental results showed that the correct recognition rate of speech units improved by nearly 10 percentage points.
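Language	Chinese
Date made public	2011-05-07
Pages	63
Source URL	[http://159.226.59.140/handle/311008/640]
Collection	Institute of Acoustics_IOA Master's and Doctoral Theses_1981-2009 Master's and Doctoral Theses
Recommended citation (GB/T 7714)	He Xiaodong. Research on Chinese Speech Recognition Systems and Speech Units [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 1999.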
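
The abstract mentions a dynamic-programming-based sequence alignment used to score recognition results, but this record does not give the algorithm itself. As a rough illustration only, the sketch below uses the standard minimum-edit-distance alignment between a reference transcript and a recognizer hypothesis to count substitutions, deletions, and insertions. The function name `align_counts`, the unit costs, and the example syllable sequences are assumptions made for this sketch, not details taken from the thesis.

```python
def align_counts(ref, hyp):
    """Count (substitutions, deletions, insertions) along one minimum-cost
    alignment of the reference sequence `ref` with the hypothesis `hyp`.

    Assumption: every error type costs 1 (plain Levenshtein alignment).
    """
    n, m = len(ref), len(hyp)
    # cell[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)      # all reference units deleted
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)      # all hypothesis units inserted

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)            # match, no new error
            else:
                diag = (t + 1, s + 1, d, k)    # substitution
            t, s, d, k = cell[i - 1][j]
            dele = (t + 1, s, d + 1, k)        # reference unit missing
            t, s, d, k = cell[i][j - 1]
            ins = (t + 1, s, d, k + 1)         # extra hypothesis unit
            # Tuples compare on total_errors first, so the total is minimal;
            # ties are broken deterministically by the remaining fields.
            cell[i][j] = min(diag, dele, ins)

    _, subs, dels, ins = cell[n][m]
    return subs, dels, ins


if __name__ == "__main__":
    ref = "bei jing shi".split()
    hyp = "bei jin shi le".split()
    subs, dels, ins = align_counts(ref, hyp)
    correct = (len(ref) - subs - dels) / len(ref)
    print(subs, dels, ins, round(correct, 3))
```

Counting insertions separately matters here because the abstract reports that poor initialization of the transition matrix A can produce many insertion errors, which a simple count of correctly recognized units would not reveal.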

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.