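Chinese Academy of Sciences Institutional Repositories Grid System: Research on Fast Algorithms for Real-Time Speech Recognition Systems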

Chinese Academy of Sciences Institutional Repositories Grid

实时语音识别系统的快速算法研究 (Research on Fast Algorithms for Real-Time Speech Recognition Systems)

Document type: Thesis


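Author	Xie Lingyun (谢凌云)
Degree	Doctoral
Defense date	2004
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	speech recognition; search algorithm; segment-based dynamic threshold pruning; feature component masking; effective Gaussian subset
Other title	Research on Fast Algorithm of Real-Time Speech Recognition System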
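Chinese abstract (translated)	Speech recognition algorithms for laboratory conditions are essentially mature. Speaker-independent continuous speech recognition systems based on hidden Markov models (HMMs) have become mainstream and achieve satisfactory recognition accuracy given standard pronunciation and a quiet environment. When deployed in small, portable embedded products, however, HMM-based speaker-independent continuous speech recognition systems still face sharp trade-offs between recognition performance and recognition speed, and between recognition performance and memory consumption. This is one of the current technical bottlenecks for speech recognition applications. Path search over the recognition network and output probability computation for the HMM acoustic models place the heaviest load on the system's computing resources. Against this background, this thesis studies fast algorithms for speaker-independent continuous speech recognition systems. The main contributions are as follows:
1. An experimental platform for HMM-based speaker-independent large-vocabulary continuous speech recognition was built on a real-time embedded system, and baseline experiments were run for two tasks: recognition over a Chinese all-syllable network without grammar constraints, and recognition of taxi-ride dialogue sentences.
2. A fast search algorithm using segment-based dynamic threshold pruning is proposed. It exploits the effect of a speech frame's relative position within the whole utterance on the number of paths and on path probability scores, together with the effect of the number of currently active models on the pruning threshold, to adjust the pruning threshold dynamically during search. Experiments on the grammar-free Chinese all-syllable baseline and on the taxi-dialogue baseline show that, compared with the conventional pruning search algorithm and at the same or slightly higher recognition rate, the average number of paths generated per frame falls by more than 3.38% and 8% respectively, and real-time processing time falls by more than 21.15% and 10.18% respectively. The algorithm thus speeds up search and reduces memory consumption.
3. A fast Gaussian probability algorithm based on feature component masking is proposed. Gaussian probability computation is the first target when optimizing speech recognition algorithms. Conventional optimization methods cluster the acoustic space using vector quantization. This thesis instead starts from the contribution of individual feature coefficient components: it screens the components used in the Gaussian probability computation and evaluates only the terms for components whose contribution is prominent. Experiments on the taxi-dialogue baseline show that the algorithm cuts the computation time of the single-Gaussian baseline by more than 10% while the recognition rate rises slightly, by about 0.72%. For the 6-Gaussian-mixture baseline, computation time falls by more than 6.71% and the recognition rate drops by only 0.07%. The algorithm can also be combined with the pruning search algorithm: at different thresholds, computation time falls by a further 0.6%-2.6% and the recognition rate improves by 0.2%-4%, which indicates good compatibility with other fast algorithms.
4. A fast Gaussian probability algorithm based on an effective Gaussian subset is proposed. Conventional fast Gaussian probability algorithms require retraining the acoustic model to obtain the subset of Gaussians to evaluate. This thesis instead applies pattern clustering directly to the Gaussian distributions to form an effective Gaussian subset and represents the remaining Gaussians as weighted sums of subset elements, so the acoustic model does not need to be retrained. Experiments on the taxi-dialogue baseline show that the algorithm reduces the baseline's computation time by more than 17% while the recognition rate drops slightly, by about 1.3%. The storage needed for the acoustic model also decreases, so both computational complexity and storage are reduced while the recognition rate stays well controlled.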
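The abstract does not give the formula used to adjust the pruning threshold, so the following is only a minimal Python sketch of the general idea in item 2: a beam width that depends on the frame's relative position in the utterance and on the number of currently active models. The function names, the three-segment split, the scale factors, and the `target_active` parameter are all illustrative assumptions, not the thesis's actual method or values. The sketch also assumes the utterance length is known or estimated in advance.

```python
def dynamic_beam(base_beam, frame_idx, num_frames, num_active, target_active,
                 segment_scales=(1.5, 1.0, 0.75)):
    """Return a beam width for the current frame (hypothetical rule).

    segment_scales holds one assumed multiplier per utterance segment;
    the values here are placeholders, not taken from the thesis.
    """
    # Map the frame's relative position (0..1) to a segment index.
    rel = frame_idx / max(num_frames, 1)
    seg = min(int(rel * len(segment_scales)), len(segment_scales) - 1)
    beam = base_beam * segment_scales[seg]

    # Assumed rule: tighten the beam when more models are active than desired.
    if num_active > target_active:
        beam *= target_active / num_active
    return beam


def prune(hyps, beam):
    """Keep hypotheses whose log score is within `beam` of the best one.

    hyps maps a path identifier to its accumulated log score.
    """
    if not hyps:
        return {}
    best = max(hyps.values())
    return {path: score for path, score in hyps.items() if score >= best - beam}


if __name__ == "__main__":
    hyps = {"a": -10.0, "b": -50.0, "c": -200.0}
    beam = dynamic_beam(100.0, frame_idx=89, num_frames=90,
                        num_active=100, target_active=50)
    print(beam, prune(hyps, beam))
```

A fixed-beam search corresponds to calling `prune` with a constant `beam`; the dynamic variant differs only in recomputing the beam at every frame.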
English abstract	HMM-based speaker-independent continuous speech recognition systems achieve good accuracy given standard pronunciation and a quiet environment. Small-vocabulary speech recognition systems based on model matching are being applied on personal portable platforms such as mobile phones, PDAs, and tablet PCs. Real-time embedded systems are the primary platforms for such applications. In practice, however, embedded systems lack the capacity to run HMM-based speaker-independent continuous speech recognition algorithms, especially the search algorithm and the state likelihood computation, and they cannot provide enough floating-point computing power. The speech recognition algorithm therefore has to be optimized before it is ported to a real-time embedded system. On that basis, this thesis studies fast algorithms for HMM-based speaker-independent continuous speech recognition systems. The main contributions of this thesis are:
1. The HMM-based speaker-independent continuous speech recognition system is ported to a real-time embedded system. This embedded speech recognition system serves as the experimental platform for the research on fast algorithms.
2. This thesis studies the effect of a frame's location and of the number of active models on the pruning thresholds and recognition accuracy, and proposes a fast segment-based dynamic pruning algorithm. Experimental results show that, compared with the traditional search algorithm on the full-syllable baseline system and the taxi-dialog baseline system, this algorithm reduces the number of paths per frame by more than 3.38% and 8% and the recognition time by more than 21.15% and 10.18%, while maintaining recognition accuracy. The time complexity and space complexity of the search algorithm are both reduced.
3. This thesis studies algorithms for fast calculation of the state likelihood of HMM models, which accounts for the largest share of the computational load in speech recognition. Traditional fast calculation algorithms divide the acoustic space into several classes with vector quantization. This thesis proposes a new fast calculation algorithm based on feature component masking, which focuses on the contribution of each feature component. Experimental results show that, compared with the single-mixture taxi-dialog baseline system, the computation time is reduced by more than 10% and the recognition accuracy increases by about 0.72%. On the 6-mixture taxi-dialog baseline system, the computation time is reduced by more than 6.71% and the recognition accuracy decreases by only 0.07%. Both the Gaussian computation complexity and the recognition time are reduced. The algorithm is also compatible with other fast algorithms such as the pruning search algorithm: working together with the pruning search algorithm, the computation time decreases by 0.6%-2.6% and the recognition accuracy increases by 0.2%-4%.
4. This thesis also proposes another new fast calculation algorithm based on an effective Gaussian shortlist. Unlike traditional algorithms, this fast calculation algorithm does not require retraining the acoustic models. The effective Gaussian shortlist is extracted directly by applying pattern clustering to all the Gaussian components. The other Gaussian components are expressed as weighted sums of the components in the shortlist. Experimental results show that, compared with the taxi-dialog baseline system, the computation time is reduced by more than 17% and the recognition accuracy decreases slightly, by about 1.3%. The storage space for acoustic models also decreases.
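The feature component masking idea in item 3 can be illustrated with a diagonal-covariance Gaussian, whose log-likelihood is a sum of one term per feature component, so skipping components skips work. The abstract does not say how component contributions are measured or how many components are kept. In the minimal Python sketch below, the diagonal-covariance form, the `top_k_mask` helper, and the contribution scores are hypothetical stand-ins rather than the thesis's selection criterion.

```python
import math


def top_k_mask(contributions, k):
    """Build a boolean mask keeping the k components with the largest scores.

    `contributions` is an assumed per-component score; how the thesis
    actually measures contribution is not stated in the abstract.
    """
    ranked = sorted(range(len(contributions)),
                    key=lambda d: contributions[d], reverse=True)
    keep = set(ranked[:k])
    return [d in keep for d in range(len(contributions))]


def diag_gauss_loglik(x, mean, var, mask=None):
    """Log-likelihood of x under a diagonal-covariance Gaussian.

    When `mask` is given, only components with mask[d] == True are
    evaluated; the rest are skipped entirely.
    """
    total = 0.0
    for d, (xd, md, vd) in enumerate(zip(x, mean, var)):
        if mask is not None and not mask[d]:
            continue  # masked-out component: no computation performed
        total += math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd
    return -0.5 * total


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8]
    mean = [0.0, -1.0, 1.0]
    var = [1.0, 0.5, 2.0]
    mask = top_k_mask([0.1, 0.9, 0.5], k=2)  # hypothetical scores
    print(diag_gauss_loglik(x, mean, var))
    print(diag_gauss_loglik(x, mean, var, mask))
```

Masked scores are not on the same scale as full scores, so a sketch like this only supports comparisons between Gaussians evaluated with the same mask.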
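Language	Chinese
Release date	2011-05-07
Pages	95
Source URL	[http://159.226.59.140/handle/311008/816]
Collection	Institute of Acoustics_IOA Doctoral and Master's Theses_Doctoral and Master's Theses, 1981-2009
Recommended citation (GB/T 7714)	Xie Lingyun (谢凌云). 实时语音识别系统的快速算法研究 [Research on Fast Algorithm of Real-Time Speech Recognition System][D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2004.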

Ingest method: OAI harvesting

Source: Institute of Acoustics

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.