Chinese Academy of Sciences Institutional Repositories Grid
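Research on Fast Decoding Algorithms for the Stochastic Segment Model and Their Application to Keyword Spotting (随机段模型快速解码算法及其关键词检测研究)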

Document type: Thesis

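Author: Peng Shouye (彭守业)
Degree: Master of Engineering
Defense date: 2009-05-30
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of award: Institute of Automation, Chinese Academy of Sciences
Supervisor: Liu Wenju (刘文举)
Keywords: speech recognition; stochastic segment model; fast decoding algorithm; keyword spotting
Alternative title: Fast Decoding of Stochastic Segment Model and the Keyword Spotting System
Major: Pattern Recognition and Intelligent Systems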
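Chinese abstract (translated): Acoustic modeling is one of the core research directions in speech recognition. The stochastic segment model (SSM) relaxes the hidden Markov model (HMM) assumption that speech observation vectors are mutually independent given the state, and it achieves higher recognition performance than HMM systems; however, its very high computational complexity is the key obstacle to the practical use of segment models. This thesis studies decoding algorithms for segment models and their use in speech recognition and keyword spotting systems. The main contributions are:
1. A parallel decoding algorithm for neighboring segments (Parallel Decoding of Neighboring Segments, PDNS) is proposed. It improves on the step-by-step segment computation method and can decode and prune several speech segments simultaneously. PDNS is a local decoding method that provides a higher and more accurate pruning threshold, so more mismatched models can be pruned. Applied to a large-vocabulary continuous speech recognition (LVCSR) system, it saves 50% of the computation time with essentially no loss of recognition accuracy.
2. Because of its high computational complexity, the segment model mostly plays an auxiliary role in LVCSR. This thesis proposes SSM rescoring algorithms based on HMM pre-segmentation. The SSM performs a second-pass search over the lattice generated by a monophone HMM system, using the node and arc information in the lattice to update the SSM expansion set, which speeds up segment model decoding several-fold. The SSM also re-verifies the N-best list generated by a triphone HMM, rescoring each path under three criteria: fixed-boundary score, local maximum score, and global optimum score. After rescoring, the error rate drops by 4.81% relative to the HMM baseline. The rescoring algorithms take far less time than baseline decoding, which offers a reference for practical applications of segment models.
3. Building on the LVCSR system, we built an HMM/SSM-based keyword spotting system centered on a network of syllable initials and finals, comprising speech segmentation, initial/final network generation, and keyword detection modules. The system offers search engines based on monophone HMM, triphone HMM, and SSM, and the experiments compare the strengths and weaknesses of the three models.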
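The pruning idea behind item 1 can be illustrated with generic beam pruning. This is a minimal sketch, not the thesis's PDNS implementation: it only assumes that each candidate segment model has a log-likelihood score and that a candidate survives if it lies within a fixed beam of a reference best score. The hypothetical functions `prune_sequential` and `prune_joint` contrast a threshold built incrementally, one candidate at a time, with one computed after a group of neighboring candidates has been scored together, which is the kind of higher, more accurate threshold the abstract attributes to PDNS.

```python
from typing import List, Tuple

# (hypothetical candidate label, log-likelihood score); higher is better
Hyp = Tuple[str, float]


def prune_sequential(hyps: List[Hyp], beam: float) -> List[Hyp]:
    """Keep a candidate if it is within `beam` of the best score seen *so far*."""
    kept: List[Hyp] = []
    best = float("-inf")
    for label, score in hyps:
        best = max(best, score)
        if score >= best - beam:
            kept.append((label, score))
    return kept


def prune_joint(hyps: List[Hyp], beam: float) -> List[Hyp]:
    """Score all neighboring candidates first, then prune against their joint best."""
    best = max(score for _, score in hyps)
    return [(label, score) for label, score in hyps if score >= best - beam]


candidates = [("a", -120.0), ("b", -118.0), ("c", -100.0), ("d", -103.0)]
print(len(prune_sequential(candidates, beam=5.0)))  # 4
print(len(prune_joint(candidates, beam=5.0)))       # 2
```

With sequential pruning, "a" and "b" survive only because the strong candidate "c" had not been scored yet; in a real decoder those survivors would be extended further, which is wasted work that a jointly computed threshold avoids.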
English abstract: Acoustic modeling is one of the key technologies in automatic speech recognition research. The Stochastic Segment Model (SSM) uses segmental distributions, rather than the frame-based features of the HMM, to represent the underlying trajectory of the observation sequence. SSM systems achieve higher accuracy than HMM systems, but at a higher computational cost. In this thesis, we investigate fast decoding algorithms for the SSM and apply them to a large-vocabulary continuous speech recognition (LVCSR) system and a keyword spotting system. The main work covers three aspects:
1. We propose a parallel decoding algorithm for neighboring segments, which decodes and prunes multiple segments in parallel. Because the decoder can exploit model scores over a wider scope and generate a better pruning threshold, more non-matching models are pruned. Combined with multistage pruning, the algorithm saves approximately 50% of decoding time without an obvious loss of recognition accuracy.
2. Because of its high computational complexity, the SSM mainly plays a subordinate role in LVCSR systems. First, the SSM searches the lattice generated by a monophone HMM, and the node and arc information in the lattice is used to update the SSM expansion set. Second, the SSM rescores the N-best list produced by a triphone HMM. We propose three rescoring methods (fixed boundary score, local maximum score, and global optimum score), which yield a relative error-rate reduction of about 4.81% over the HMM baseline while taking far less time than full SSM decoding.
3. We construct a keyword spotting (KWS) system based on HMM and SSM. The system is built around a syllable initial/final network and includes speech segmentation, network generation, and keyword spotting modules. It provides three search engines (monophone HMM, triphone HMM, and SSM), and the experiments compare the performance of these models.
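N-best rescoring under the fixed-boundary criterion can be sketched as follows, assuming each triphone-HMM hypothesis carries a phone-level segmentation and that the SSM is available as a per-segment scoring function. The names `Hypothesis`, `fixed_boundary_score`, `rescore_nbest`, the linear interpolation `weight`, and the toy scorer are all hypothetical: the abstract does not say how (or whether) SSM and HMM scores are combined, and the local-maximum and global-optimum criteria, which let segment boundaries move, are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (phone label, start frame, end frame exclusive), taken from the HMM alignment
Segment = Tuple[str, int, int]
# Hypothetical SSM interface: log score of one segment's frames under a label's model
SegmentScorer = Callable[[str, Sequence[float]], float]


@dataclass
class Hypothesis:
    words: str
    hmm_score: float         # first-pass HMM log score
    segments: List[Segment]  # segment boundaries fixed by the HMM pass


def fixed_boundary_score(hyp: Hypothesis, frames: Sequence[float],
                         ssm_score: SegmentScorer) -> float:
    """Sum SSM segment scores without moving the HMM-provided boundaries."""
    return sum(ssm_score(label, frames[start:end])
               for label, start, end in hyp.segments)


def rescore_nbest(nbest: List[Hypothesis], frames: Sequence[float],
                  ssm_score: SegmentScorer, weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the best interpolated HMM/SSM score (assumed combination)."""
    def combined(hyp: Hypothesis) -> float:
        return ((1.0 - weight) * hyp.hmm_score
                + weight * fixed_boundary_score(hyp, frames, ssm_score))
    return max(nbest, key=combined)


# Toy stand-in for an SSM: scalar frames, negative squared distance to a per-label mean.
MEANS = {"a": 0.0, "b": 1.0}


def toy_ssm_score(label: str, seg: Sequence[float]) -> float:
    return -sum((x - MEANS[label]) ** 2 for x in seg)


frames = [0.0, 0.1, 0.9, 1.0]
nbest = [
    Hypothesis("a a", hmm_score=-9.5, segments=[("a", 0, 4)]),
    Hypothesis("a b", hmm_score=-10.0, segments=[("a", 0, 2), ("b", 2, 4)]),
]
print(rescore_nbest(nbest, frames, toy_ssm_score, weight=0.0).words)  # a a
print(rescore_nbest(nbest, frames, toy_ssm_score, weight=0.5).words)  # a b
```

The HMM alone prefers "a a", but the segment-level scores over the fixed boundaries favor "a b" once they are mixed in. Because the boundaries come from the first pass, each hypothesis costs only one SSM evaluation per segment, which is why rescoring is far cheaper than a full SSM search over all segmentations.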
Language: Chinese
Other identifier: 200628014628044
Source URL: http://ir.ia.ac.cn/handle/173211/7478
Collection: Graduates / Master's theses
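Recommended citation
GB/T 7714
Peng Shouye (彭守业). 随机段模型快速解码算法及其关键词检测研究 [Research on fast decoding algorithms for the stochastic segment model and their application to keyword spotting][D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2009.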

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.