Research on Embedded Speaker-Independent Continuous Speech Recognition System (嵌入式非特定人连续语音识别系统研究)
Document type: Thesis
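Author | Liu Bin (刘斌) |
Degree | Doctoral |
Defense date | 2005 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences |
Place of conferral | Institute of Acoustics, Chinese Academy of Sciences |
Keywords | speech recognition; embedded systems; subspace Gaussian clustering; feature component masking |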
Alternative title | Research on Embedded Speaker-Independent Continuous Speech Recognition System |
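Abstract (translated from Chinese) | Speech recognition is gradually being deployed on embedded systems such as mobile phones, handheld computers, and other mobile devices. However, because of limited CPU performance and memory, speech recognition on embedded systems is still mostly confined to small- and medium-vocabulary voice command and control, and speaker-independent continuous speech recognition, with its high computational complexity, has not yet been widely used on embedded systems. Against this background, this thesis studies the implementation and performance optimization of an embedded speaker-independent continuous speech recognition system based on hidden Markov models (HMMs). The main work is as follows: 1. Experiments on existing platforms identify their performance bottlenecks and show where recognition performance conflicts with the available computing resources. Floating-point capability is the key factor affecting the real-time performance of speaker-independent continuous speech recognition on embedded platforms; converting the algorithms to fixed point greatly increases recognition speed but still does not reach real time, and it lowers recognition accuracy. 2. An embedded platform based on the MPC5200 microprocessor was designed and implemented. It has relatively strong floating-point capability and ample storage, and it supports audio input and output. HMM-based speaker-independent continuous speech recognition was implemented on this platform and handles small- and medium-vocabulary continuous speech recognition tasks in real time. 3. A fast algorithm based on subspace Gaussian clustering was implemented on the embedded platform, further reducing the computational complexity of the HMMs while avoiding retraining of the acoustic model. Experiments show that recognition speed increases by more than 20% with recognition accuracy essentially unchanged, which greatly extends the vocabulary size the system can handle. 4. A simple and effective measure of the contribution of feature components is proposed: components of the feature vector are selectively masked to evaluate their contribution to recognition accuracy, and during recognition the low-contribution components are masked when evaluating the HMMs, which reduces computational complexity. Experiments show that this method increases recognition speed by more than 5% with recognition accuracy essentially unchanged. |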
English abstract | Speech recognition technologies are increasingly applied in embedded systems such as mobile phones, PDAs, and other mobile devices. However, because of the limited computing power and memory of embedded systems, embedded speech recognition is still mostly restricted to small- and medium-vocabulary voice command and control. HMM-based speaker-independent continuous speech recognition algorithms have high computational complexity and are not widely adopted in embedded systems. Against this background, this thesis studies the implementation and performance optimization of HMM-based embedded speaker-independent continuous speech recognition systems. The main contributions of this thesis are: 1. Experimental studies of the performance bottlenecks of popular embedded platforms show that the floating-point capability of the platform significantly affects the real-time performance of HMM-based speaker-independent continuous speech recognition systems. Fixed-point conversion improves speed considerably but still cannot meet the real-time requirement, and recognition accuracy decreases at the same time. 2. An embedded platform based on the MPC5200 microprocessor is designed and implemented. This platform has strong floating-point capability and sufficient memory, supports audio input and output, and can process small- and medium-vocabulary continuous speech recognition tasks in real time. 3. A fast calculation algorithm based on subspace Gaussian clustering is implemented on the embedded platform to reduce the computational complexity of the HMMs. Unlike traditional algorithms, this algorithm applies Gaussian clustering on feature subspaces, does not require retraining the acoustic models, and is also suitable for large-vocabulary continuous speech recognition tasks. The experimental results show that computation time can be reduced by more than 20% while recognition accuracy decreases slightly. 4. A simple and effective method is proposed to measure the contribution of feature components. It evaluates the contribution of each feature component to recognition accuracy by masking the components one by one; computational complexity can then be reduced by masking the components with low contribution. The experimental results show that computation time decreases by more than 5% while recognition accuracy decreases slightly. |
Language | Chinese |
Date made public | 2011-05-07 |
Pages | 93 |
Source URL | [http://159.226.59.140/handle/311008/1052] |
Collection | Institute of Acoustics_IOA Doctoral and Master's Theses_Doctoral and Master's Theses 1981-2009 |
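Recommended citation (GB/T 7714) | Liu Bin (刘斌). Research on Embedded Speaker-Independent Continuous Speech Recognition System (嵌入式非特定人连续语音识别系统研究)[D]. Institute of Acoustics, Chinese Academy of Sciences. 2005. |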
Ingest method: OAI harvesting
Source: Institute of Acoustics