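Chinese Academy of Sciences Institutional Repositories Grid System: Computational Auditory Model and Its Application in Robust Speech Recognition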

Chinese Academy of Sciences Institutional Repositories Grid

Computational Auditory Model and Its Application in Robust Speech Recognition

Document type: Dissertation


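Author	Lu Xugang (卢绪刚)
Degree	Doctor of Engineering
Defense date	1999-06-01
Degree-granting institution	Institute of Automation, Chinese Academy of Sciences
Place conferred	Institute of Automation, Chinese Academy of Sciences
Advisors	Ma Songde (马颂德); Chen Daowen (陈道文)
Keywords	computational auditory model; speech signal feature extraction; robustness; HMM; dynamic nonlinearity; forward masking effect; temporal mechanism; place mechanism; scale-spectrum analysis; lateral inhibition
Alternative title	Computational Auditory Model and Its Application in Robust Speech Signal Processing and Recognition
Major	Pattern Recognition and Intelligent Systems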
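Chinese abstract (translated)	Conventional feature extraction based on the FFT spectrum has achieved breakthrough progress within the HMM framework, reaching very high recognition rates in laboratory environments and in certain specific domains. In practical applications, however, the mismatch between the training and recognition environments greatly degrades the performance of speech recognition systems. Finding new robust speech features is therefore an urgent requirement for turning speech recognition technology into products, and feature extraction based on auditory perception mechanisms is a fundamental way to address this problem. The aim of this dissertation is to extract features based on auditory perception mechanisms and thereby improve the robustness of speech recognition systems. The dissertation first surveys the findings and recent developments in the psychology and physiology of auditory perception, identifies auditory perception mechanisms that can be exploited, and then combines them with conventional spectral feature extraction to build a basic auditory model. The model simulates the outer and middle ear, the cochlear basilar membrane, the inner hair cell synapse, the auditory nerve fibers, and the central auditory system. The mean firing rate of the auditory nerve fibers obtained from this model is used as the feature representing the speech signal. The dissertation proposes a method for extracting auditory cepstral features: the energy output of the central auditory processing module is log-compressed and then passed through a DCT to obtain the auditory cepstral feature vector, AFCC (Auditory Frequency Cepstral Coefficient). We tested how the robustness of the basic-model AFCC/HMM depends on the frequency range covered by the cochlear filters and on the number of auditory frequency channels. Comparative experiments between AFCC/HMM and MFCC/HMM show that AFCC/HMM is more robust. The dissertation proposes a module that uses a low-pass filter to simulate the long-term integration mechanisms of the central auditory system. After the peripheral auditory system has processed the speech signal into a primary auditory spectrum, the central auditory system applies many long-term integration mechanisms to that spectrum, which from a digital signal processing viewpoint amounts to low-pass filtering. Experiments that add this low-pass filtering module to the basic auditory computational model show that the extracted features are much more robust than those extracted without it. The dissertation proposes a dynamic nonlinear adaptation module that adapts the gain factor. The dynamic characteristics of the speech signal are widely regarded as an important factor in the robustness of speech perception, and the auditory system is sensitive to dynamic changes in the speech signal, so extracting dynamic features is a good way to enhance robustness. Built on a study of the auditory forward masking effect, the module closely reproduces the experimental results on forward masking from auditory psychology. Its application to robust speech recognition requires further research. The dissertation proposes integrating the temporal mechanism on top of the place mechanism by building a gain module controlled by the temporal mechanism. Computational auditory models generally study the temporal and place mechanisms separately, whereas in the actual coding mechanism of the auditory system there are both…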
English abstract	Speech recognition technology based on FFT spectral analysis and HMMs has improved dramatically, and recognition rates are very high under laboratory conditions and in certain domains. In real applications, however, the mismatch between training and testing conditions sharply degrades the performance of speech recognition systems. New robust features are needed to solve this problem, and among the alternatives, methods based on auditory perception mechanisms are a key approach. This dissertation focuses on extracting new features based on auditory perception mechanisms in order to improve the robustness of speech recognition systems. The prospects and results from the psychology and physiology of auditory perception are first discussed in detail; many auditory perception mechanisms are then integrated into the feature extraction model, and a fundamental model is designed to process the speech signal. The model consists of several modules: the outer and middle ear, the basilar membrane, the inner hair cell and synapse, the auditory nerve fibers, and a central auditory module. The output of the model is normalized to obtain the feature vector. A new feature representation, the auditory frequency cepstral coefficient (AFCC), is proposed. Within the HMM framework, the energy output of the basic auditory model is log-compressed, and the AFCC is then obtained by a DCT. Using this basic AFCC/HMM system, the relationship between frequency range and robustness and the relationship between the number of frequency channels and robustness are discussed. A low-pass filter is proposed to simulate the long-term temporal effect of the central auditory mechanism. After the primary auditory spectrum is obtained from the fundamental auditory model, the central auditory stage uses many temporal integration mechanisms to capture long-term temporal information; from a digital signal processing viewpoint, this can be simulated by low-pass filtering. Experiments with this low-pass filtering module show that the robustness of the new feature increases as the cutoff frequency of the low-pass filter decreases. A nonlinear dynamic adaptive model is proposed. The dynamic features of the speech signal are regarded as a very robust factor in speech perception, and the auditory system is sensitive to dynamic changes in speech stimuli, so adding dynamic features should be a good way to improve the robustness of speech recognition. The adaptive model is based on research into the forward masking effect, which it can simulate. When this module is used in speech recognition, however, it does not yield good robustness. A model integrating place and temporal mechanisms is proposed. Gener
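Language	Chinese
Other identifier	543
Source URL	[http://ir.ia.ac.cn/handle/173211/5698]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Lu Xugang. Computational Auditory Model and Its Application in Robust Speech Recognition [D]. Institute of Automation, Chinese Academy of Sciences. 1999.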
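
The abstract describes AFCC extraction as log compression of the auditory model's channel energy output followed by a DCT, with an optional low-pass stage over time. The following is a minimal sketch of those two steps only, not the dissertation's implementation. It assumes the per-frame channel energies have already been produced by some auditory model (not shown), and the function names, the one-pole smoother, the value of `alpha`, and the number of coefficients are all illustrative assumptions that do not come from the record.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # coefficient index
    m = np.arange(n)[None, :]          # channel index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return x @ (basis * scale[:, None]).T


def lowpass_over_time(energies, alpha=0.5):
    """One-pole smoother along the frame axis (an assumed stand-in filter):
    y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]."""
    energies = np.asarray(energies, dtype=float)
    out = np.empty_like(energies)
    out[0] = energies[0]
    for t in range(1, len(energies)):
        out[t] = alpha * energies[t] + (1.0 - alpha) * out[t - 1]
    return out


def afcc_like(channel_energies, n_coeffs=12, alpha=None, floor=1e-10):
    """Cepstrum-style features from (frames, channels) energies.

    Hypothetical helper: optional temporal smoothing, then log compression,
    then a DCT across channels.
    """
    e = np.asarray(channel_energies, dtype=float)
    if alpha is not None:
        e = lowpass_over_time(e, alpha)
    log_e = np.log(np.maximum(e, floor))  # floor avoids log(0)
    return dct_ii(log_e, n_coeffs)
```

Calling `afcc_like(energies, n_coeffs=12, alpha=0.3)` on an array of shape `(frames, channels)` returns an array of shape `(frames, 12)`. A smaller `alpha` smooths more heavily, which corresponds loosely to a lower cutoff frequency in the low-pass stage.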

Ingest method: OAI harvesting

Source: Institute of Automation

