Chinese Academy of Sciences Institutional Repositories Grid (中国科学院机构知识库网格)
Research on Mandarin Speech Intelligibility Based on Auditory Perception (基于听觉感知的汉语语音可懂度研究)

Document type: Dissertation

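Author: Yang Lin (杨琳)
Degree: Doctorate
Defense date: 2008-05-24
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place conferred: Institute of Acoustics
Keywords: auditory perception; speech intelligibility; temporal envelope; fine structure; psychoacoustic perception experiments
Alternative title: Research on Mandarin Speech Intelligibility Based on Auditory Perception
Major: Signal and Information Processing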
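Abstract (Chinese): In most cases, speech signal processing serves human hearing. To improve the intelligibility and quality of output speech across speech production and transmission environments, research on speech signals should take the structure of the human auditory system into account as far as possible, so that it matches how the ear perceives speech. Research on speech problems must therefore be linked to the process of speech perception, and the auditory perception of speech signals has become an important branch of modern phonetics. Chinese, one of the most widely used languages in the world, has distinctive properties and differs considerably from Western languages at the perceptual level, so perceptual research targeted at Chinese speech will help advance Chinese speech processing technology.

This dissertation focuses on the perceptual characteristics of Chinese speech. From a time-domain perspective, it examines how the envelope and the fine structure affect the intelligibility of Chinese speech. Building on existing experiments, it improves the continuous interleaved sampling algorithm for cochlear implants so that it better suits deaf patients whose native language is Chinese. It also adapts an objective index of speech intelligibility, the Speech Transmission Index (STI), to the characteristics of Chinese speech so that it predicts the intelligibility of Chinese speech more accurately. The main work and contributions are as follows:

1. Building on earlier work on fine structure, a simulation of the cochlear implant continuous interleaved sampling algorithm is used to study how fine-structure information in each frequency band affects the intelligibility of Chinese vowels, consonants, sentences, and tones under different noise conditions. Subjective psychoacoustic perception experiments show that fine structure makes vowels and tones more robust to noise but does not do so for consonants, and that fine structure in the 400-1000 Hz range significantly affects vowel, consonant, and sentence recognition in quiet.

2. Because Chinese is a tonal language, an "auditory chimaera" synthesis algorithm is used to examine how the temporal envelope and the fine structure in different frequency bands affect tone perception. Psychoacoustic perception experiments show that low-frequency fine structure plays an important role in tone recognition, whereas high-frequency fine structure has no significant effect; that envelope information alone can yield a tone recognition rate above 50%; and that, as with pitch perception for pure tones and complex tones, Chinese tone perception has a dominant region: the 2nd to 5th harmonics contribute more to tone perception than the fundamental frequency.

3. Current cochlear implant speech processing algorithms were all designed for Western languages. The perceptual research on Chinese speech found that fine structure contributes substantially to the intelligibility of Chinese, so an improved continuous interleaved sampling algorithm that incorporates fine-structure information is proposed. Acoustic simulations of cochlear implants with normal-hearing subjects show that the improved algorithm greatly increases subjects' recognition of vowels and tones and is therefore expected to improve speech intelligibility for deaf patients.

4. For objective evaluation of speech intelligibility, the Speech Transmission Index, now an international standard, has been shown to work well for many Western languages, but it still needs further verification for Chinese, a tonal language. Based on the envelope modulation theory underlying the STI, the modulation spectrum and the modulation transfer function of Chinese are analyzed systematically, and an improved STI algorithm that includes language-specific information is proposed. Subjective experiments verify the agreement between subjective and objective intelligibility evaluation under 40 noise and reverberation conditions. The results show that the improved algorithm is better suited than the traditional algorithm to objective intelligibility evaluation of Mandarin Chinese.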
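The envelope and fine-structure decomposition behind the "auditory chimaera" idea can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes a single frequency band (a real chimaera first splits both sounds with a filter bank), uses the Hilbert analytic signal to separate the two components, and runs on synthetic test tones. The function names, the sampling rate, and the test signals are all hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep the Nyquist bin once
        weights[1:n // 2] = 2.0    # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_and_fine_structure(x):
    """Split a band-limited signal into its temporal envelope and fine structure."""
    z = analytic_signal(x)
    envelope = np.abs(z)                   # slowly varying amplitude
    fine_structure = np.cos(np.angle(z))   # unit-amplitude carrier
    return envelope, fine_structure


def single_band_chimaera(envelope_source, fine_structure_source):
    """Impose the envelope of one sound on the fine structure of another."""
    envelope, _ = envelope_and_fine_structure(envelope_source)
    _, fine_structure = envelope_and_fine_structure(fine_structure_source)
    return envelope * fine_structure


# Hypothetical test signals: one second at an assumed 16 kHz sampling rate.
fs = 16000
t = np.arange(fs) / fs
expected_envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
sound_a = expected_envelope * np.cos(2 * np.pi * 500 * t)  # 4 Hz modulated 500 Hz tone
sound_b = np.cos(2 * np.pi * 1000 * t)                     # steady 1000 Hz tone

chimaera = single_band_chimaera(sound_a, sound_b)
```

Here `chimaera` carries the 4 Hz envelope of `sound_a` on the 1000 Hz carrier of `sound_b`. Restricting the fine structure to selected bands, as the experiments in the abstract do, would require a band-pass filter bank in front of this step.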
Abstract (English): Speech processing ultimately serves human hearing, so research on speech signal processing should be based on the acoustic characteristics of the human auditory system. The study of speech perception has received growing attention, and important results have been obtained for Western languages. However, Chinese, one of the most widely spoken languages in the world, differs perceptually from Western languages in many respects, so research on Chinese speech perception will contribute to Chinese speech signal processing.

This dissertation studies the perceptual characteristics of Mandarin speech in detail by analyzing the relative contributions of temporal envelope and fine-structure information in different frequency bands to Mandarin speech intelligibility, and in particular to tone perception, using novel "auditory chimaeras". Based on these findings, an improved signal processing algorithm for cochlear implants is proposed, and psychoacoustic experiments are conducted to test the new strategy. The traditional Speech Transmission Index (STI) is also improved for objective evaluation of speech intelligibility. The dissertation makes the following contributions:

1. Building on previous studies, we use the Continuous Interleaved Sampling (CIS) algorithm for cochlear implants to analyze in detail and systematically the relative contributions of temporal envelope and fine structure in different frequency bands to Mandarin vowel, consonant, tone, and sentence recognition. Results from the psychoacoustic experiments show that temporal fine structure makes vowel and tone perception more resistant to noise than temporal envelope cues alone, and that temporal fine-structure cues in the 400 to 1000 Hz range improve vowel and sentence recognition in quiet.

2. Tone is an important cue for Mandarin speech perception, so we systematically evaluate the relative contributions of temporal fine structure in different frequency bands using novel "auditory chimaeras". Our results confirm the importance of temporal fine-structure cues to lexical tone perception and reveal a dominant region for lexical tone perception: the second to fifth harmonics contribute no less than the fundamental frequency itself.

3. Because temporal fine-structure cues are crucial for speech perception in noise and for melody appreciation in cochlear implant (CI) patients, we propose an improved CI strategy based on the CIS model that introduces partial low-frequency temporal fine structure, or frequency modulation (FM), information into the slowly varying temporal envelopes. Experimental results show that incorporating FM information substantially improves CI performance, especially for vowel and tone perception.

4. The STI has proved to be a good physical metric for predicting the speech intelligibility of most Western languages, but it has not been verified for a tonal language such as Mandarin. Through detailed analysis of the modulation spectrum and the modulation transfer function under various degraded listening conditions, we improve the STI method for Mandarin. Subjective experiments show that the improved STI calculation is better suited than the traditional method to predicting the intelligibility of Mandarin speech in noisy and reverberant listening conditions.
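For orientation, the modulation transfer function idea that the traditional STI rests on can be sketched as follows. This is a simplified, single-band version of the classical formulation for stationary noise and exponential reverberation; it is not the improved Mandarin algorithm described in the abstract, and it omits the octave-band weighting and masking corrections of the standardized method. The function names and the equal weighting of modulation frequencies are assumptions made for illustration.

```python
import math

# The 14 one-third-octave modulation frequencies used by the classical STI.
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_transfer(mod_freq_hz, snr_db, rt60_s):
    """Modulation depth preserved after reverberation and stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """Map a modulation depth to a transmission index in [0, 1]."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)                # avoid log of 0 or division by 0
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr_db = min(max(apparent_snr_db, -15.0), 15.0)  # clip to +/-15 dB
    return (apparent_snr_db + 15.0) / 30.0


def simplified_sti(snr_db, rt60_s):
    """Equal-weight mean over modulation frequencies (single band, assumption)."""
    indices = [transmission_index(modulation_transfer(f, snr_db, rt60_s))
               for f in MODULATION_FREQS_HZ]
    return sum(indices) / len(indices)


# Hypothetical conditions: signal-to-noise ratio in dB, reverberation time in seconds.
for snr_db, rt60_s in [(0.0, 0.0), (10.0, 0.5), (10.0, 2.0)]:
    print(snr_db, rt60_s, round(simplified_sti(snr_db, rt60_s), 3))
```

A language-specific variant of the kind the dissertation proposes would change how the modulation frequencies or bands are weighted. The abstract does not give those details, so none are assumed here.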
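Language: Chinese
Release date: 2011-05-07
Pages: 111
Source URL: [http://159.226.59.140/handle/311008/360]
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses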
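Recommended citation
GB/T 7714
Yang Lin. Research on Mandarin Speech Intelligibility Based on Auditory Perception [D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2008.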

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.