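Chinese Academy of Sciences Institutional Repositories Grid System: Research on Acoustic Modeling Based on the Processing of Heterogeneous Information in Speech Data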

Chinese Academy of Sciences Institutional Repositories Grid

Research on Acoustic Modeling Based on the Processing of Heterogeneous Information in Speech Data

Document type: Thesis


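Author	丁鹏 (Ding Peng)
Degree	Doctor of Engineering
Defense date	2003-10-01
Degree-granting institution	Graduate School of the Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	徐波 (Xu Bo)
Keywords	large vocabulary continuous speech recognition; acoustic model; speech data classification and modeling; hidden Markov model; speaker adaptive training; variance modeling
Alternative title	On the Processing of Extraneous Acoustic Variations for Acoustic Modeling in Speech Recognition
Degree discipline	Pattern Recognition and Intelligent Systems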
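Abstract (translated from Chinese)	Speech signals carry rich information; besides the textual content, they contain much that is irrelevant to the speech recognition task. Typical sources of this information include speaker gender, age, emotion, speaking style, background noise, and the transmission channel. In this thesis these are collectively defined as heterogeneous information. The processing of heterogeneous information in speech data is increasingly becoming a focus of speech recognition research, for two reasons. First, heterogeneous factors cause nonlinear distortion of the acoustic model parameters, which raises the error rate and reduces the generalization ability of the model. Second, as speech recognition technology develops, more and more field data with relatively pronounced heterogeneity is used in acoustic model training. In terms of both necessity and urgency, therefore, the problem of processing data heterogeneity needs to be solved. Understanding a problem is a unity of analysis and synthesis. As the first foundation of the thesis, we comprehensively review the framework of current mainstream speech recognition systems and the training criteria of acoustic models, the main object of this study. This review defines the purpose and significance of the research and also points out that discriminative training is of great benefit for the speech data heterogeneity problem. As the second foundation, we analyze from several angles the effects that heterogeneous information may have on the classification and modeling of speech data. On these foundations, the thesis gives a comprehensive summary of the currently feasible solutions and, according to the essence of the algorithms, divides them into four categories: divide and conquer with multiple model sets, removal, description, and utilization. The purpose of analysis is synthesis and the proposal of a solution. To this end we further studied the multi-model divide and conquer, removal, and description strategies and, following the guiding idea of "seeking relative invariance amid variation and modeling it", proposed a solution that coordinates the removal and description strategies. This thesis studies in depth the problem of heterogeneous information in speech data and touches on many basic issues in speech recognition. The main work and contributions are: · An analysis method is proposed, based on a decision tree technique extended with non-contextual factors and on an output distribution coverage measure. This method not only reflects qualitatively the effects that heterogeneous factors may have on the classification and modeling of speech data, but also describes them quantitatively to some extent. · A model combination algorithm based on maximum likelihood gain is proposed. It avoids the training data sparsity that data classification and modeling may cause, and it includes a flexible mechanism for model combination and selection. · A generalized feature-space speaker adaptive training algorithm is developed. It extends the ability of speaker adaptive training to remove data heterogeneity to the parameter sharing mechanism of the acoustic model, thereby further improving the generalization ability of the model. · The factor analysis variance modeling technique is applied to a speaker recognition system, and several model parameter sharing techniques are studied for the case of severely sparse training data, significantly improving the performance of the speaker recognition system. With the help of factor analysis ...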
English abstract	Speech signals contain many kinds of unwanted variability other than phonetic variation, which may be caused by different speakers, speaking styles, channels, or acoustic environments. Since these variations in the data are not directly related to the modeling purpose, in this thesis they are termed extraneous acoustic variations. The need to deal with extraneous acoustic variations is twofold. First, in the training phase, the estimated model may diverge substantially because it also models the extraneous variations. Second, in recent years there has been a trend towards using found training data, in which greater variability can be seen than in specially collected data, to build speech recognition systems. To address the problem, in this thesis we first present a comprehensive analysis of the effects that extraneous acoustic variations may have on the classification and modeling of speech data. We then briefly review the possible solutions and categorize them into four broad classes: divide and conquer, removal, modeling, and utilization. After studying the divide and conquer, removal, and modeling strategies further, we propose a synergy that combines feature space speaker adaptive training and the semi-tied covariance model to deal with the problem. This thesis presents a comprehensive study on the processing of extraneous acoustic variations. The contributions are as follows: · We propose a simple but convenient analysis scheme, using the general features decision tree (GFDT) and an output coverage measure, to show explicitly to what extent various sources of extraneous variation affect the classification and modeling of speech data.
· A new framework based on GFDT, named maximum likelihood model combination, is introduced. With this framework, the training data sparsity caused by the data splitting commonly employed in the divide and conquer strategy can be alleviated, and a flexible model selection procedure is provided. · We extend the feature space speaker adaptive training (FSAT) scheme to normalize the effects of those variabilities in the phonetic decision tree construction process, thus improving the generalization ability of acoustic modeling. · We apply the Factor Analysis model to the speaker recognition system to deal with the contradiction between sparse training data and the need for spatial correlation modeling. Various parameter tying strategies were studied to further improve system performance. Moreover, a tentative analysis of the importance of the cepstral coefficients for speaker identity-related information is also presented. · In this thesis, we also studied the Semi-tied Covariance (STC) model to alleviate the problem of weak spatial correlation modeling. Three techniques, including the joint optimization of both transformation and HMM
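Language	Chinese
Other identifier	799
Source URL	[http://ir.ia.ac.cn/handle/173211/5788]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Ding Peng. Research on Acoustic Modeling Based on the Processing of Heterogeneous Information in Speech Data [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2003.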

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.