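Chinese Academy of Sciences Institutional Repositories Grid: Research on Text-independent Speaker Recognition Based on Bayesian Networks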


Chinese Academy of Sciences Institutional Repositories Grid

Research on Text-independent Speaker Recognition Based on Bayesian Networks

Document type: Thesis


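Author	Wan Hongjie (万洪杰)
Degree	Doctorate (PhD)
Defense date	2005
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	speaker recognition; text-independent; Bayesian network; MFCC features; pitch (F0) features
Alternative title	Research on Text-independent Speaker Recognition based on Bayesian Network Theory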
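Chinese abstract	Because text-independent speaker recognition places no restriction on what is spoken, it does not depend on the cooperation of the speaker being tested, and data are relatively easy to collect, so it has a very wide range of practical applications. This thesis combines Bayesian network theory with speaker recognition techniques and proposes a new method for text-independent speaker recognition that further improves the identification rate. The main contributions are as follows: 1) Bayesian network theory provides a concise and natural representation of the probability distributions among related events, together with learning and inference algorithms based on those distributions; on this basis, the probability of an event can be inferred from observed phenomena. This thesis applies Bayesian network theory to text-independent speaker recognition, proposes a Bayesian network speaker recognition method, and gives the training and recognition algorithms for the system. 2) The applicability of the Bayesian network speaker recognition method is studied, and a system architecture is proposed in which MFCC coefficients serve as the speaker's acoustic features and hidden network nodes serve as the latent basis for inference, together with the corresponding training and recognition methods. Experiments examine the identification performance of the Bayesian network speaker identification method and compare it with the Gaussian mixture model (GMM) method; the results show that the Bayesian network method significantly outperforms the GMM method. 3) Pitch frequency is an important speaker characteristic, but speaker recognition using pitch alone is effective only when the system has few users; with many users, performance drops sharply. Building on experimental observations, this thesis proposes a Bayesian network speaker identification method that fuses MFCC and pitch features, and gives the Bayesian network structure linking MFCC, pitch frequency, and speaker identity, together with its training and recognition methods. Experiments show that, within the Bayesian network speaker recognition framework, fusing MFCC and pitch features achieves high identification accuracy with relatively short training speech. 4) A Bayesian network speaker gender identification method that fuses MFCC and pitch features is proposed, with the Bayesian network structure linking MFCC, pitch frequency, and speaker gender, together with its training and recognition methods. Experiments show that this method has excellent gender discrimination ability, reaching 99% accuracy with 1-second test utterances.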
English abstract	In text-independent speaker recognition, the content of the spoken language is not restricted, so the cooperation of the speaker is not an issue, data can be collected more conveniently, and the technique can be applied widely. This thesis combines Bayesian network theory with speaker recognition and presents a new text-independent speaker recognition method. Experimental results show that this method achieves higher performance. The main contributions of this thesis are: 1) A Bayesian network describes the probability relations between events in a natural and compact way, and it comes with inference and learning algorithms based on probability distributions; on this basis, the probability of unobserved events can be inferred from observed events. This thesis applies Bayesian network theory to text-independent speaker recognition and gives the parameter estimation and recognition methods. 2) The usefulness of the Bayesian network approach is studied. A system is presented that uses MFCC parameters as acoustic features and a hidden node as the hidden basis of inference for speaker recognition. Identification experiments were carried out, including a comparison with the Gaussian mixture model (GMM). Results show that this method performs better than the GMM. 3) Pitch frequency is an important feature for speaker recognition, and it has been shown to be valid for this task. However, using pitch alone is suitable only for speaker recognition systems with few users; when there are many users, performance degrades severely. Based on experiments, this thesis presents a Bayesian network that models the relation between MFCC, pitch, and speaker identity for speaker recognition. How pitch is used in the network, the parameter estimation of the network, and model training for speaker recognition are described in detail. Experimental results show that this method achieves high speaker recognition performance with short training data.
4) This thesis presents a Bayesian network structure for gender classification that models the relation between gender, pitch, and MFCC parameters. Methods for parameter estimation and for training the gender classification model are given in detail. Experimental results show that this method is very effective for gender classification: with one-second test utterances, classification accuracy reaches 99%.
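Language	Chinese
Release date	2011-05-07
Pages	97
Source URL	[http://159.226.59.140/handle/311008/920]
Collection	Institute of Acoustics / IOA Doctoral and Master's Theses / 1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	Wan Hongjie. Research on Text-independent Speaker Recognition Based on Bayesian Networks [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005.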
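The abstract names the ingredients of the method (a class node for speaker identity or gender, a hidden node that mediates the MFCC observations, and a pitch node in the fused variant), but this record does not give the actual network structure or its conditional distributions. The Python sketch below is therefore only an illustration under stated assumptions, not the thesis's method: it assumes a structure C -> H -> X for each MFCC frame X, with a discrete hidden node H and diagonal Gaussians, plus C -> F for log pitch on voiced frames, and it treats frames as independent. All names (`ClassModel`, `frame_log_likelihood`, `classify`) and the toy numbers are hypothetical. Note that with the pitch term dropped, this assumed structure reduces to a per-class diagonal GMM, i.e. the kind of baseline the abstract compares against.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

LOG_2PI = math.log(2.0 * math.pi)


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(LOG_2PI + math.log(v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


@dataclass
class ClassModel:
    """Hypothetical parameters for one value c of the class node C (a speaker or a gender)."""
    hidden_prior: List[float]       # P(H = h | C = c), all entries assumed > 0
    mfcc_means: List[List[float]]   # mean of the MFCC vector X given (c, h)
    mfcc_vars: List[List[float]]    # diagonal variance of X given (c, h)
    logf0_mean: float               # mean of log-F0 given c
    logf0_var: float                # variance of log-F0 given c


def frame_log_likelihood(model: ClassModel, mfcc: Sequence[float],
                         f0: Optional[float]) -> float:
    """log p(x_t, f_t | c): sum out the hidden node H for the MFCC vector, then add
    the pitch term only for voiced frames (f0 is None for unvoiced frames)."""
    terms = [math.log(w) + log_gauss_diag(mfcc, mu, var)
             for w, mu, var in zip(model.hidden_prior, model.mfcc_means, model.mfcc_vars)]
    score = log_sum_exp(terms)
    if f0 is not None:
        score += log_gauss_diag([math.log(f0)], [model.logf0_mean], [model.logf0_var])
    return score


def classify(models: Dict[str, ClassModel], frames: Sequence[Sequence[float]],
             f0s: Sequence[Optional[float]]) -> str:
    """Return the class value with the highest utterance log-likelihood, assuming
    independent frames and a uniform prior over class values."""
    def total(model: ClassModel) -> float:
        return sum(frame_log_likelihood(model, x, f) for x, f in zip(frames, f0s))
    return max(models, key=lambda name: total(models[name]))


if __name__ == "__main__":
    # Toy gender example with invented numbers: identical MFCC parameters for both
    # classes, so only the pitch node separates them.
    shared = dict(hidden_prior=[0.5, 0.5], mfcc_means=[[0.0, 0.0], [1.0, 1.0]],
                  mfcc_vars=[[1.0, 1.0], [1.0, 1.0]], logf0_var=0.04)
    models = {"male": ClassModel(logf0_mean=math.log(120.0), **shared),
              "female": ClassModel(logf0_mean=math.log(220.0), **shared)}
    frames = [[0.1, 0.2], [0.9, 1.1], [0.5, 0.4]]
    f0s = [210.0, None, 230.0]  # the middle frame is treated as unvoiced
    print(classify(models, frames, f0s))  # female
```

The same scoring rule would serve both tasks in the abstract: the class values are speaker identities for identification (contribution 3) and {male, female} for gender classification (contribution 4). Adding the pitch term only on voiced frames reflects that F0 is undefined in unvoiced speech; how the thesis itself handles unvoiced frames, and how it estimates the parameters, is not stated in this record.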

Ingest method: OAI harvesting

Source: Institute of Acoustics


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.