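Chinese Academy of Sciences Institutional Repositories Grid: Research on Key Technologies of Speaker Recognition for Large-Scale Populations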

Chinese Academy of Sciences Institutional Repositories Grid

Research on Key Technologies of Speaker Recognition for Large-Scale Populations

Document type: Dissertation


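Author	Zhu Lei (朱磊)
Degree type	Doctor of Engineering
Defense date	2012-06-01
Degree-granting institution	Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place conferred	Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Advisor	Xu Bo (徐波)
Keywords	Speaker recognition; Residual factor analysis; Language mismatch compensation; Quick speaker recognition; Speaker search algorithm
Alternative title	Research on Speaker Recognition With Large Scale Population
Major	Pattern Recognition and Intelligent Systems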
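Abstract (translated from Chinese)	Research on speaker recognition for large-scale populations faces many pressing problems, including the robustness of the channel subspace, the coverage of the speaker subspace, language independence, and processing efficiency. To improve the robustness and efficiency of speaker recognition systems under large-scale population conditions, this dissertation focuses on factor analysis algorithms, language mismatch compensation algorithms, fast speaker recognition algorithms, and speaker search algorithms. The main contributions are as follows. 1. A speaker recognition technique based on residual factor analysis is proposed. In large-scale speaker recognition systems, mismatch between training and test environments causes a sharp drop in recognition performance. The dissertation studies recognition techniques based on the channel subspace and the speaker subspace in depth and, building on joint factor analysis, proposes residual factor analysis and a fast algorithm for it in order to address the coverage problem of the speaker subspace. Experiments show that the algorithm effectively improves system performance when the speaker subspace is insufficiently covered. 2. A language compensation technique based on factor analysis and score normalization is proposed. Although a Gaussian mixture model based speaker recognition system is text-independent, the experiments show that language still has a large effect on it, and the effect is especially pronounced in cross-lingual speaker recognition. To counter the effect of language mismatch, the dissertation proposes language compensation algorithms at the model level and at the score normalization level, respectively. Building on these, and considering how language information can be obtained, it also explores semi-supervised and unsupervised language normalization algorithms. Experiments show that these algorithms greatly improve cross-lingual speaker recognition performance. 3. A fast recognition algorithm based on a speaker metric space is proposed. Although the effectiveness of Gaussian mixture model based speaker recognition is widely acknowledged, its slow computation has hindered practical deployment. Starting from the likelihood formula of the Gaussian model, the dissertation defines a speaker metric space and introduces an inner product, an angle, and a normalized distance on it. Experiments show that the fast recognition algorithm based on the speaker metric space greatly increases recognition speed while also effectively improving recognition performance. 4. An indexing technique based on the speaker metric space is proposed. To further speed up speaker recognition under large-scale population conditions, the dissertation proposes, on the basis of the speaker metric space, a speaker search algorithm based on high-dimensional spatial indexing and a speaker search algorithm based on clustering.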
English abstract	Speaker recognition with a large-scale population raises many urgent problems, including channel robustness, language independence, and efficient recognition speed. To improve the performance of speaker recognition systems with large-scale populations, this dissertation focuses on factor analysis, score normalization, a linear scoring method, and a fast speaker search algorithm. 1. In a speaker recognition system with a large-scale population, mismatch between training and test utterances leads to a dramatic decline in performance. We investigate the factor analysis and joint factor analysis algorithms for Gaussian mixture model based speaker recognition and propose some equivalent strategies to make the system more stable. We then introduce residual factor analysis, which can improve system performance. 2. Although Gaussian mixture model based speaker recognition is text-independent, language still has an effect, and this effect appears to be very serious in the recent NIST speaker recognition evaluation. To compensate for the language effect, this dissertation proposes two compensation algorithms. The first regards the language effect as a kind of channel mismatch between the training utterance and the test utterance; we can therefore add bilingual utterances to the training corpus used to train the channel subspace, removing the language effect at the model level. The second compensates for the language effect in the scoring phase using language-based normalization; we then discuss semi-supervised and unsupervised language normalization. 3. Although the Gaussian mixture model based speaker recognition system has been the state of the art, the heavy burden of calculating the log-likelihood ratio (LLR) score becomes a new bottleneck for systems with large-scale populations. This dissertation gives an approximation of the log-likelihood ratio, which leads to a significant speedup without any loss in performance. We then define a new speaker metric space and introduce the distance and angle between models in that space, which can be used as the test algorithm in a text-independent speaker verification system. 4. To improve the speed of the Gaussian mixture model based speaker recognition system with a large-scale population, we make use of the speaker metric space and then introduce the high dimensional ...
Language	Chinese
Other identifier	200618014628058
Source URL	[http://ir.ia.ac.cn/handle/173211/6469]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	朱磊. 大规模人群说话人识别关键技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2012.
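
The abstract mentions a speaker metric space with an inner product, an angle, and a normalized distance, plus a clustering-based speaker search, but this record does not give their definitions. The following is only an illustrative sketch, not the dissertation's method. It assumes each speaker model is a set of GMM means adapted from a universal background model, and that the mapping into the metric space scales each mean offset by the square root of the mixture weight over the standard deviation (a common supervector normalization). All function names and the `n_probe` parameter are hypothetical.

```python
import math

def to_metric_space(means, ubm_means, ubm_weights, ubm_vars):
    """Flatten adapted GMM means into one vector (assumed normalization)."""
    vec = []
    for c, (m_c, u_c, v_c) in enumerate(zip(means, ubm_means, ubm_vars)):
        scale = math.sqrt(ubm_weights[c])
        for m, u, v in zip(m_c, u_c, v_c):
            # Offset from the background mean, scaled by sqrt(weight) / stddev.
            vec.append(scale * (m - u) / math.sqrt(v))
    return vec

def inner(a, b):
    """Inner product of two speaker vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def angle(a, b):
    """Angle in radians between two non-zero speaker vectors."""
    cos = inner(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def normalized_distance(a, b):
    """Euclidean distance after scaling both non-zero vectors to unit length."""
    na, nb = norm(a), norm(b)
    return math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_search(query, speakers, centroids, assignments, n_probe=1):
    """Return the index of the best-matching speaker, scoring only the
    speakers assigned to the n_probe clusters nearest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda k: euclidean(query, centroids[k]))
    probed = set(order[:n_probe])
    candidates = [i for i, k in enumerate(assignments) if k in probed]
    if not candidates:
        # Fall back to an exhaustive search if the probed clusters are empty.
        candidates = list(range(len(speakers)))
    return min(candidates, key=lambda i: angle(query, speakers[i]))
```

Under these assumptions, each comparison costs one dot product instead of a frame-by-frame likelihood evaluation, and probing only the nearest clusters trades a small risk of missing the true speaker for fewer comparisons. The dissertation's actual definitions may differ.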

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.