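Research on Blind Separation Algorithms for Speech Signals (语音信号的盲分离算法研究)
Document type: Dissertation
Author | Hu Yalong (胡亚龙) |
Degree | Doctoral |
Defense date | 2005 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences |
Place of conferral | Institute of Acoustics, Chinese Academy of Sciences |
Keywords | blind source separation; maximum entropy; Gaussian mixture model; nonlinear; frequency domain; cross-correlation |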
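Abstract (translated from Chinese) | Speech is one of the most natural and convenient forms of communication between people. In a complex acoustic environment, a listener can direct auditory attention and easily pick out the speech of interest, showing strong adaptive ability. This is the cocktail party effect; modeling it mathematically leads to the blind source separation (BSS) problem. BSS is the process of recovering the independent components of the source signals from the observed signals alone, using the statistical properties of the sources, when neither the source signals nor the parameters of the transmission channel are known. In the less than 20 years since the problem was posed, it has become an active research topic in signal processing. As computers grow faster, the technique can be applied widely in speech, image, communication, and underwater acoustic signal processing, as well as in medical signal detection and data mining, so blind signal separation has broad application prospects. This thesis first reviews the current state of research on blind separation algorithms. Because information theory and statistics form the theoretical basis of these algorithms, the basic algorithms for linear blind separation, blind deconvolution, and nonlinear blind separation are introduced from the viewpoint of information content, entropy, and higher-order statistics. The conventional maximum entropy (ME) blind separation algorithm takes maximization of the entropy of the separating system's output as its cost function. By analyzing the relationship between the ME algorithm and the minimum mutual information (MMI) algorithm, an extended maximum entropy (EME) algorithm is proposed. Its central idea is to estimate the probability density function (pdf) of the output speech signal, so Gaussian probability density estimation is introduced in place of the log-based density estimate of that pdf. The resulting estimate approximates the source signals more closely and can effectively separate linear instantaneous and convolutive mixtures of speech. Next, for the broader class of nonlinearly mixed speech, an FIR nonlinear mixing model is adopted. Building on the EME algorithm, with Gaussian probability density estimation of the output pdf, the iteration formula for the weight vectors of the separation algorithm is derived using the expectation-maximization (EM) algorithm. In simulation experiments, compared with the conventional linear ME method, the new algorithm converges faster, effectively performs nonlinear speech separation, suppresses the interfering speech signal, and improves the output signal-to-noise ratio. Because signal processing can also be carried out in the frequency domain, the thesis finally proposes a decorrelation-based frequency-domain blind separation method. The method derives the forward and backward model prediction matrices A and W by least squares and then estimates the source signals S from W. The ambiguities of frequency-domain blind separation, such as permutation and spurious solutions, are analyzed. Exploiting the continuity and independence of speech signals, constraints based on the power spectrum and envelope correlation are proposed to resolve them. Simulation experiments indicate that the method offers a feasible approach to frequency-domain blind separation. |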
Abstract (English) | Speech is one of the most natural and convenient channels of communication among human beings. Even in a noisy environment or at a loud party, a person can choose what he or she wants to hear and ignore unwanted sounds. This ability is known as the cocktail party effect, and modeling it leads to the Blind Source Separation (BSS) problem. BSS deals with recovering unknown independent sources from their mixtures. It is an interesting and widely studied topic in signal processing, and research on it has been going on for nearly twenty years. BSS techniques have been applied to a variety of signal and image processing areas, including speech, images, communications, underwater acoustics, biomedical signals, nondestructive evaluation, and data mining. In this dissertation, we first describe the current status of BSS and its theoretical foundations. Next, several important BSS methods are introduced, including linear BSS, blind deconvolution, and nonlinear BSS. The relationship between the Maximum Entropy (ME) and Minimum Mutual Information (MMI) algorithms is also discussed. An extended ME algorithm based on the ME and MMI algorithms is then proposed for the BSS problem. This approach uses a Gaussian Mixture Model (GMM) to estimate the probability density function (pdf) of the output speech signal. Results show that it can separate inputs generated by a linear convolutive mixture model. Further, we propose a batch-processing speech separation algorithm based on the FIR nonlinear mixture model. It is an iterative algorithm that uses the Expectation Maximization (EM) method. Simulation results show that the proposed algorithm separates the sources effectively and exhibits good convergence and robustness. In addition, the signal-to-noise ratios of the extracted speech signals are improved. Finally, we propose a decorrelation approach to the BSS problem in the frequency domain. Based on the cross-power spectrum, we use least squares optimization to estimate the forward model and the backward model. We then use the estimated backward model matrix to compute estimates of the independent source signals. Moreover, the permutation and pseudo-solution problems that occur in frequency-domain BSS are investigated, and a new iteration constraint is proposed to resolve them. |
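Language | Chinese |
Date made public | 2011-05-07 |
Pages | 78 |
Source URL | [http://159.226.59.140/handle/311008/936] |
Collection | Institute of Acoustics_Institute of Acoustics Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses |
Recommended citation (GB/T 7714) | Hu Yalong (胡亚龙). Research on Blind Separation Algorithms for Speech Signals (语音信号的盲分离算法研究) [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005. |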
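As an illustration of the extended maximum entropy idea summarized in the abstract, the following Python sketch shows how a Gaussian mixture density could supply the score function in a standard natural-gradient separation update. It is not the thesis's implementation: the function names `gmm_score` and `natural_gradient_step`, the mixture parameters, and the step size are assumptions made for this example, and the update rule is the textbook natural-gradient form for linear instantaneous mixtures rather than the EM-derived iteration the thesis describes.

```python
import numpy as np

def gmm_score(y, weights, means, variances):
    """Score function phi(y) = -d/dy log p(y) for a 1-D Gaussian mixture p.

    Hypothetical helper for illustration; parameters are arrays of length K.
    """
    y = np.asarray(y, dtype=float)[..., None]      # add a component axis
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    # Log of each weighted component density, shape (..., K).
    log_comp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (y - mu) ** 2 / var)
    # Posterior responsibility of each component (stable softmax over K).
    log_comp -= log_comp.max(axis=-1, keepdims=True)
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=-1, keepdims=True)
    # -d/dy log p(y) = sum_k resp_k(y) * (y - mu_k) / var_k
    return (resp * (y - mu) / var).sum(axis=-1)

def natural_gradient_step(W, x, gmm, lr=0.05):
    """One batch update W <- W + lr * (I - E[phi(y) y^T]) W, with y = W x.

    x has shape (n_sources, n_samples); gmm is an assumed tuple
    (weights, means, variances) shared by all outputs.
    """
    y = W @ x
    phi = gmm_score(y, *gmm)
    grad = np.eye(W.shape[0]) - (phi @ y.T) / x.shape[1]
    return W + lr * grad @ W

# Usage: a two-component, zero-mean mixture gives a heavy-tailed density
# model, a common (assumed) choice for speech-like sources.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1000))
gmm = ([0.5, 0.5], [0.0, 0.0], [0.1, 4.0])
W = natural_gradient_step(np.eye(2), x, gmm)
```

In a complete algorithm the mixture parameters would presumably be re-estimated from the current outputs between weight updates; that step is omitted here.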
Ingest method: OAI harvesting
Source: Institute of Acoustics