Chinese Academy of Sciences Institutional Repositories Grid

Speech Enhancement Based on High Resolution and Low Variance Spectral Estimation (基于高分辨率低方差谱估计的语音增强方法研究)

Document type: Thesis


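Author	Zheng Chengshi (郑成诗)
Degree	Doctoral
Defense date	2009-05-27
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place conferred	Institute of Acoustics
Keywords	speech enhancement; generalized sidelobe canceller; adaptive null-forming algorithm; spectral subtraction; power spectrum estimation
Alternative title	Speech Enhancement Based on High Resolution and Low Variance Spectral Estimation
Major	Acoustics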
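Chinese abstract	Speech enhancement is a key technology in speech communication systems. Both multi-channel and single-channel speech enhancement have been studied for many years and have come into wider use in recent years. However, both classes of algorithm have inherent problems. Multi-channel speech enhancement algorithms are very sensitive to microphone mismatch, reverberation, and steering vector estimation errors; these factors increase speech distortion and prevent interfering noise from being suppressed effectively. Single-channel speech enhancement algorithms generally handle only relatively stationary noise, and the commonly used spectral subtraction method also suffers from a severe "musical noise" problem. These problems limit the application of speech enhancement algorithms; only by solving them can the technology be applied more widely. This thesis studies these problems in depth and seeks effective solutions. Its main research content and contributions fall into three areas: (1) It analyzes theoretically the effect of microphone mismatch on two-microphone speech enhancement algorithms and verifies the conclusions through real experiments, providing a basis for microphone calibration. Because multi-channel speech enhancement algorithms are especially sensitive to microphone mismatch, the thesis analyzes in detail the effect of mismatch on the GSC and ANF algorithms. Although only the two-microphone case is analyzed, the method of analysis extends directly to any number of microphones and provides a basis for calibrating multiple microphones. (2) Using induction, it proposes for the first time a robust MVDR algorithm, which provides a theoretical basis for analyzing and comparing three non-parametric MSC methods. The proposed robust MVDR algorithm, namely α-MVDR, for the first time unifies the three non-parametric magnitude squared coherence (MSC) analysis methods in one simple form. The α-MVDR algorithm is also applied to frequency-domain power spectrum estimation and spatial spectrum estimation. In the spatial spectrum application, combining α-MVDR with an automatic diagonal loading technique (GLC) yields α-GLC-MVDR; experimental results show that when the steering vector estimate is biased, α-GLC-MVDR is more robust and performs better than GLC-MVDR. The α-MVDR algorithm has attracted the attention of experts and scholars in China and abroad and has been cited in a signal processing book series published by Springer-Verlag (Germany). (3) It proposes two low-variance power spectrum estimation methods suited to spectral subtraction: the adaptive averaging periodogram method (AAP) and a method based on the mean values of speech cepstral coefficients (MVSC). The AAP method is based entirely on the structural characteristics of the noise power spectrum, whereas the MVSC method is based entirely on the characteristics of speech cepstral coefficients. Because the variance of the spectral estimate is the root cause of "musical noise" in spectral subtraction, the AAP and MVSC methods solve the "musical noise" problem well. Simulations and real system applications both confirm the theoretical analysis and show that the proposed algorithms outperform traditional methods.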
English abstract	Speech enhancement (SE) is one of the most important techniques in speech communication systems and is now widely used. However, SE algorithms still have substantial shortcomings. Multi-channel SE algorithms are often sensitive to errors in the assumed signal model, such as microphone mismatch, reverberation, and steering vector errors. Single-channel SE algorithms can generally suppress only stationary noise and often suffer from the annoying musical noise problem. These problems restrict the application of SE algorithms in the real world, so it is necessary to solve them. This thesis gives new insight into these algorithms and seeks effective ways to overcome their shortcomings. The main contents and contributions of the thesis are summarized as follows: 1) The thesis evaluates the performance of the Generalized Sidelobe Canceller (GSC) and Adaptive Null-Forming (ANF) techniques in the presence of microphone mismatch; the analysis provides a theoretical basis for calibrating the microphone array. Although only the two-microphone case is considered, the analysis methods can be extended to any number of microphones. 2) Three seemingly disparate non-parametric Magnitude Squared Coherence (MSC) estimation methods, namely Welch’s averaged periodogram, the Minimum Variance Distortionless Response (MVDR), and Canonical Correlation Analysis (CCA), are treated in a unified way. This relationship yields a new class of MSC estimators, expressed as non-linear functions of the covariance matrix, which is referred to as α-MVDR. The α-MVDR algorithm makes the three MSC estimators and their properties simpler to understand. Finally, the α-MVDR algorithm is applied to spatial power spectrum estimation. 
The α-MVDR algorithm is combined with the Generalized Linear Combination (GLC) method to give the α-GLC-MVDR algorithm, and numerous simulation results show that the proposed α-GLC-MVDR algorithm performs better than GLC-MVDR in the presence of steering vector errors. 3) The thesis gives new insight into the musical noise problem and points out that musical noise is mainly due to the large variance of the periodogram. To reduce this variance, two algorithms are proposed. One is the Adaptive Averaging Periodogram (AAP) algorithm, which is based on the characteristics of the noise Power Spectral Density (PSD). The other is the Mean Values of Speech Cepstra (MVSC) algorithm, which is based entirely on the characteristics of speech cepstra. The thesis shows that, under a Gaussian assumption for speech, the speech cepstra are Gaussian distributed with unknown means and known variances, and that most of the speech cepstra are close to zero. Based on these two characteristics, two algorithms are proposed to estimate the MVSC: the novel Cepstral Subtraction Method (CSM) and the Modified Cepstrum Thresholding (MCT) algorithm. Simulation results show that the AAP and MVSC algorithms outperform traditional methods. Numerous simulation results and the application of the proposed algorithms in real speech communication systems verify the theoretical analysis and show promising performance.
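Language	Chinese
Release date	2011-05-07
Pages	129
Source URL	[http://159.226.59.140/handle/311008/444]
Collection	Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	Zheng Chengshi. Speech Enhancement Based on High Resolution and Low Variance Spectral Estimation[D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2009.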
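
The abstract attributes musical noise in spectral subtraction to the large variance of the periodogram. The sketch below illustrates that general point only, and is not the thesis's AAP or MVSC method: it uses plain averaging of non-overlapping frames (Bartlett's method) on synthetic white noise. The frame length, frame count, random seed, and function names are assumptions chosen for illustration.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram of one frame: |FFT|^2 / N."""
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

def averaged_periodogram(x, frame_len):
    """Average the periodograms of non-overlapping frames (Bartlett's method)."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean([periodogram(f) for f in frames], axis=0)

# Illustrative setup (assumed values): unit-variance white Gaussian noise,
# whose true power spectrum is flat and equal to 1 in this normalization.
rng = np.random.default_rng(0)
frame_len = 256
noise = rng.standard_normal(frame_len * 64)

single = periodogram(noise[:frame_len])            # one frame, high variance
averaged = averaged_periodogram(noise, frame_len)  # 64 frames averaged

# Exclude the DC and Nyquist bins, whose statistics differ from interior bins.
var_single = np.var(single[1:-1])
var_averaged = np.var(averaged[1:-1])
print("variance across bins, single frame:", var_single)
print("variance across bins, 64-frame average:", var_averaged)
```

Averaging lowers the variance of the estimate at the cost of time resolution, which is why plain averaging suits stationary noise better than speech; the abstract's AAP and MVSC methods are presented as ways to obtain low variance in the spectral subtraction setting.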

Ingest method: OAI harvesting

Source: Institute of Acoustics

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.