Microphone Array Techniques and Applications in Automatic Speech Recognition Systems (传声器阵列技术及其在语音识别系统中的应用)
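Document type: Thesis
Author | Zhang Heng (张恒) |
Degree type | Doctoral |
Defense date | 2009-05-23 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences |
Place of conferral | Institute of Acoustics |
Keywords | microphone array; speech recognition; noise robustness; voice activity detection; adaptive echo cancellation |
Alternative title | Microphone Array Techniques and Applications in Automatic Speech Recognition Systems |
Degree discipline | Signal and Information Processing |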
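Abstract (translated from Chinese) | As computers become more portable and personal digital devices take on more functions, users increasingly want to be free of traditional input devices (keyboard, mouse, etc.) in favor of input methods that are easier to use, more natural, and more user-friendly. The environments in which speech signal processing is applied are becoming correspondingly more complex. With the growing demand for natural and efficient human-machine interaction, large-scale practical deployment of speech technology has become an increasingly urgent task, and the rapid progress of speech recognition in recent years has made this possible. However, the complexity and variability of real environments pose a major challenge to the robustness of speech recognition systems. How to reduce or remove the negative effect of noise on speech recognition systems has become an active research topic. Traditional single-channel algorithms can exploit only time-domain and frequency-domain information, so most of them suppress or remove only stationary or quasi-stationary noise. When non-stationary noise is present or the signal-to-noise ratio is low, such algorithms often fail to make an effective contribution. Moreover, studies have shown that in most cases single-channel noise reduction algorithms do not improve the speech recognition rate. Compared with single-channel methods, the advantage of microphone array techniques is that, in addition to time- and frequency-domain information, they provide spatial discrimination. Speech recognizers that incorporate microphone arrays have produced positive results in many applications. A typical practical speech recognition system can be roughly divided into speech signal acquisition, voice activity detection, feature extraction, and decoding, and may also need to suppress acoustic echo. Microphone array techniques can play a positive role in many of these stages. This thesis studies in depth each component of a microphone-array-based speech recognition system, analyzes the underlying principles and key techniques, and aims to integrate microphone array techniques closely with the speech recognition system so as to make the most of the array. The main work and contributions are: 1. Beamforming and null-forming algorithms, as well as microphone array postfiltering algorithms, are studied. A frequency-domain adaptive null-forming algorithm based on auditory perceptual subbands is proposed, together with a system that combines it with postfiltering. Using a small array aperture and few array elements, the algorithm achieves strong noise suppression while preserving the quality of the output speech well. 2. Methods for combining microphone arrays with adaptive echo cancellation are studied, and a microphone array speech front-end system suited to in-vehicle platforms is developed; in real scenarios it substantially improves the signal-to-noise ratio and the speech recognition rate. 3. A feature for voice activity detection (VAD) based on the homogeneity of the signal direction of arrival is proposed, and a VAD algorithm is built on this feature. The algorithm discriminates well against non-directional noise and directional noise from outside the target region, even high-intensity interfering speech, which makes up for the shortcomings of traditional single-channel VAD algorithms. 4. A microphone-array-based feature enhancement system serving speech recognition is proposed, which brings noise removal into the MFCC feature domain. The algorithm requires no prior knowledge of the noise or the sound field and improves the recognition rate in the presence of non-stationary noise. |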
Abstract (English) | Portable devices and handheld digital equipment are now widely used, which creates a growing demand for a more convenient human-machine interface. Speech, considered the most natural and most important means of communication between human beings, is the first choice. The rapid development of speech recognition technologies over the past decade makes this feasible. Because the environments in which speech-based systems operate vary, the robustness of speech technologies in adverse environments becomes an important issue. Single-channel approaches to speech preprocessing use only time- and frequency-domain information, and are often less effective when dealing with non-stationary interference and low signal-to-noise ratio (SNR). Some studies have pointed out that single-channel methods contribute little to the speech recognition rate. Microphone array techniques in speech-related areas are drawing more and more attention. Compared with single-channel methods, array-based algorithms also exploit spatial information, which increases their capability, and recognition systems using a microphone array have reported positive outcomes. The preprocessing of a typical speech recognizer can be divided into stages such as speech signal acquisition, voice activity detection (VAD), and feature extraction. Sometimes the system must also resist acoustic echo, including far-end speech and playback. In this thesis, the different stages of a microphone array speech recognition system are studied thoroughly, with the aim of making full use of multi-channel information to increase system performance. The main contributions of this thesis are: 1. Beamforming, null-forming, and postfiltering techniques are carefully reviewed. On this basis, an auditory-subband-based adaptive null-forming algorithm in the frequency domain is proposed, and a joint system of spatial filtering and postfiltering is constructed, which achieves satisfactory noise suppression and speech preservation with a comparatively small array aperture and few microphones. 2. Ways to combine acoustic echo cancellation (AEC) with microphone array systems are studied, leading to a vehicle-mounted speech recognition front end that includes both techniques. Large improvements in SNR and recognition rate are observed. 3. A VAD feature based on direction-of-arrival (DOA) homogeneity is proposed, and a VAD algorithm is designed on top of it. The algorithm distinguishes desired speech from directional and non-directional interference (even competing speakers) with considerable robustness. 4. A feature-domain, array-based noise suppression method is proposed. The method uses no a priori knowledge of the interference or the noise field and increases the recognition rate when non-stationary interference is present. |
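Language | Chinese |
Release date | 2011-05-07 |
Pages | 135 |
Source URL | [http://159.226.59.140/handle/311008/524] |
Collection | Institute of Acoustics_IOA Doctoral and Master's Theses_Doctoral and Master's Theses 1981-2009 |
Recommended citation (GB/T 7714) | Zhang Heng (张恒). Microphone Array Techniques and Applications in Automatic Speech Recognition Systems (传声器阵列技术及其在语音识别系统中的应用) [D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2009. |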
Ingest method: OAI harvesting
Source: Institute of Acoustics
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.