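Chinese Academy of Sciences Institutional Repositories Grid: Research on a Variable-Rate Speech Compression Algorithm Based on Half-Wave Waveforms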


Chinese Academy of Sciences Institutional Repositories Grid

Research on a Variable-Rate Speech Compression Algorithm Based on Half-Wave Waveforms (基于半波波形的变速率语音压缩算法的研究)

Document type: Dissertation


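Author	Chen Xiaoping (陈小苹)
Degree	Doctorate (PhD)
Defense date	1999
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place of conferral	Institute of Acoustics, Chinese Academy of Sciences
Keywords	waveform coding; half-wave waveform; vector quantization; multimode coding; variable-rate coding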
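Chinese abstract (translated)	This thesis proposes a new multimode, variable-rate waveform coding algorithm for speech signals. The algorithm uses the time-domain half-wave of speech as the unit for framing and vector quantization. This not only greatly raises the compression ratio of waveform coding but, because half-waves strictly preserve the zero-crossing-rate information of the speech signal, also ensures the intelligibility of the reconstructed speech and reduces distortion. The algorithm can operate at medium-to-low transmission bit rates while providing good speech quality. The main contributions of this thesis are as follows. 1. A completely new framing method for speech waveform coding is proposed. The main reason vector quantization (VQ) has long not been applied directly to speech waveform coding is that no suitable way of segmenting the speech waveform had been found. The time-domain half-wave segmentation method proposed here has two important properties. 1) The segmented half-waves are generally low at both ends and high in the middle, and so share similar features. Moreover, speech half-waves fall roughly into single-peak, double-peak and multi-peak types, with finer differences lying only in the width and steepness of the peaks. The segmented waveforms therefore naturally cluster around a set of centers rather than being fully scattered. Because the framing is well matched to the signal, the error between each cluster center found by cluster analysis and the samples assigned to it is relatively small, and the signal-to-noise ratio is correspondingly higher. 2) Framing by half-waves is grounded in the properties of the speech signal. The zero-crossing frequency (zero-crossing rate) of a speech signal indicates the region where its energy is concentrated; as long as the zero-crossing rate is kept unchanged, speech quality is largely preserved, while the amplitude can tolerate errors within a certain range. The main difference from conventional waveform quantization is that the half-wave quantization method first requires the zero-crossing rate to remain strictly unchanged and only then considers amplitude approximation, a choice that follows from the physical characteristics of the speech signal. 2. An efficient speech waveform encoding/decoding software package was implemented. For speech sampled at 11.025 kHz with 16-bit quantization, the software reduces the average bit rate of voiced segments to 2.79 bits/sample (average compression ratio about 5.73) and that of unvoiced segments to 0.53 bit/sample (average compression ratio 30.19). The average signal-to-noise ratio of the voiced vector quantizer is 13.33 dB, and that of the unvoiced scalar quantizer is 31.31 dB. The mean opinion score (MOS) of the decoded speech exceeds 3.7. 3. Many application areas suited to this coding algorithm have been identified, in which further work remains: 1) applying the compression algorithm to Internet voice communication is in preparation; 2) its application in speech synthesis systems is under further study.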
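The abstract describes cutting speech at zero crossings into half-waves and quantizing each half-wave's shape while keeping the zero-crossing positions exact. The record does not give the codebook design, gain coding or length coding, so the Python sketch below only illustrates the idea under stated assumptions: a half-wave is a maximal run of same-sign samples (zero counted as positive), its length and sign are transmitted exactly, its peak magnitude serves as a gain, and its normalized shape is resampled to a fixed length and matched against a small hand-made codebook. `CODEBOOK`, `SHAPE_LEN` and all function names are hypothetical; in the thesis the codewords would come from cluster analysis of real half-waves.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy codebook of normalized half-wave shapes (peak = 1.0).
# All values are strictly positive so decoded signs never flip.
# A real codebook would be trained by clustering half-waves from speech.
SHAPE_LEN = 5
CODEBOOK: List[List[float]] = [
    [0.3, 0.8, 1.0, 0.8, 0.3],  # single peak
    [0.4, 1.0, 0.5, 1.0, 0.4],  # double peak
    [0.6, 1.0, 1.0, 1.0, 0.6],  # broad peak
]


def split_half_waves(samples: Sequence[float]) -> List[List[float]]:
    """Cut the signal at sign changes; zero is treated as positive."""
    halves: List[List[float]] = []
    current: List[float] = []
    for x in samples:
        if current and (x >= 0) != (current[0] >= 0):
            halves.append(current)
            current = []
        current.append(x)
    if current:
        halves.append(current)
    return halves


def resample(values: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate `values` onto n equally spaced points."""
    m = len(values)
    if m == 1:
        return [values[0]] * n
    if n == 1:
        return [values[m // 2]]
    out: List[float] = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        j = min(int(pos), m - 2)
        frac = pos - j
        out.append(values[j] * (1 - frac) + values[j + 1] * frac)
    return out


def encode_half_wave(half: Sequence[float]) -> Tuple[int, int, float, int]:
    """Return (sign, length, gain, codeword index) for one half-wave."""
    sign = 1 if half[0] >= 0 else -1
    mags = [abs(x) for x in half]
    gain = max(mags)
    normalized = [v / gain for v in mags] if gain > 0 else mags
    shape = resample(normalized, SHAPE_LEN)
    # Nearest codeword by squared Euclidean distance.
    index = min(
        range(len(CODEBOOK)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(shape, CODEBOOK[k])),
    )
    return sign, len(half), gain, index


def decode_half_wave(sign: int, length: int, gain: float, index: int) -> List[float]:
    """Stretch the codeword back to the original length, rescale and re-sign."""
    return [sign * gain * s for s in resample(CODEBOOK[index], length)]


def code_signal(samples: Sequence[float]) -> List[float]:
    """Encode and decode a whole signal half-wave by half-wave."""
    out: List[float] = []
    for half in split_half_waves(samples):
        out.extend(decode_half_wave(*encode_half_wave(half)))
    return out
```

Because each decoded half-wave keeps its original length and sign, and every codeword value is positive, `code_signal` returns a signal whose sign changes fall at exactly the same sample positions as the input; only the amplitude inside each half-wave is approximated, which mirrors the "zero-crossing rate first, amplitude second" priority described above.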
English abstract	In this thesis, a novel multimode variable-bit-rate waveform coding approach is presented. The approach takes the time-domain half-wave vector of the speech waveform as the quantization unit. This greatly improves the compression ratio of waveform coding and, because the zero-crossing information of the speech waveform is strictly preserved, also retains the formant information that is important for intelligibility and reduces distortion in the reconstructed speech. The approach can operate at low-to-medium bit rates and provide acceptable-to-good speech quality. The main contributions of this thesis are as follows. 1. A novel frame-partition approach for speech waveform coding is presented. The main reason vector quantization has long not been applied directly to speech waveform coding is that no suitable waveform partition approach had been found. The time-domain half-wave partition approach presented here has two important characteristics. 1) The half-waves separated from the speech waveform are mostly low at both ends and high in the middle. Moreover, half-waves can be roughly classified as single-peak, double-peak and multi-peak, and further distinguished by peak width and steepness. The separated half-wave waveforms therefore naturally gather around a number of centers rather than being completely dispersed. This partition approach keeps the error between each cluster center obtained in cluster analysis and the samples belonging to it relatively small, and the signal-to-noise ratio relatively high. 2) Using the half-wave as the frame-partition unit is based on the characteristics of the speech signal. The zero-crossing frequency of a speech signal corresponds to the frequency region where its energy is concentrated. As long as the zero-crossing frequency is kept unchanged, speech quality is largely assured, and amplitude errors within a certain range can be tolerated.
Compared with traditional waveform quantization approaches, the main difference is that the half-wave quantization approach first keeps the zero-crossing frequency unchanged and only then considers amplitude approximation, a choice dictated by the physical characteristics of the speech signal. 2. A set of highly efficient speech coding/decoding software was implemented. For speech sampled at 11.025 kHz with a resolution of 16 bits/sample, the software reduces the mean bit rate of voiced segments to 2.79 bits/sample, corresponding to a mean compression ratio of 5.73; the mean bit rate of unvoiced segments to 0.53 bit/sample, corresponding to a mean compression ratio of 30.19; and the mean bit rate over voiced and unvoiced segments combined to 2.33 bits/sample, corresponding to a mean compression ratio of 6.87. The mean signal-to-noise ratio (SNR) is 13.33 dB for the voiced vector quantizer and 31.31 dB for the unvoiced scalar quantizer. The mean opinion score (MOS) of the recovered speech exceeds 3.7. 3. Several applications suited to the novel coding/decoding approach have been identified, which call for further research: 1) applying this compression approach to Internet speech communication; 2) further research on applying this compression approach in speech synthesis systems.
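The reported compression ratios follow directly from the 16-bit input resolution: each ratio is the input bits per sample divided by the coded bits per sample. The short Python check below reproduces the abstract's figures and also converts the bits/sample rates into bit/s at the stated 11.025 kHz sampling rate; that conversion is plain arithmetic, not a figure given in the record. The `snr_db` helper uses the ordinary whole-signal definition, 10·log10 of signal energy over error energy; whether the thesis used exactly this definition (rather than, for example, segmental SNR) is an assumption, and all names here are hypothetical.

```python
import math
from typing import Sequence

SAMPLE_RATE_HZ = 11025  # sampling rate stated in the abstract
INPUT_BITS = 16         # PCM resolution stated in the abstract

# Mean coded rates (bits/sample) reported in the abstract.
REPORTED_RATES = {"voiced": 2.79, "unvoiced": 0.53, "overall": 2.33}


def compression_ratio(bits_per_sample: float, input_bits: int = INPUT_BITS) -> float:
    """Ratio of input PCM bits to coded bits per sample."""
    return input_bits / bits_per_sample


def bit_rate_bps(bits_per_sample: float, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Convert a per-sample rate into bits per second."""
    return bits_per_sample * sample_rate


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Whole-signal (non-segmental) SNR: 10*log10(signal energy / error energy)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


if __name__ == "__main__":
    for name, rate in REPORTED_RATES.items():
        print(f"{name}: ratio {compression_ratio(rate):.2f}, "
              f"{bit_rate_bps(rate) / 1000:.2f} kbit/s")
```

Running it prints compression ratios of 5.73, 30.19 and 6.87, matching the abstract, which correspond to about 30.76, 5.84 and 25.69 kbit/s respectively at 11.025 kHz.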
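Language	Chinese
Release date	2011-05-07
Pages	44
Source URL	[http://159.226.59.140/handle/311008/594]
Collection	Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation (GB/T 7714)	Chen Xiaoping. Research on a Variable-Rate Speech Compression Algorithm Based on Half-Wave Waveforms[D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 1999.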

Deposit method: OAI harvesting

Source: Institute of Acoustics

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.