Chinese Academy of Sciences Institutional Repositories Grid (中国科学院机构知识库网格)
Voice Conversion for Given Speakers (特定说话人的声音变换)

Document type: Dissertation

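Author: Liu Kun (刘昆)
Degree: Doctoral
Defense date: 2007-06-01
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place conferred: Institute of Acoustics
Keywords: voice conversion; main vowel; stable frame; accent analysis
Alternative title: Research on Voice Conversion for Given Speakers
Major: Signal and Information Processing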
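Chinese abstract (translated): Voice conversion for given speakers transforms the voice of one specific speaker into the voice of another specific speaker. This dissertation aims to implement and improve a complete Mandarin Chinese voice conversion system for given speakers. Building on a review of earlier work and addressing coarticulation effects in speech, it proposes using the features at a stable frame of each phoneme in place of the features of the whole phoneme; proposes a new voice conversion system based on Chinese phonemes; proposes a non-parallel voice conversion system trained on non-parallel corpora; extends the phoneme-based technique to an English voice conversion system; and completes a quasi-real-time voice conversion demonstration system. The main work and contributions are as follows:
1. A new parameter selection method. A stable frame is selected for each vowel phoneme, and the parameters at that frame replace the parameters of the vowel phoneme. By taking the parameters of the stable segment in the middle of the phoneme as the phoneme's parameters, the method avoids a problem of earlier approaches, which ignored the effect of coarticulation on acoustic features and trained models on parameters from phoneme transition segments.
2. An analysis of the influence of accent on the formant frequencies of Chinese vowel phonemes. The study shows that accent strongly affects the second formant frequency of the monophthongs [O, I, U] and has no significant effect on the three formant frequencies of the monophthong [A].
3. Main vowel selection. To reduce the many discontinuities that appear in the spectrum of speech synthesized by a phoneme-based voice conversion system, a main vowel phoneme is selected for each final (韵母) to stand in for the entire final. The speech spectrum can then be divided by syllable into longer spectral segments, which reduces discontinuities in the converted spectrum.
4. A new voice conversion system based on Chinese phonemes. Inspection of the parameters obtained by traditional classification shows that models trained this way struggle to represent the parameter characteristics of different phonemes. To represent each phoneme's characteristics separately, a GMM is trained for each phoneme. The system's MOS and ABX scores are 47% and 26% higher, respectively, than those of the baseline system.
5. A non-parallel voice conversion system trained on non-parallel corpora. Because parallel data are often unavailable in practical applications, this dissertation proposes a voice conversion system trained on non-parallel corpora. Its performance is comparable to that of the phoneme-based parallel system.
6. Extension of phoneme-based voice conversion to English. Besides Mandarin Chinese, a large share of voice conversion research is carried out on English. To allow comparison with English voice conversion systems in China and abroad, the phoneme-based technique described above is applied to English, yielding an English voice conversion system.
7. A real-time voice conversion demonstration system. A real-time demonstration system is implemented that converts the timbre of Mandarin speech using only the single phoneme [A].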
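The stable-frame idea in contribution 1 can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the rule used here (pick the interior frame whose features differ least from its two neighbours) is an assumption, as are the function names and the toy (F1, F2) values.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_stable_frame(frames):
    """Return the index of the most stable frame in one phoneme segment.

    Assumed criterion (not taken from the dissertation): the interior frame
    with the smallest summed distance to its previous and next frames.
    """
    if not frames:
        raise ValueError("phoneme segment has no frames")
    if len(frames) < 3:
        # Too short to measure local change; fall back to the middle frame.
        return len(frames) // 2
    best_index, best_cost = None, float("inf")
    for i in range(1, len(frames) - 1):
        cost = (frame_distance(frames[i], frames[i - 1])
                + frame_distance(frames[i], frames[i + 1]))
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy vowel segment: made-up (F1, F2) pairs in Hz with transitions at both ends.
vowel = [(500, 1500), (600, 1400), (690, 1260),
         (700, 1250), (705, 1245), (650, 1300)]
stable = select_stable_frame(vowel)
print(stable, vowel[stable])
```

Only the features at the selected frame would then be passed to model training, so the transition frames at the edges of the vowel do not influence the model.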
English abstract: Voice conversion for given speakers transforms a given source speaker's voice so that it sounds as if it were uttered by another given target speaker. This dissertation aims at building a high-performance Chinese voice conversion system. Building on up-to-date voice conversion techniques, we propose an algorithm that selects an acoustically stable frame of a vowel phoneme to represent the characteristics of that phoneme, and by this means we design a novel Chinese phoneme-based voice conversion system. We also propose a novel voice conversion system that uses non-parallel training data. The phoneme-based voice conversion techniques are further applied to an English voice conversion system. Finally, an online voice conversion system is implemented. The dissertation contains the following work and contributions:
1. A novel algorithm is proposed that selects the characteristics at the steady state as the training parameters for each phoneme. To avoid using the formant transitions of each phoneme as training parameters, we propose a procedure that selects the steady-state formant features for each phoneme.
2. An accent analysis for Mandarin Chinese based on formant frequencies is presented. The results show that accent has a significant influence on the second formant frequency of the monophthongs [o, i, u] and no obvious influence on the formant frequencies of the monophthong [a].
3. We propose a main vowel phoneme selection method. To reduce the artifacts that may occur at phoneme boundaries in the converted speech when the final of a Chinese character or syllable is a diphthong or triphthong, we select a main vowel for each final so that the spectrum is warped over a longer segment: the whole final.
4. A novel phoneme-based voice conversion system for Chinese has been implemented. Traditional classification of the spectral parameters cannot represent each phoneme's characteristics well. To describe the characteristics of each phoneme, the spectral parameters of each vowel phoneme are grouped to train a GMM for that phoneme. Compared with the baseline system, the MOS score and ABX score increase by 47% and 26%, respectively.
5. The phoneme-based voice conversion system has been applied to non-parallel voice conversion. In practical applications, parallel data are often very difficult to collect, and voice conversion with non-parallel data relaxes this constraint. Phoneme-based voice conversion through non-parallel training for Mandarin is presented. Its performance is comparable to that of the phoneme-based voice conversion system trained on parallel data.
6. The phoneme-based Chinese voice conversion system has been extended to English. Since much research has been done on English corpora, a phoneme-based English voice conversion system is presented for comparison.
7. An online voice conversion system has been completed. A voice conversion demo system has been designed using only the phoneme [a] for training.
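Contribution 4 (one GMM per phoneme) can be sketched as follows. The abstract does not give the conversion function, so this example assumes the commonly used GMM regression mapping, applied to a single scalar feature for brevity. All names and model parameters are hypothetical and hand-written rather than trained.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convert_feature(x, components):
    """Map a scalar source feature to the target speaker by GMM regression.

    Each component is (weight, mean_x, var_x, mean_y, cov_xy). The output is
    the posterior-weighted sum of per-component linear predictions.
    """
    likelihoods = [w * gaussian_pdf(x, mx, vx) for (w, mx, vx, _my, _cxy) in components]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("feature is too far from every mixture component")
    converted = 0.0
    for lik, (_w, mx, vx, my, cxy) in zip(likelihoods, components):
        posterior = lik / total
        converted += posterior * (my + (cxy / vx) * (x - mx))
    return converted


# Hypothetical per-phoneme models: one small GMM for each vowel phoneme.
PHONEME_MODELS = {
    "a": [(1.0, 700.0, 400.0, 850.0, 300.0)],
    "i": [(0.5, 280.0, 100.0, 310.0, 80.0),
          (0.5, 320.0, 100.0, 350.0, 80.0)],
}


def convert_frame(phoneme, x):
    """Convert one frame using the GMM that belongs to its phoneme label."""
    return convert_feature(x, PHONEME_MODELS[phoneme])


print(convert_frame("a", 720.0))
print(convert_frame("i", 300.0))
```

Selecting the model by phoneme label means each mixture only has to describe one phoneme's parameter distribution, which is the motivation the abstract gives for moving away from a single model over all frames.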
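Language: Chinese
Publication date: 2011-05-07
Pages: 102
Source URL: [http://159.226.59.140/handle/311008/200]
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses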
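Recommended citation
GB/T 7714
Liu Kun (刘昆). Voice Conversion for Given Speakers (特定说话人的声音变换) [D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2007.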

Ingest method: OAI harvesting

Source: Institute of Acoustics
