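Chinese Academy of Sciences Institutional Repositories Grid: Research on Mandarin Prosody Model Based on Statistical Machine Learning and Its Application in Speech Recognition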

Chinese Academy of Sciences Institutional Repositories Grid

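基于统计学习的汉语韵律建模及其语音识别方法研究 (Research on Mandarin Prosody Model Based on Statistical Machine Learning and Its Application in Speech Recognition)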

Document type: Dissertation


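Author	Ni Chongjia (倪崇嘉)
Degree	Doctor of Engineering
Defense date	2011-06-01
Degree-granting institution	Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral	Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Advisor	Liu Wenju (刘文举)
Keywords	prosody break; stress; complementary model; prosody dependent speech recognition system; tone model
Other title	Research on Mandarin Prosody Model Based on Statistical Machine Learning and Its Application in Speech Recognition
Major	Pattern Recognition and Intelligent Systems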
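Chinese abstract (translated)	Prosody models play an important role in improving the naturalness and intelligibility of speech synthesis systems, the accuracy of speech recognition systems, and speech understanding. Consequently, prosody modeling with statistical machine learning on prosodically annotated corpora has drawn growing attention and has become an active research topic. Existing statistical approaches still have several problems: the assumption that acoustic features are independent of lexical and syntactic features when modeling prosodic events; low recall when a single information source is used; and, in prosody-dependent speech recognition systems, an "explosion" in the number of models caused by introducing prosodic factors. This thesis proposes ideas and methods that address these problems, implements an automatic prosody annotation system and a prosody-dependent speech recognition system, and resolves the problems to some extent. The main contributions are as follows. (1) A model-complementary method for modeling prosodic events (prosodic breaks and stress). To address the independence assumption between acoustic features and lexical and syntactic features, and the low recall that single-source modeling may cause, a complementary-model method is proposed. It abandons the unreasonable independence assumption, models acoustic, lexical, and syntactic features jointly using different methods, and then fuses the outputs of those methods with weights so that the methods complement one another. Experiments show that the method classifies well on a Mandarin test set and also gives good results on an English test set. The model was also used to detect and annotate prosodic events in a large continuous speech corpus; compared against a small amount of manual annotation, the automatic annotation based on the complementary model agrees closely with the manual annotation. In addition, Mandarin and English prosodic event detection are compared, and some meaningful conclusions are drawn. (2) Prosodic break classification guided by knowledge of Chinese. Drawing on the hierarchical nature of prosodic breaks and on knowledge of Chinese, different prosodic break models are built for different cases. Breaks are subdivided level by level, from coarse to fine, so that the different break types are classified step by step. Prosodic break classification experiments on the prosodically annotated corpus ASCCD show that the method classifies well. (3) A hybrid acoustic modeling method that incorporates prosodic features. After prosodic annotation, the number of phones (initials and finals) in a large continuous Mandarin corpus is several times that of a conventional corpus; if prosody-dependent acoustic models were still built with conventional triphones, the number of models would surge, producing a model "explosion". To address this, the thesis proposes a modeling method that composes prosody-dependent tonal syllables from prosody-dependent diphones as the modeling unit, together with a "hybrid acoustic modeling method" that combines prosody-independent tonal syllable models with prosody-dependent tonal syllable models. Speech recognition experiments confirm that the method achieves good recognition performance.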
English abstract	Prosody models are important for improving the naturalness and intelligibility of TTS, the accuracy of ASR, and speech understanding. Manual prosodic annotation, however, is expensive and time-consuming. An automatic prosodic annotation algorithm would therefore be very useful for building spoken language understanding systems, and research on such algorithms has become an active topic that attracts wide attention. Current prosodic modeling methods based on statistical machine learning still have deficiencies: the assumed independence between acoustic features and lexical and syntactic features in prosodic event modeling, the low recall of prosodic event modeling based on a single information source, and the very large number of models in prosody-dependent speech recognition systems. This thesis offers ideas and solutions for these problems. We have implemented an automatic prosody annotation system and a prosody-dependent speech recognition system, and have solved these problems to some extent. The main contributions and novelties include: (1) A complementary-model method for prosodic event modeling. Traditional prosodic event modeling assumes that acoustic features are independent of lexical and syntactic features, and modeling based on a single information source may lead to low recall. To address these deficiencies, we propose the complementary model method. It drops the assumption that the acoustic features and the lexical and syntactic features are independent, uses different methods to model the acoustic, lexical, and syntactic features jointly, fuses the results of the different models, and thereby makes the methods complement one another.
The experimental results show that this modeling method performs well not only on a Mandarin prosodically annotated corpus but also on an English one. We also use the complementary model to annotate prosodic events in continuous speech. Comparing the automatic annotations with a small amount of manual annotation, we find a high rate of agreement between them. In addition, we compare Mandarin and English prosodic event detection based on the prosodically annotated corpora, and get so...
Language	Chinese
Other identifier	200718014628058
Source URL	[http://ir.ia.ac.cn/handle/173211/6384]
Collection	Graduates: Doctoral Dissertations
Recommended citation (GB/T 7714)	倪崇嘉. 基于统计学习的汉语韵律建模及其语音识别方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2011.
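
The abstract describes the complementary model as fusing, with weights, the results of different modeling methods applied to the acoustic, lexical, and syntactic features. The record does not say which classifiers or which fusion rule the thesis uses, so the following is only a minimal sketch under assumptions: each model is assumed to output a posterior distribution over prosodic event classes, the fusion is assumed to be a normalized linear combination, and the names `fuse_posteriors` and `predict_class` as well as all numbers are hypothetical.

```python
from typing import List, Sequence


def fuse_posteriors(posteriors: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Linearly combine per-class posteriors from several models.

    Assumption: a normalized weighted sum. The thesis record does not
    specify the actual fusion rule.
    """
    if not posteriors or len(posteriors) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(posteriors[0])
    fused = [0.0] * n_classes
    for probs, weight in zip(posteriors, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for k, p in enumerate(probs):
            fused[k] += (weight / total) * p
    return fused


def predict_class(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Hypothetical two-class case: index 0 = no break, index 1 = break.
model_a = [0.6, 0.4]   # made-up output of one model
model_b = [0.3, 0.7]   # made-up output of a second model
fused = fuse_posteriors([model_a, model_b], weights=[0.5, 0.5])
label = predict_class(fused)
```

In this made-up case the first model alone would miss the break, while the fused score favors it, which illustrates how combining models can raise recall relative to a single information source. In practice the weights would be tuned on held-out annotated data.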

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.