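Chinese Academy of Sciences Institutional Repositories Grid: Research on Multilingual Grapheme-to-Phoneme Conversion (多语言单词字音转换的研究)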

Document type: Dissertation


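Author	Li Peng (李鹏)
Degree	Doctor of Engineering
Defense date	2008-05-20
Degree-granting institution	Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral	Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Advisor	Xu Bo (徐波)
Keywords	grapheme-to-phoneme conversion (G2P); decision trees; random forests; AdaBoost
Alternative title	Research on Multilingual Grapheme-to-Phoneme Conversion
Major	Pattern Recognition and Intelligent Systems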
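Abstract (translated from Chinese)	In speech recognition and speech synthesis applications, words that are missing from the pronunciation dictionary are encountered frequently, so a module is needed to generate pronunciations for such words automatically. This task is called grapheme-to-phoneme conversion. Over several decades of research, two approaches have been pursued: hand-written rules based on expert knowledge, and data-driven methods based on machine learning. Practice in recent years has shown that the latter surpass the former in conversion accuracy and language independence, but for languages with highly irregular pronunciation, such as English, existing methods still do not reach satisfactory performance. This thesis studies grapheme-to-phoneme conversion for alphabetic languages in depth. The main contributions and innovations are:
1. An improved decision-tree-based grapheme-to-phoneme conversion method. Among the many machine learning methods proposed so far, decision-tree-based methods have achieved very good results, but the literature lacks discussion of several key implementation factors. This thesis analyzes experimentally how these factors affect overall system performance and shows that careful tuning can substantially raise conversion accuracy. Two new methods are also proposed, one for grapheme-phoneme alignment of the lexicon and one for quickly finding the optimal pruning parameter.
2. Grapheme-to-phoneme conversion methods based on Bagging and random forests. Although decision trees describe the training data well, their generalization ability is limited: generalization error can be decomposed into model bias and variance, and a single decision tree cannot reduce both at once. Bagging and random forests are both ensemble classifiers. By introducing randomness into training, they obtain many different decision tree classifiers from the same training data and combine their outputs by voting, which reduces both bias and variance and hence the generalization error. Experiments show that both methods achieve clearly higher conversion accuracy than decision trees.
3. A grapheme-to-phoneme conversion method based on AdaBoost. AdaBoost assigns weights to the training samples, adjusts the weights according to classification errors, trains a number of classifiers iteratively, and combines their outputs by weighted voting to produce the final result. Because the weights are adjusted adaptively, each classifier concentrates on the training samples that are misclassified most often, and voting combines so-called "weak classifiers" into a "strong classifier" with good classification ability. The AdaBoost-based method proposed in this thesis also achieves higher conversion accuracy than the decision tree method.
4. The methods proposed in this thesis are integrated into a combined system whose conversion accuracy on the NETtalk and CMU English dictionary test sets is higher than the best results in the published literature.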
English abstract	In speech recognition and synthesis applications, out-of-vocabulary (OOV) words are often encountered, so a module is needed to perform automatic grapheme-to-phoneme (G2P) conversion. Over the past decades, two categories of solutions have emerged: hand-written rules based on expert knowledge, and data-driven machine learning methods. In recent years it has been demonstrated that the latter outperform the former in both conversion accuracy and language independence. For highly irregular languages such as English, however, current methods still cannot achieve satisfactory performance. This thesis studies the problem in depth and obtains significant improvements. The main contributions and novelties are:
1. Improvement of a decision-tree-based G2P conversion system. Decision-tree-based G2P conversion systems have achieved the best performance among machine learning based systems, but several key issues are not discussed in the literature. We analyzed these issues experimentally and concluded that careful adjustment of the settings can improve G2P conversion accuracy considerably. We also proposed two new methods, one for lexicon alignment and one for fast tree pruning.
2. Two new G2P conversion methods based on bagging (bootstrap aggregating) and random forests. Although decision trees can model the training data well, their capability for generalization is limited: the generalization error can be decomposed into bias and variance, and a decision tree cannot decrease both at the same time. Bagging and random forests are ensemble classifiers that create different decision trees from the same training data by introducing randomness into the training procedure. The classification result of the ensemble is obtained by voting over the results of all the decision trees, so the bias and variance are reduced simultaneously and hence the generalization error is reduced. Experiments showed that the new methods significantly outperform the decision-tree-based method.
3. A new G2P conversion method based on AdaBoost. AdaBoost is another ensemble classifier. It adaptively adjusts the weight of each training sample so that each new classifier concentrates on the samples that are hard to classify correctly. The adjustment of sample weights is directed by the misclassifications of the previous classifier, and new classifiers are trained iteratively. By weighted voting over all the classifiers, AdaBoost can turn so-called "weak classifiers" into a "strong classifier", and it has been used successfully in face detection systems. The AdaBoost-based G2P conversion system presented in this thesis also obtained better results than the decision-tree-based method.
Language	Chinese
Other identifier	200418014628081
Source URL	[http://ir.ia.ac.cn/handle/173211/6058]
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	李鹏. 多语言单词字音转换的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2008.
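
The abstract describes AdaBoost as reweighting training samples after each round and combining the trained classifiers by weighted voting. The sketch below illustrates that loop under simplifying assumptions that are not taken from the thesis: labels are binary (+1/-1) rather than phonemes, the weak classifier is a one-dimensional threshold stump rather than a decision tree over letter contexts, and the names `train_stump`, `stump_predict`, `adaboost`, `predict` and the toy data are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: vote `polarity` when x >= threshold, else the opposite."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Discrete binary AdaBoost over threshold stumps (toy illustration)."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump separates these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy set every individual stump misclassifies at least three points, while the weighted vote of three stumps reproduces all ten training labels. A G2P system would need a multi-class variant over phoneme labels; that extension is not shown here.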

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.