Chinese Academy of Sciences Institutional Repositories Grid
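Research on Statistical Machine Translation Model and System Based on Multiple System Combination (基于多系统融合的统计机器翻译模型及系统研究)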

Document type: Dissertation

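Author: Du Jinhua (杜金华)
Degree: Doctor of Engineering
Defense date: 2008-05-23
Degree-granting institution: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Xu Bo (徐波)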
Keywords: statistical machine translation; bilingual corpus construction; multi-engine translation platform; relative position vector re-ordering model; MBR decoding; Confusion Network; system combination framework
Alternative title: Research on Statistical Machine Translation Model and System Based on Multiple System Combination
Major: Pattern Recognition and Intelligent Systems
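Chinese abstract (translated): Within a multi-system combination framework, this thesis analyzes in depth the main problems of optimized processing of Chinese-English bilingual corpora, multi-engine platform construction, and phrase-based model optimization. It proposes solutions and verifies them through extensive comparative experiments. The main work is summarized as follows:
1. A corpus construction specification and implementation workflow for statistical machine translation is proposed, and the content-based corpus optimization method is improved.
2. A construction and implementation workflow for a multi-engine statistical machine translation platform is proposed, and several optimizations are applied to the key modules of the phrase-based translation system and to the system-independent common modules of the platform. We have built a sound multi-engine experimental platform for research on statistical machine translation models and algorithms, which also serves as a transfer platform for engineering-oriented development. Among the module optimizations of the phrase-based system, the focus is on the phrase translation model.
3. A re-ordering model for phrase-based translation systems based on position vector prediction is proposed. The main problem of phrase-based statistical machine translation is phrase re-ordering. This thesis proposes a position vector prediction model based on the relative positions and orientation relations of phrases.
4. A multi-feature system combination framework based on confusion network decoding is proposed. The framework combines systems at the word level and is built on MBR decoding and confusion network decoding. The decoding model is log-linear, with word posterior probability, language model, part-of-speech language model, and sentence length penalty as features, and it uses beam search to find the best path through the confusion network.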
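The abstract names MBR decoding without describing how it is applied. One common reading in system combination is to pick, from the candidate translations produced by several engines, the one with the lowest expected loss against all the others. The sketch below illustrates that idea only. The word-level edit-distance loss, the function names, and the assumption that each candidate comes with a normalized probability are all illustrative choices, not details taken from the thesis.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over word tokens (assumed loss function)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def mbr_select(hyps: List[List[str]], probs: List[float]) -> int:
    """Return the index of the hypothesis with minimum expected loss.

    probs[i] is an assumed normalized probability for hyps[i].
    """
    best_idx, best_risk = 0, float("inf")
    for i, hyp in enumerate(hyps):
        # Expected loss of choosing hyp if the "true" translation were
        # distributed according to probs over the candidate list.
        risk = sum(p * word_edit_distance(hyp, other)
                   for other, p in zip(hyps, probs))
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return best_idx
```

With candidates `the cat sat`, `the cat sits`, and `a dog ran` and probabilities 0.3, 0.4, 0.3, the second candidate is selected because it is both probable and close to the first.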
English abstract: Under a multiple system combination framework, this thesis analyzes several key problems: Chinese-English bilingual corpus processing and optimization, multi-engine platform construction, and phrase-based model optimization. It proposes solutions to each and verifies their effectiveness through extensive experiments. The main contributions are as follows:
1. We study the construction and realization of a Chinese-English bilingual corpus and propose a content-based optimization method for bilingual corpus processing.
2. We study the construction and realization of a multi-engine SMT platform and propose strategies for optimizing the phrase-based model and the common modules. We construct a multi-engine experimental platform for research on SMT models and algorithms, which can also serve as a transfer platform for application development. In optimizing the phrase-based model, we focus on phrase extraction and probability computation.
3. We propose a local prediction re-ordering model based on relative position vectors for phrase-based systems. The major problem of phrase-based SMT is phrase re-ordering; the proposed prediction model is based on the relative positions and orientations of phrases.
4. We propose a multiple system combination framework based on Confusion Network decoding. The framework combines systems at the word level and uses Minimum Bayes Risk (MBR) decoding and Confusion Network decoding. The word posterior, language model, POS language model, and word penalty are added as features to a log-linear model, and beam search is used to find the best path to output.
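The decoding step described in item 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the confusion network is assumed to be a list of slots mapping each candidate word to its posterior, the empty string stands for a null arc, the language model is a toy bigram table with a fixed floor for unseen pairs, and the POS language model feature is omitted for brevity. All names and weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EPS = ""  # null arc: the slot contributes no word (assumed convention)


def decode(network: List[Dict[str, float]],
           bigram_logprob: Dict[Tuple[str, str], float],
           weights: Dict[str, float],
           beam_size: int = 5,
           unseen: float = -10.0) -> List[str]:
    """Beam search for the best path through a confusion network.

    Path score is a log-linear combination of word posterior,
    bigram language model, and a per-word penalty.
    """
    beam = [(0.0, ("<s>",))]  # (score, words so far)
    for slot in network:
        candidates = []
        for score, words in beam:
            for word, posterior in slot.items():
                new_score = score + weights["posterior"] * math.log(posterior)
                if word == EPS:
                    # Null arc: only the posterior feature fires.
                    candidates.append((new_score, words))
                    continue
                lm = bigram_logprob.get((words[-1], word), unseen)
                new_score += weights["lm"] * lm + weights["length"]
                candidates.append((new_score, words + (word,)))
        # Prune: keep only the highest-scoring partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return list(beam[0][1][1:])  # drop the <s> marker


network = [{"the": 0.6, "a": 0.4},
           {"cat": 0.5, "cats": 0.5},
           {"sat": 0.7, EPS: 0.3}]
bigrams = {("<s>", "the"): -0.5, ("<s>", "a"): -1.0,
           ("the", "cat"): -0.5, ("the", "cats"): -3.0,
           ("a", "cat"): -0.7, ("a", "cats"): -4.0,
           ("cat", "sat"): -0.5, ("cats", "sat"): -1.0}
weights = {"posterior": 1.0, "lm": 1.0, "length": 0.5}
print(decode(network, bigrams, weights))  # ['the', 'cat', 'sat']
```

In practice the feature weights would be tuned on held-out data rather than set by hand, and hypotheses that reach the same word sequence through different arcs could be merged before pruning.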
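Language: Chinese
Other identifier: 200418014628080
Source URL: http://ir.ia.ac.cn/handle/173211/6063
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
Du Jinhua. Research on Statistical Machine Translation Model and System Based on Multiple System Combination [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences. 2008.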

Ingest method: OAI harvesting

Source: Institute of Automation

