Chinese Academy of Sciences Institutional Repositories Grid
Research on Dialog-Act Understanding and Spoken Language Translation (对话行为理解与口语翻译方法研究)

Document type: Dissertation

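Author: Zhou Keyan (周可艳)
Degree type: Doctor of Engineering
Defense date: 2010-05-27
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Zong Chengqing (宗成庆)
Keywords: Dialog-Act; Spoken Language Translation; Dialog Corpus; Ill-formedness
Alternative title: Research on Dialog-Act Understanding and Spoken Language Translation
Degree discipline: Pattern Recognition and Intelligent Systems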
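Chinese abstract: Spoken dialog understanding refers to the computational analysis of spoken dialog. It is a key problem for spoken language processing systems, for example in improving human-computer dialog systems and raising the quality of spoken language machine translation. Dialog acts, a pragmatic feature for describing spoken dialog, combine communicative intent with semantic information and belong to shallow discourse structure. They have already been applied in speech recognition, human-computer dialog, automatic summarization, and spoken language translation systems. This thesis covers the construction and annotation of a large-scale corpus of real spoken dialogs, the modeling and automatic recognition of dialog acts, and spoken language translation methods that incorporate dialog-act understanding. On corpus construction, based on the characteristics of Chinese spoken dialog and with an emphasis on describing spoken-language phenomena, the thesis proposes an improved dialog-act annotation scheme and establishes a dedicated method for describing spoken-language phenomena. On this basis, an annotated corpus of Chinese spoken dialogs is built from real telephone recordings. Besides rich multi-layer annotation of speech, semantics, and pragmatics, the corpus describes spoken-language phenomena such as insertion, repetition, and word-order inversion. It supports research on spoken dialog understanding as well as the analysis and processing of spoken-language phenomena. This corpus work is significant for moving spoken language systems toward practical use. On dialog-act modeling and automatic recognition, the thesis proposes a dialog-act prediction model based on the Markov Decision Process (MDP). The predictions of this model are integrated into utterance-based dialog-act recognition, yielding good recognition results. Beyond improving the recognition model, the author also approaches the problem from the angle of feature selection, proposing effective new features such as base noun phrases and adjacency pairs, together with several feature combination methods, which further and significantly improve the accuracy of automatic dialog-act recognition. On improving the performance of spoken language translation systems, the thesis proposes a translation method that incorporates dialog acts as pragmatic information. Applied to a phrase-based statistical machine translation system, the method uses automatic dialog-act classification to improve the consistency between training and test corpora, between development and test sets, and between source and target languages. This improves translation performance, so that the final translation reflects the dialog intent of the source language more accurately. In addition, the thesis proposes a semantic-dictionary-based method for handling out-of-vocabulary words. The method uses Chinese synonym knowledge to interpret the meaning of out-of-vocabulary words in the source language, which to some extent solves the problem of translating such words in spoken language translation.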
English abstract: Spoken language understanding focuses on analysing dialogs automatically and is a key technology for improving the performance of spoken language processing systems such as dialog systems and spoken language translation systems. A dialog act, which combines a communicative function with semantic content, belongs to shallow discourse structure. Dialog acts have been applied in several kinds of systems, including speech recognition, spoken dialog systems, summarization, and spoken language translation. Our work covers building a large-scale annotated corpus of naturally occurring Chinese human-human dialogs, dialog-act modeling and automatic recognition, and spoken language translation based on dialog-act understanding. In building and annotating the corpus, we improve the dialog-act annotation guidelines and give a description of ill-formedness based on dialog analysis. We build an annotated Chinese dialog corpus from telephone recordings. The corpus not only includes phonetic, linguistic, and paralinguistic annotation labels but also describes several ill-formedness phenomena. The corpus is being extended into a large corpus base of annotated Chinese dialogs for the study of spoken Chinese. In dialog-act modeling and recognition, we introduce a novel model to predict and tag dialog acts, in which a Markov Decision Process (MDP) predicts the dialog-act sequence in place of the traditional dialog-act n-gram, and a Support Vector Machine (SVM) classifies the dialog act of each utterance. Moreover, we investigate feature selection and combination for dialog-act recognition, which improves recognition accuracy significantly. In particular, we experiment with several novel features and feature combination strategies. Based on the annotated corpus and automatic dialog-act recognition, we propose three kinds of applications of dialog acts in phrase-based translation. The spoken language translation system benefits from the pragmatic information provided by dialog acts. Dialog-act classification improves the consistency between training data and test data, development set and test set, and source language and target language, so that the translation process is more effective and the translation result reflects the intention of the source language more accurately. In the phrase-based translation system, we also propose an approach of applying semantics knowledge into phra...
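Language: Chinese
Other identifier: 200718014628084
Source URL: http://ir.ia.ac.cn/handle/173211/6246
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
Zhou Keyan (周可艳). 对话行为理解与口语翻译方法研究 [Research on Dialog-Act Understanding and Spoken Language Translation][D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2010.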

Ingest method: OAI harvesting

Source: Institute of Automation
