Chinese Academy of Sciences Institutional Repositories Grid
Co-occurrence degree based word alignment: A case study on Uyghur-Chinese

Document type: Journal article

Authors: Mi, Chenggang (1); Yang, Yating (1); Zhou, Xi (1); Li, Xiao (1); Osman, Turghun (1)
Journal: Lecture Notes in Computer Science
Publication date: 2014
Volume: 8801; Issue: 1; Pages: 259-268
Keywords: Uyghur-Chinese Word Alignment; Co-occurrence Degree; Co-occurrence Count; Agglutinative Language; Data Sparseness
Abstract: Most widely used word alignment models are based on word co-occurrence counts in a parallel corpus. However, data sparseness during training means that the co-occurrence counts from a Uyghur-Chinese parallel corpus cannot effectively indicate associations between source and target words. In this paper, we propose a Uyghur-Chinese word alignment method based on word co-occurrence degree to alleviate the data sparseness problem. Our approach combines co-occurrence counts and fuzzy co-occurrence weights into a word co-occurrence degree. The fuzzy co-occurrence weights are obtained by searching for fuzzy co-occurrence word pairs and computing the length differences between the current Uyghur word and the other Uyghur words in those pairs. Experiments show that the co-occurrence degree based word alignment model outperforms the baseline word alignment model, and that the quality of Uyghur-Chinese machine translation also improves.
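The idea of combining exact counts with fuzzy weights can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration only: a "fuzzy co-occurrence pair" is taken to be another Uyghur word that shares a leading prefix with the current word and co-occurs with the same Chinese word, and its weight is taken to be its count divided by one plus the length difference. The function names, the `min_prefix` parameter, and the toy tokens are hypothetical and are not drawn from the paper.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(sentence_pairs):
    """Count the sentence pairs in which each (Uyghur, Chinese) word pair co-occurs."""
    counts = Counter()
    for uy_words, zh_words in sentence_pairs:
        for pair in product(set(uy_words), set(zh_words)):
            counts[pair] += 1
    return counts


def shared_prefix_len(a, b):
    """Length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cooccurrence_degree(uy_word, zh_word, counts, min_prefix=4):
    """Exact count plus length-discounted counts of similar Uyghur words.

    ASSUMPTION (not from the paper): a fuzzy pair is (other_uy, zh_word)
    where other_uy shares at least `min_prefix` leading characters with
    uy_word.
    ASSUMPTION (not from the paper): each fuzzy pair contributes
    count / (1 + |len(uy_word) - len(other_uy)|).
    """
    degree = float(counts.get((uy_word, zh_word), 0))
    for (other_uy, other_zh), c in counts.items():
        if other_zh != zh_word or other_uy == uy_word:
            continue
        if shared_prefix_len(uy_word, other_uy) >= min_prefix:
            degree += c / (1 + abs(len(uy_word) - len(other_uy)))
    return degree


# Toy romanized tokens chosen for illustration, not taken from the paper's corpus.
corpus = [
    (["kitab"], ["书"]),
    (["kitablar"], ["书"]),
    (["kitabim"], ["书"]),
]
counts = cooccurrence_counts(corpus)
degree = cooccurrence_degree("kitab", "书", counts)
```

In this toy corpus each inflected form co-occurs with the Chinese word only once, so the raw count for any single form is 1. Under the assumed weighting, the related forms add partial evidence to the degree, which is the kind of effect the abstract describes for an agglutinative language with sparse counts.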
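Source URL: http://ir.xjipc.cas.cn/handle/365002/4915
Collection: Xinjiang Technical Institute of Physics and Chemistry, Multilingual Information Technology Research Laboratory
Author affiliations: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang, China
2. University of Chinese Academy of Sciences, Beijing, China
Recommended citation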
GB/T 7714
Mi, Chenggang, Yang, Yating, Zhou, Xi, et al. Co-occurrence degree based word alignment: A case study on Uyghur-Chinese[J]. Lecture Notes in Computer Science, 2014, 8801(1): 259-268.
APA Mi, Chenggang, Yang, Yating, Zhou, Xi, Li, Xiao, & Osman, Turghun. (2014). Co-occurrence degree based word alignment: A case study on Uyghur-Chinese. Lecture Notes in Computer Science, 8801(1), 259-268.
MLA Mi, Chenggang, et al. "Co-occurrence degree based word alignment: A case study on Uyghur-Chinese." Lecture Notes in Computer Science 8801.1 (2014): 259-268.

Ingest method: OAI harvesting

Source: Xinjiang Technical Institute of Physics and Chemistry


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.