Chinese Academy of Sciences Institutional Repositories Grid
A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis

Document type: Journal article

Authors: Tursun, E (Tursun, Eziz); Ganguly, D (Ganguly, Debasis); Osman, T (Osman, Turghun); Yang, YT (Yang, Ya-Ting); Abdukerim, G (Abdukerim, Ghalip); Zhou, JL (Zhou, Jun-Lin); Liu, Q (Liu, Qun)
Journal: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
Publication date: 2016
Volume: 16  Issue: 2
Keywords: Uyghur; morphological analysis; Markov model
Abstract: Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is a key component of natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of Uyghur, the native language of the Uyghur people in the Xinjiang Uyghur Autonomous Region of western China. Morphological analysis of Uyghur is challenging primarily because of (1) ambiguity arising because a word stem may be associated with multiple POS tags, or a word suffix with multiple functional tags, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the lack of a manually annotated Uyghur training set for word segmentation makes Uyghur morphological analysis more difficult. We address these challenges with a semisupervised approach that learns a Markov model, aided by a manually constructed dictionary of "suffix-to-tag" mappings, to predict the most likely tag transitions in a Uyghur morpheme sequence. Owing to the linguistic characteristics of Uyghur, we incorporate a prior belief into the model that favors word segmentations with fewer morpheme units. Empirical evaluation of the proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag-transition model with an active learning paradigm: we manually examined the subset of words whose prediction ambiguity fell within the top 20%, and manually incorporating rules to handle these erroneous cases raised the overall accuracy to 93.81%.
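The abstract describes the method only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions, and is not the authors' implementation: the suffix-to-tag dictionary, stem lexicon, romanized forms, and tag names are hypothetical toys; transitions are first-order with add-one smoothing; the "fewer morphemes" prior is assumed to be a per-morpheme geometric factor (alpha = 0.5, hypothetical); hard EM (Viterbi training) on unannotated words stands in for the paper's semisupervised estimation; and the gap between the two best analyses is an assumed ambiguity measure used to pick the top 20% of words for manual review.

```python
import math
from collections import defaultdict

# Hypothetical toy resources (NOT the paper's manually built dictionary).
# Tuples keep candidate order deterministic.
SUFFIX_TAGS = {"lar": ("PL",), "ni": ("ACC",), "da": ("LOC",),
               "ning": ("GEN",), "i": ("POSS3", "ACC")}  # "i" is ambiguous
STEM_TAGS = {"kitab": ("N",), "bala": ("N",), "oqu": ("V",)}
START, END = "<s>", "</s>"
TAGS = sorted({t for ts in SUFFIX_TAGS.values() for t in ts}
              | {t for ts in STEM_TAGS.values() for t in ts})


def segmentations(word):
    """Yield every (morphs, tags): a known stem followed by known suffixes."""
    def suffixes(rest):
        if not rest:
            yield [], []
            return
        for i in range(1, len(rest) + 1):
            for tag in SUFFIX_TAGS.get(rest[:i], ()):
                for morphs, tags in suffixes(rest[i:]):
                    yield [rest[:i]] + morphs, [tag] + tags
    for j in range(1, len(word) + 1):
        for pos in STEM_TAGS.get(word[:j], ()):
            for morphs, tags in suffixes(word[j:]):
                yield [word[:j]] + morphs, [pos] + tags


def estimate(counts):
    """Add-one smoothed transition probabilities P(b | a)."""
    targets = TAGS + [END]
    trans = {}
    for a in [START] + TAGS:
        total = sum(counts[a][b] for b in targets) + len(targets)
        trans[a] = {b: (counts[a][b] + 1) / total for b in targets}
    return trans


def score(tags, trans, log_prior_per_morph):
    """Log P(tag sequence) plus an assumed geometric prior on morpheme count."""
    seq = [START] + tags + [END]
    s = sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))
    return s + log_prior_per_morph * len(tags)


def analyse(word, trans, log_prior_per_morph=math.log(0.5)):
    """All candidate (score, morphs, tags), best first; [] if unanalysable."""
    cands = [(score(tags, trans, log_prior_per_morph), morphs, tags)
             for morphs, tags in segmentations(word)]
    return sorted(cands, key=lambda c: c[0], reverse=True)


def train(words, iterations=5):
    """Hard EM: decode each unannotated word, re-count best-path transitions."""
    trans = estimate(defaultdict(lambda: defaultdict(int)))  # uniform start
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(int))
        for w in words:
            cands = analyse(w, trans)
            if cands:
                seq = [START] + cands[0][2] + [END]
                for a, b in zip(seq, seq[1:]):
                    counts[a][b] += 1
        trans = estimate(counts)
    return trans


def ambiguity(word, trans):
    """Score margin between the two best analyses (small = ambiguous).
    Words with fewer than two analyses get inf under this toy definition."""
    cands = analyse(word, trans)
    return cands[0][0] - cands[1][0] if len(cands) >= 2 else math.inf


def most_ambiguous(words, trans, fraction=0.2):
    """The `fraction` of words with the smallest margins, for manual review."""
    ranked = sorted(words, key=lambda w: ambiguity(w, trans))
    return ranked[:max(1, math.ceil(fraction * len(ranked)))]


if __name__ == "__main__":
    corpus = ["kitablar", "kitablarni", "kitabni", "kitabni", "kitabi", "kitabda"]
    model = train(corpus)
    print(analyse("kitabi", model)[0][1:])  # (['kitab', 'i'], ['N', 'ACC'])
    print(most_ambiguous(corpus, model))    # ['kitabi', 'kitablar']
```

In this toy run, the ambiguous suffix "i" is resolved toward ACC because "-ni" accusative forms in the unannotated list push up the N→ACC and ACC→end transitions; a real system would need the full dictionary, morphophonological rules, and the manual rules the abstract mentions.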
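Indexed in: EI
Source URL: http://ir.xjipc.cas.cn/handle/365002/4716
Collection: Xinjiang Technical Institute of Physics and Chemistry, Multilingual Information Technology Laboratory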
Author affiliations:
1. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Inst Math & Informat, Hotan Teachers Coll, Beijing, Peoples R China
2. Dublin City Univ, ADAPT Ctr, Sch Comp, Dublin 9, Ireland
3. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Beijing, Peoples R China
4. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Beijing 100864, Peoples R China
5. Chinese Acad Sci, Xinjiang Branch, Beijing 100864, Peoples R China
Recommended citation
GB/T 7714
Tursun E, Ganguly D, Osman T, et al. A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16(2).
APA
Tursun, E., Ganguly, D., Osman, T., Yang, Y. T., Abdukerim, G., ... & Liu, Q. (2016). A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 16(2).
MLA
Tursun, E., et al. "A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis." ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 16.2 (2016).

Ingest method: OAI harvesting

Source: Xinjiang Technical Institute of Physics and Chemistry


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.