A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis
Document type: Journal article
Authors | Tursun, E (Tursun, Eziz); Ganguly, D (Ganguly, Debasis); Osman, T (Osman, Turghun); Yang, YT (Yang, Ya-Ting); Abdukerim, G (Abdukerim, Ghalip); Zhou, JL (Zhou, Jun-Lin); Liu, Q (Liu, Qun) |
Journal | ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING |
Publication date | 2016 |
Volume | 16, Issue: 2 |
Keywords | Uyghur; morphological analysis; Markov model |
Abstract | Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components of natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of the Uyghur language, which is the native language of the people in the Xinjiang Uyghur autonomous region of western China. Morphological analysis of Uyghur is challenging primarily because of (1) ambiguities arising from the association of multiple POS tags with a word stem or multiple functional tags with a word suffix, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the unavailability of a manually annotated training set for Uyghur word segmentation makes morphological analysis more difficult. We address these challenges with a semisupervised approach that learns a Markov model, with the help of a manually constructed dictionary of "suffix to tag" mappings, in order to predict the most likely tag transitions in the Uyghur morpheme sequence. Because of the linguistic characteristics of Uyghur, we incorporate a prior belief in our model that favors word segmentations with fewer morpheme units. Empirical evaluation of our proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag transition model with an active learning paradigm. In particular, we manually investigated a subset of words for which the model prediction ambiguity was within the top 20%. Manually incorporating rules to handle these erroneous cases resulted in an overall accuracy of 93.81%. |
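Indexed in | EI |
Source URL | [http://ir.xjipc.cas.cn/handle/365002/4716] |
Collection | Xinjiang Technical Institute of Physics and Chemistry_Multilingual Information Technology Research Laboratory |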
Author affiliations | 1. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Inst Math & Informat, Hotan Teachers Coll, Beijing, Peoples R China 2. Dublin City Univ, ADAPT Ctr, Sch Comp, Dublin 9, Ireland 3. Univ Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Beijing, Peoples R China 4. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Beijing 100864, Peoples R China 5. Chinese Acad Sci, Xinjiang Branch, Beijing 100864, Peoples R China |
Recommended citation (GB/T 7714) | Tursun, E, Ganguly, D, Osman, T, et al. A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16(2). |
APA | Tursun, E., Ganguly, D., Osman, T., Yang, YT., Abdukerim, G., ... & Liu, Q. (2016). A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 16(2). |
MLA | Tursun, E, et al. "A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis." ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 16.2 (2016). |
Ingestion method: OAI harvesting
Source: Xinjiang Technical Institute of Physics and Chemistry
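
The abstract describes scoring candidate segmentations by tag transitions under a Markov model, combined with a prior that favors fewer morphemes. The following is a minimal sketch of that decoding idea only, not the authors' implementation: it does not cover the semisupervised learning or active learning steps. The suffix-to-tag entries, the tag names (`PL`, `LOC`, `ALT`, `X`), the transition probabilities, the `FLOOR` value, the penalty weight `lam`, and the exhaustive enumeration are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical "suffix to tag" dictionary (toy entries, not from the paper).
# "da" has two candidate tags to mimic tag ambiguity; "larda" is an
# artificial entry that creates a morpheme-boundary ambiguity.
SUFFIX_TAGS = {
    "lar": ["PL"],
    "da": ["LOC", "ALT"],
    "larda": ["X"],
}

# Hypothetical tag-transition probabilities. In the paper these would be
# learned; here they are fixed by hand.
TRANS = {
    ("STEM", "PL"): 0.6, ("STEM", "LOC"): 0.15,
    ("STEM", "X"): 0.05, ("STEM", "END"): 0.1,
    ("PL", "LOC"): 0.5, ("PL", "ALT"): 0.1,
    ("LOC", "END"): 0.9, ("ALT", "END"): 0.9, ("X", "END"): 1.0,
}
FLOOR = 1e-3  # assumed probability for transitions not listed above


def segmentations(word, min_stem=2):
    """Yield (stem, suffixes) splits whose suffixes all occur in SUFFIX_TAGS."""
    def split_suffixes(rest):
        if not rest:
            yield []
            return
        for i in range(1, len(rest) + 1):
            if rest[:i] in SUFFIX_TAGS:
                for tail in split_suffixes(rest[i:]):
                    yield [rest[:i]] + tail

    for cut in range(min_stem, len(word) + 1):
        for suffixes in split_suffixes(word[cut:]):
            yield word[:cut], suffixes


def score(tags, n_morphemes, lam):
    """Log transition probability minus a penalty per morpheme (the prior)."""
    path = ["STEM", *tags, "END"]
    log_p = sum(math.log(TRANS.get(pair, FLOOR)) for pair in zip(path, path[1:]))
    return log_p - lam * n_morphemes


def analyze(word, lam=0.1):
    """Return (score, stem, [(suffix, tag), ...]) for the best candidate."""
    best = None
    for stem, suffixes in segmentations(word):
        # Try every tag assignment allowed by the dictionary.
        for tags in product(*(SUFFIX_TAGS[s] for s in suffixes)):
            s = score(tags, 1 + len(suffixes), lam)
            if best is None or s > best[0]:
                best = (s, stem, list(zip(suffixes, tags)))
    return best


print(analyze("kitablarda", lam=0.1))
print(analyze("kitablarda", lam=1.0))
```

With these toy numbers, a weak prior (`lam=0.1`) selects the three-morpheme analysis `kitab` + `lar`/`PL` + `da`/`LOC`, while a strong prior (`lam=1.0`) leaves the word unsegmented. This suggests why the strength of such a prior would need to be balanced against the transition scores.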