Uyghur word segmentation using a combination of rules and statistics
Document type: Journal article
Authors | Xue, Huajian; Yang, Yong; Turghun, Osman; Li, Xiao; Zhang, Ronghui |
Journal | Advances in Information Sciences and Service Sciences
Publication date | 2011 |
Volume | 3, Issue: 11 |
ISSN | 1976-3700 |
Abstract | The rich morphology of Uyghur produces a large number of word forms and leads to high out-of-vocabulary (OOV) rates, which can cause many errors in Uyghur natural language processing (NLP). Morphological word segmentation is an important component in overcoming this problem. This paper describes a set of morphological rules derived from an analysis of the general structure of Uyghur words and presents a partly supervised word segmentation method. In this method, a suffix corpus is used to generate all possible morphological segmentations of a word, from which the optimal segmentation is selected by a MAP-based model. In addition, a cascaded language model is used to improve segmentation accuracy. A test set of 5000 words was collected and segmented by hand. Experiments on this test set show that the proposed method is more effective. |
Indexed by | EI |
Date made public | 2014-11-11 |
Source URL | [http://ir.xjipc.cas.cn/handle/365002/4160] |
Collection | Xinjiang Technical Institute of Physics and Chemistry_Multilingual Information Technology Research Laboratory |
Author affiliation | Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, China |
Recommended citation (GB/T 7714) | Xue, Huajian,Yang, Yong,Turghun, Osman,et al. Uyghur word segmentation using a combination of rules and statistics[J]. Advances in Information Sciences and Service Sciences,2011,3(11). |
APA | Xue, Huajian,Yang, Yong,Turghun, Osman,Li, Xiao,&Zhang, Ronghui.(2011).Uyghur word segmentation using a combination of rules and statistics.Advances in Information Sciences and Service Sciences,3(11). |
MLA | Xue, Huajian,et al."Uyghur word segmentation using a combination of rules and statistics".Advances in Information Sciences and Service Sciences 3.11(2011). |
Ingest method: OAI harvesting
Source: Xinjiang Technical Institute of Physics and Chemistry