Chinese Academy of Sciences Institutional Repositories Grid
Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field

Document type: Journal article

Authors: Liu Huidan; Nuo Minghua; Ma Longlong; Wu Jian; He Yeping
Publication: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Publication date: 2011
Pages: 168-177
Keywords: Computational linguistics; Random processes
Abstract: In this paper, we propose a novel approach to Tibetan word segmentation using the conditional random field. We reformulate segmentation as a syllable tagging problem. The approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are used and compared, and the results show that the former is better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on the test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He.
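The tagging reformulation in the abstract can be illustrated with a minimal sketch. The abstract does not name the tag set, so the four-tag scheme below (B, M, E, S) is an assumption, as are the function names and the romanized placeholder syllables. The CRF itself is not shown; the sketch covers only the conversion between segmented words and per-syllable tags, and the recombination step that would follow tagging.

```python
from typing import List

# Assumed tag scheme (not stated in the abstract):
#   B = first syllable of a multi-syllable word
#   M = middle syllable
#   E = last syllable
#   S = single-syllable word


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (list of words, each a list of
    syllables) into one position tag per syllable."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Combine syllables into words according to their tags."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        # A B or S tag starts a new word; close any unfinished one first
        # so that inconsistent tag sequences still yield a segmentation.
        if tag in ("B", "S") and current:
            words.append(current)
            current = []
        current.append(syllable)
        # An E or S tag ends the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # trailing syllables with no closing E tag
        words.append(current)
    return words


# Hypothetical example with romanized placeholder syllables.
segmented = [["bod", "skad"], ["yig"], ["a", "b", "c"]]
flat = [syllable for word in segmented for syllable in word]
tags = words_to_tags(segmented)
restored = tags_to_words(flat, tags)
```

In a full pipeline, `words_to_tags` would prepare training labels from a segmented corpus, and `tags_to_words` would be applied to the tag sequence predicted by the trained model.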
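Indexed by: EI
Language: English
Date made public: 2013-10-08
Source URL: [http://ir.iscas.ac.cn/handle/311060/16170]
Collection: Institute of Software_Institute of Software Library_Journal Articles
Recommended citation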
GB/T 7714
Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011: 168-177.
APA Liu Huidan, Nuo Minghua, Ma Longlong, Wu Jian, & He Yeping. (2011). Tibetan word segmentation as syllable tagging using conditional random field. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 168-177.
MLA Liu Huidan, et al. "Tibetan word segmentation as syllable tagging using conditional random field". PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (2011): 168-177.

Ingest method: OAI harvesting

Source: Institute of Software

