Tibetan word segmentation as syllable tagging using conditional random field
Document type: Journal article
Authors | Liu Huidan; Nuo Minghua; Ma Longlong; Wu Jian; He Yeping |
Journal | PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation |
Publication date | 2011 |
Pages | 168-177 |
Keywords | Computational linguistics; Random processes |
Abstract | In this paper, we propose a novel approach to Tibetan word segmentation using a conditional random field. We reformulate segmentation as a syllable tagging problem. The approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are compared, and the results show that the former performs better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on a test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He. |
Indexed by | EI |
Language | English |
Date made public | 2013-10-08 |
Source URL | [http://ir.iscas.ac.cn/handle/311060/16170] |
Collection | Institute of Software_Institute of Software Library_Journal articles |
Recommended citation, GB/T 7714 | Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011: 168-177. |
APA | Liu Huidan, Nuo Minghua, Ma Longlong, Wu Jian, & He Yeping. (2011). Tibetan word segmentation as syllable tagging using conditional random field. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 168-177. |
MLA | Liu Huidan, et al. "Tibetan word segmentation as syllable tagging using conditional random field". PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (2011): 168-177. |
Ingest method: OAI harvesting
Source: Institute of Software
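
The abstract describes labelling each syllable with a word-internal position tag and then merging syllables into words according to those tags. The sketch below illustrates only that encode/decode step, not the conditional random field itself. The record does not name the tag set, so the four tags B (begin), M (middle), E (end) and S (single-syllable word) are an assumption, as are the function names `words_to_tags` and `tags_to_words`. The syllables are placeholder strings, not real Tibetan data.

```python
from typing import List


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (a list of words, each a list of
    syllables) into one position tag per syllable.

    Assumed tag set (not specified in the record): B/M/E/S.
    """
    tags: List[str] = []
    for syllables in words:
        n = len(syllables)
        if n == 0:
            continue  # ignore empty words
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Merge syllables into words according to their position tags."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        # B or S starts a new word, so flush any unfinished one first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = []
        current.append(syllable)
        # E or S closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends without E or S
        words.append(current)
    return words


if __name__ == "__main__":
    # Placeholder syllables standing in for a segmented sentence.
    segmented = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
    tags = words_to_tags(segmented)
    flat = [s for word in segmented for s in word]
    print(tags)
    print(tags_to_words(flat, tags))
```

In a setup like the one the abstract outlines, a trained tagger would supply the `tags` sequence for unseen text, and a decoder like `tags_to_words` would turn its output back into words. The decoder here deliberately tolerates ill-formed tag sequences rather than rejecting them, which is one possible design choice among several.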