Distribution of multi-words in Chinese and English documents
Document type: Journal article
Authors | Zhang Wen ; Yoshida Taketoshi ; Tang Xijin |
Journal | INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING
Publication date | 2009 |
Volume | 8 ; Issue: 2 ; Pages: 249-265 |
Keywords | Multi-word ; term distribution ; Poisson distribution ; zero-inflated distribution ; G-distribution |
ISSN | 0219-6220 |
Subject | Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Operations Research & Management Science |
Indexed in | SCI |
Language | English |
WOS accession number | WOS:000267703000004 |
Date made public | 2011-03-18 |
Note | As a hybrid of the N-gram in natural language processing and the collocation in statistical linguistics, the multi-word is becoming a hot topic in text mining and information retrieval. This paper studies the distribution of multi-words to explore a theoretical basis for probabilistic term-weighting schemes. Specifically, the Poisson distribution, the zero-inflated binomial distribution, and the G-distribution are compared on the task of predicting the occurrence probabilities of multi-words, for both technical and nontechnical multi-words. In addition, a rule-based algorithm is proposed to extract multi-words from texts based on word occurrence patterns and syntactic structures. The experimental results demonstrate that the G-distribution best predicts the probabilities of multi-word occurrence frequencies, and that the Poisson distribution is comparable to the zero-inflated binomial distribution in estimating multi-word distributions. The outcome of this study validates that burstiness is a universal phenomenon in linguistic count data, applying not only to individual content words but also to multi-words. |
Source URL | [http://124.16.136.157/handle/311060/7742] |
Collection | Institute of Software_Institute of Software Library_2009 Journal/Conference Papers |
Recommended citation GB/T 7714 | Zhang Wen, Yoshida Taketoshi, Tang Xijin. Distribution of multi-words in Chinese and English documents[J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2009, 8(2): 249-265. |
APA | Zhang Wen, Yoshida Taketoshi, & Tang Xijin. (2009). Distribution of multi-words in Chinese and English documents. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 8(2), 249-265. |
MLA | Zhang Wen, et al. "Distribution of multi-words in Chinese and English documents". INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING 8.2 (2009): 249-265. |
Ingest method: OAI harvesting
Source: Institute of Software
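
The burstiness claim in this record's abstract can be illustrated with a small sketch. It fits a Poisson distribution to per-document counts of a single multi-word and compares the predicted probabilities with the observed ones. The counts below are made-up illustrative values, and the helper names (`poisson_pmf`, `fit_poisson`, `empirical_pmf`) are hypothetical; this is not the authors' experimental setup, and the G-distribution and zero-inflated binomial fits from the paper are not reproduced here.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing exactly k occurrences under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_poisson(counts):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    return sum(counts) / len(counts)


def empirical_pmf(counts):
    """Observed relative frequency of each count value."""
    n = len(counts)
    return {k: c / n for k, c in sorted(Counter(counts).items())}


# Assumption: hypothetical counts of one multi-word in 20 documents.
# Most documents never contain it; a few contain it many times (bursty).
counts = [0] * 16 + [1, 2, 5, 8]

lam = fit_poisson(counts)
observed = empirical_pmf(counts)

for k, p_obs in observed.items():
    print(f"k={k}: observed={p_obs:.3f}  poisson={poisson_pmf(k, lam):.3f}")
```

With counts like these, a single-rate Poisson model tends to underestimate both the share of documents with zero occurrences and the share with many occurrences, which is the kind of mismatch that motivates zero-inflated and mixture-style alternatives.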