Chinese Academy of Sciences Institutional Repositories Grid
Distribution of Multi-Words in Chinese and English Documents

Document type: Journal article

Authors: Zhang Wen; Yoshida Taketoshi; Tang Xijin
Journal: INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING
Publication date: 2009
Volume: 8; Issue: 2; Pages: 249-265
Keywords: Multi-word; term distribution; Poisson distribution; zero-inflated distribution; G-distribution
ISSN: 0219-6220
Subject areas: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Operations Research & Management Science
Indexed in: SCI
Language: English
WOS accession number: WOS:000267703000004
Date made public: 2011-03-18
Notes: As a hybrid of the N-gram in natural language processing and the collocation in statistical linguistics, the multi-word is becoming an active topic in text mining and information retrieval. This paper studies the distribution of multi-words to explore a theoretical basis for a probabilistic term-weighting scheme. Specifically, the Poisson distribution, the zero-inflated binomial distribution, and the G-distribution are compared on the task of predicting the occurrence probabilities of multi-words, for both technical and nontechnical multi-words. In addition, a rule-based algorithm is proposed to extract multi-words from texts based on word occurrence patterns and syntactic structures. The experimental results show that the G-distribution predicts the probabilities of multi-word occurrence frequencies best, and that the Poisson distribution is comparable to the zero-inflated binomial distribution in estimating the multi-word distribution. The outcome of this study confirms that burstiness is a universal phenomenon in linguistic count data, applying not only to individual content words but also to multi-words.
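
As a rough illustration of two of the distributions the abstract compares, the sketch below evaluates a Poisson and a zero-inflated binomial probability mass function for the number of times a multi-word occurs in a document. The function names and all parameter values are hypothetical and are not taken from the paper; the G-distribution is left out because this record does not give its formula.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zib_pmf(k: int, n: int, p: float, pi: float) -> float:
    """P(X = k) for a zero-inflated binomial distribution.

    With probability pi the count is a structural zero (the multi-word
    is simply absent from the document); otherwise the count follows
    Binomial(n, p).
    """
    binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1 - pi) * binom


# Hypothetical parameters chosen so that both models have the same
# mean count per document: lam = (1 - pi) * n * p = 0.5.
lam = 0.5
n, p, pi = 100, 0.02, 0.75

for k in range(4):
    print(k, round(poisson_pmf(k, lam), 4), round(zib_pmf(k, n, p, pi), 4))
```

With equal means, the zero-inflated model puts more mass on zero occurrences and spreads the rest over higher counts, which is one simple way to picture the burstiness mentioned in the abstract.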
Source URL: [http://124.16.136.157/handle/311060/7742]
Collection: Institute of Software_Institute of Software Library_2009 Journal/Conference Papers
Recommended citation
GB/T 7714
Zhang Wen, Yoshida Taketoshi, Tang Xijin. Distribution of multi-words in Chinese and English documents[J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2009, 8(2): 249-265.
APA Zhang Wen, Yoshida Taketoshi, & Tang Xijin. (2009). Distribution of multi-words in Chinese and English documents. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 8(2), 249-265.
MLA Zhang Wen, et al. "Distribution of multi-words in Chinese and English documents". INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING 8.2 (2009): 249-265.

Ingest method: OAI harvesting

Source: Institute of Software


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.