Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Text clustering using frequent itemsets

Text clustering using frequent itemsets

Document type: Journal article


Authors	Zhang, Wen 1; Yoshida, Taketoshi 3; Tang, Xijin 2; Wang, Qing 1
Journal	KNOWLEDGE-BASED SYSTEMS
Publication date	2010-07-01
Volume	23 Issue: 5 Pages: 379-388
Keywords	Document clustering; Frequent itemsets; Maximum capturing; Similarity measure; Competitive learning
ISSN	0950-7051
DOI	10.1016/j.knosys.2010.01.011
Abstract	Frequent itemsets originate from association rule mining. Recently, they have been applied in text mining tasks such as document categorization and clustering. In this paper, we conduct a study on text clustering using frequent itemsets. The contribution of this paper is threefold. First, we present a review of existing methods of document clustering using frequent patterns. Second, a new method called Maximum Capturing is proposed for document clustering. Maximum Capturing includes two procedures: constructing document clusters and assigning cluster topics. We develop three versions of Maximum Capturing based on three similarity measures. We propose a normalization process based on frequency sensitive competitive learning for Maximum Capturing to merge cluster candidates into a predefined number of clusters. Third, experiments are carried out to evaluate the proposed method in comparison with the CFWS, CMS, FTC and FIHC methods. Experimental results show that, in clustering, Maximum Capturing performs better than the other methods mentioned above. In particular, Maximum Capturing with representation using individual words and similarity measure using asymmetrical binary similarity achieves the best performance. Moreover, topics produced by Maximum Capturing distinguish clusters from each other and can be used as labels of document clusters. (C) 2010 Elsevier B.V. All rights reserved.
Funding	National Natural Science Foundation of China[90718042] ; National Natural Science Foundation of China[60873072] ; National Natural Science Foundation of China[60903050] ; National Hi-Tech RD Plan of China[2007AA010303] ; National Hi-Tech RD Plan of China[2007AA01Z186] ; National Hi-Tech RD Plan of China[2007AA01Z179] ; National Basic Research Program[2007CB310802] ; Foundation of Young Doctors of Institute of Software, Chinese Academy of Sciences[ISCAS2009-DR03]
WOS research area	Computer Science
Language	English
WOS accession number	WOS:000278881300002
Publisher	ELSEVIER SCIENCE BV
Source URL	[http://ir.amss.ac.cn/handle/2S8OKBNM/10219]
Collection	Institute of Systems Science
Corresponding author	Zhang, Wen
Author affiliations	1. Chinese Acad Sci, Inst Software, Lab Internet Software Technol, Beijing 100190, Peoples R China; 2. Chinese Acad Sci, Inst Syst Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China; 3. Japan Adv Inst Sci & Technol, Sch Knowledge Sci, Tatsunokuchi, Ishikawa 9231292, Japan
Recommended citation GB/T 7714	Zhang, Wen, Yoshida, Taketoshi, Tang, Xijin, et al. Text clustering using frequent itemsets[J]. KNOWLEDGE-BASED SYSTEMS, 2010, 23(5): 379-388.
APA	Zhang, Wen, Yoshida, Taketoshi, Tang, Xijin, & Wang, Qing. (2010). Text clustering using frequent itemsets. KNOWLEDGE-BASED SYSTEMS, 23(5), 379-388.
MLA	Zhang, Wen, et al. "Text clustering using frequent itemsets". KNOWLEDGE-BASED SYSTEMS 23.5 (2010): 379-388.
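
The abstract names frequent itemsets and an asymmetrical binary similarity measure without defining them. The following is a minimal sketch for orientation only, not the authors' Maximum Capturing implementation. It assumes each document is represented as a set of words, mines frequent itemsets by brute-force enumeration up to a small size, and compares two documents by the Jaccard coefficient (a standard similarity for asymmetric binary attributes) over the frequent itemsets each document contains. The function names, the `min_support` and `max_size` parameters, and the toy documents are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Return {itemset: support} for itemsets found in at least min_support docs.

    Brute-force enumeration, suitable for toy data only.
    """
    counts = {}
    for doc in docs:
        for size in range(1, max_size + 1):
            for items in combinations(sorted(doc), size):
                key = frozenset(items)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def doc_itemsets(doc, itemsets):
    """Frequent itemsets that are fully contained in a document."""
    return {s for s in itemsets if s <= doc}


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"text", "clustering", "itemset"},
    {"text", "clustering", "topic"},
    {"image", "retrieval", "topic"},
]

fis = frequent_itemsets(docs, min_support=2)
features = [doc_itemsets(d, fis) for d in docs]
sim_01 = jaccard(features[0], features[1])  # documents 0 and 1
sim_02 = jaccard(features[0], features[2])  # documents 0 and 2
```

On this toy corpus, four itemsets reach the support threshold, the first two documents share three of them (similarity 0.75), and the first and third share none (similarity 0.0). A real system would replace the brute-force enumeration with an Apriori or FP-growth style miner.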

Ingestion method: OAI harvesting

Source: Academy of Mathematics and Systems Science
