Chinese Academy of Sciences Institutional Repositories Grid
A three-phase approach to document clustering based on topic significance degree

Document type: Journal article

Authors: Ma, Yinglong (1); Wang, Yao (1); Jin, Beihong (2)
Journal: Expert Systems with Applications
Publication date: 2014
Volume: 41; Issue: 18; Pages: 8203-8210
Keywords: Document clustering; Topic model; K-means; K-means++
ISSN: 0957-4174
Corresponding author: Ma, Y. (yinglongma@gmail.com)
Abstract: A topic model can project documents into a topic space, which facilitates effective document clustering. Selecting a good topic model and improving clustering performance are two highly correlated problems in topic-based document clustering. In this paper, we propose a three-phase approach to topic-based document clustering. In the first phase, we determine the best topic model: we present a formal concept of the significance degree of topics, together with topic selection criteria, through which we can find the best number of the most suitable topics from the original topic model discovered by LDA. In the second phase, we choose the initial clustering centers by using the k-means++ algorithm. In the third phase, we take the obtained initial clustering centers and use the k-means algorithm for document clustering. Three clustering solutions based on the three-phase approach are used for document clustering. Experiments on the three solutions compare them and illustrate the effectiveness and efficiency of our approach. © 2014 Elsevier Ltd. All rights reserved.
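The three phases named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the significance degree, so `select_topics` ranks topics by mean weight purely as a placeholder, and the document-topic matrix `doc_topic` is made-up toy data standing in for LDA output. All function names here are hypothetical.

```python
import random


def select_topics(doc_topic, keep):
    """Phase 1 stand-in: keep the `keep` topics with the largest mean weight.

    ASSUMPTION: this ranking is only a placeholder. It is NOT the paper's
    significance degree, which the abstract does not define.
    """
    n_topics = len(doc_topic[0])
    mean_weight = [sum(row[t] for row in doc_topic) / len(doc_topic)
                   for t in range(n_topics)]
    ranked = sorted(range(n_topics), key=lambda t: -mean_weight[t])
    kept = sorted(ranked[:keep])
    return [[row[t] for t in kept] for row in doc_topic], kept


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """Phase 2: k-means++ seeding (sample new centers with weight D^2)."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, init_centers, max_iter=100):
    """Phase 3: Lloyd's k-means started from the given initial centers."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy document-topic proportions (6 documents x 4 topics), invented for the demo.
doc_topic = [
    [0.70, 0.10, 0.15, 0.05],
    [0.65, 0.15, 0.15, 0.05],
    [0.72, 0.08, 0.15, 0.05],
    [0.10, 0.70, 0.15, 0.05],
    [0.12, 0.68, 0.15, 0.05],
    [0.08, 0.72, 0.15, 0.05],
]

rng = random.Random(0)
reduced, kept = select_topics(doc_topic, keep=2)
init_centers = kmeanspp_init(reduced, k=2, rng=rng)
labels, centers = kmeans(reduced, init_centers)
print("kept topics:", kept)
print("cluster labels:", labels)
```

On this toy data the first three documents end up in one cluster and the last three in the other; which cluster receives which label depends on the random seeding.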
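Indexed in: SCI; EI
Language: English
WOS accession number: WOS:000342250300015
Release date: 2014-12-16
Source URL: http://ir.iscas.ac.cn/handle/311060/16790
Collection: Institute of Software_Institute of Software Library_Journal articles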
Recommended citation
GB/T 7714: Ma, Yinglong, Wang, Yao, Jin, Beihong. A three-phase approach to document clustering based on topic significance degree[J]. Expert Systems with Applications, 2014, 41(18): 8203-8210.
APA: Ma, Yinglong, Wang, Yao, & Jin, Beihong. (2014). A three-phase approach to document clustering based on topic significance degree. Expert Systems with Applications, 41(18), 8203-8210.
MLA: Ma, Yinglong, et al. "A three-phase approach to document clustering based on topic significance degree". Expert Systems with Applications 41.18 (2014): 8203-8210.

Ingest method: OAI harvesting

Source: Institute of Software
