Chinese Academy of Sciences Institutional Repositories Grid
Accelerated K-means clustering algorithm for big datasets with many cluster centers (针对多聚类中心大数据集的加速K-means聚类算法)

Document type: Journal article

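Authors: Zhang Shunlong (张顺龙); Ku Tao (库涛); Zhou Hao (周浩)
Journal: Application Research of Computers (计算机应用研究)
Publication date: 2016
Volume: 33; Issue: 2; Pages: 413-416
Keywords: DIACK; accelerated K-means; clustering; triangle inequality
ISSN: 1001-3695
Alternative title: Accelerate K-means for multi-center clustering of big datasets
Ownership rank: 1
Corresponding author: Zhang Shunlong (张顺龙)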
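Chinese abstract (translated): As data volume and dimensionality grow exponentially and the number of cluster centers in practical applications increases, the traditional K-means clustering algorithm can no longer meet the time and memory requirements of real applications. To address this problem, the paper proposes an accelerated K-means clustering algorithm based on dynamic cluster-center adjustment and Elkan's triangle-inequality test. Experimental results show that when the dataset reaches 100,000 records and the number of clusters exceeds 20, the algorithm converges faster and uses less memory than Elkan's algorithm.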
English abstract: K-means is among the most popular clustering algorithms, but on big datasets with many clusters it takes a long time to find all the clusters. This paper proposes a new acceleration method for K-means that combines dynamic, immediate adjustment of the cluster centers with the triangle inequality. The triangle inequality is used to avoid redundant distance computations. Unlike Elkan's algorithm, however, the centers are first divided into outer centers and inner centers for each data point, and lower bounds are tracked only for the inner centers. In addition, by adjusting the data points cluster by cluster and updating each cluster center immediately after that cluster's adjustment finishes, the number of iterations is effectively reduced. The experimental results show that the algorithm runs much faster than Elkan's algorithm, with much lower memory consumption, when the number of cluster centers is larger than 20 and the dataset contains more than 100,000 records (the figure given in the Chinese abstract), and that the speedup improves as k increases.
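The triangle-inequality test mentioned in the abstract can be illustrated with a small sketch. This is not the paper's DIACK algorithm: it shows only the basic Elkan-style pruning rule for a single assignment pass (if d(c, c') >= 2·d(x, c), then c' cannot be closer to x than c), and it leaves out the inner/outer center split, the lower-bound tracking, and the immediate per-cluster center update. The function names `dist`, `assign_with_pruning`, and `assign_brute_force` are hypothetical, chosen for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary.

    Returns (labels, skipped), where skipped counts the point-to-center
    distances that were never computed.
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels = []
    skipped = 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(p, c_best), so c_j cannot win.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-to-center distance."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]
```

Calling `assign_with_pruning(points, centers)` on well-separated clusters should return the same labels as `assign_brute_force` while computing fewer distances. The saving grows with the number of centers, which is in line with the abstract's remark that the speedup improves as k increases.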
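Indexed in: CSCD
Language: Chinese
CSCD record number: CSCD:5629682
Source URL: [http://ir.sia.cn/handle/173321/17319]
Collection: Shenyang Institute of Automation_Information Service and Intelligent Control Technology Research Laboratory
Recommended citation
GB/T 7714
Zhang Shunlong, Ku Tao, Zhou Hao. Accelerated K-means clustering algorithm for big datasets with many cluster centers[J]. Application Research of Computers, 2016, 33(2): 413-416.
APA: Zhang Shunlong, Ku Tao, & Zhou Hao. (2016). Accelerated K-means clustering algorithm for big datasets with many cluster centers. Application Research of Computers, 33(2), 413-416.
MLA: Zhang Shunlong, et al. "Accelerated K-means clustering algorithm for big datasets with many cluster centers". Application Research of Computers 33.2 (2016): 413-416.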

Ingest method: OAI harvesting

Source: Shenyang Institute of Automation

