Chinese Academy of Sciences Institutional Repositories Grid
MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce

Document type: Conference paper

Authors: Yaobin He; Haoyu Tan; Wuman Luo; Huajian Mao; Di Ma; Shengzhong Feng; Jianping Fan
Publication date: 2011
Conference name: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Conference location: Tainan, Taiwan
Abstract: Data clustering is an important data mining technique that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it with a four-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric for merging bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. The results show that our work is very efficient in terms of speedup and scale-up.
Indexed by: EI
Language: English
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/3588]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute (数字所)
Author affiliation: 2011
Recommended citation
GB/T 7714
Yaobin He, Haoyu Tan, Wuman Luo, et al. MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce[C]. In: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. Tainan, Taiwan.

Ingest method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.