Chinese Academy of Sciences Institutional Repositories Grid
MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data

Document type: Journal article

Authors: He, Yaobin; Tan, Haoyu; Luo, Wuman; Feng, Shengzhong; Fan, Jianping
Journal: FRONTIERS OF COMPUTER SCIENCE
Publication date: 2014
Abstract: DBSCAN (density-based spatial clustering of applications with noise) is an important spatial clustering technique that is widely adopted in numerous applications. As datasets are now extremely large, parallel processing of complex data analyses such as DBSCAN has become indispensable. However, existing parallel DBSCAN algorithms have three major drawbacks. First, they fail to properly balance the load among parallel tasks, especially when data are heavily skewed. Second, their scalability is limited because not all of the critical sub-procedures are parallelized. Third, most of them are not primarily designed for shared-nothing environments, which makes them less portable to emerging parallel processing paradigms. In this paper, we present MR-DBSCAN, a scalable DBSCAN algorithm using MapReduce. In our algorithm, all of the critical sub-procedures are fully parallelized, so there is no performance bottleneck caused by sequential processing. Most importantly, we propose a novel data partitioning method based on computation cost estimation. The objective is to achieve desirable load balancing even in the context of heavily skewed data. We evaluate MR-DBSCAN on large real datasets with up to 1.2 billion points. The experimental results confirm the efficiency and scalability of MR-DBSCAN.
Indexed in: SCI
Original source: http://link.springer.com/article/10.1007/s11704-013-3158-3#page-1
Language: English
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/5952]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute (数字所)
Author affiliation: FRONTIERS OF COMPUTER SCIENCE
Recommended citation
GB/T 7714
He, Yaobin, Tan, Haoyu, Luo, Wuman, et al. MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data[J]. FRONTIERS OF COMPUTER SCIENCE, 2014.
APA He, Yaobin, Tan, Haoyu, Luo, Wuman, Feng, Shengzhong, & Fan, Jianping. (2014). MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data. FRONTIERS OF COMPUTER SCIENCE.
MLA He, Yaobin, et al. "MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data". FRONTIERS OF COMPUTER SCIENCE (2014).

Ingestion method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology

