中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
Dacoop: Accelerating Data-Iterative Applications on Map/Reduce Cluster

Document Type: Conference Paper

Authors: Liang Yi; Li Guangrui; Wang Lei; Hu Yanpeng
Publication Date: 2011
Conference: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
Conference Dates: October 20, 2011 - October 22, 2011
Conference Location: Gwangju, Korea, Republic of
Keywords: Cache memory; Cluster computing; Multitasking; Scheduling algorithms; Turnaround time
Pages: 207-214
Abstract: Map/Reduce is a popular parallel processing framework for massive-scale data-intensive computing. A data-iterative application is composed of a series of map/reduce jobs and needs to repeatedly process some data files across these jobs. Existing implementations of the map/reduce framework focus on performing data processing in a single pass with one map/reduce job and do not directly support data-iterative applications, particularly in terms of explicitly specifying the data that is processed repeatedly across jobs. In this paper, we propose Dacoop, an extended version of the Hadoop map/reduce framework. Dacoop extends the map/reduce programming interface to specify the repeatedly processed data, introduces a shared-memory-based data cache mechanism that caches such data upon its first access, and adopts caching-aware task scheduling so that the cached data can be shared among the map/reduce jobs of a data-iterative application. We evaluate Dacoop on two typical data-iterative applications, k-means clustering and domain rule reasoning in the semantic web, with real and synthetic datasets. Experimental results show that data-iterative applications achieve better performance on Dacoop than on Hadoop: the turnaround time of a data-iterative application can be reduced by up to 15.1%. © 2011 IEEE.
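The abstract describes Dacoop's core idea: data that a chain of map/reduce jobs reads repeatedly is cached in shared memory on its first access and reused by later jobs through caching-aware scheduling. The following is a minimal, self-contained Java sketch of that caching idea only; the class and method names (IterativeCacheSketch, readCached) are illustrative assumptions and are not part of Dacoop's or Hadoop's actual API.

// Hypothetical sketch: an invariant input that every iteration of a
// data-iterative job chain would otherwise re-read from disk is loaded
// once into a shared in-memory cache and reused by later "jobs".
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterativeCacheSketch {

    // Shared cache: file path -> lines of the file, populated on first access.
    private static final Map<Path, List<String>> CACHE = new ConcurrentHashMap<>();

    // Read through the cache: only the first iteration pays the I/O cost.
    static List<String> readCached(Path invariantInput) throws Exception {
        List<String> cached = CACHE.get(invariantInput);
        if (cached != null) {
            return cached;                     // cache hit: no re-read
        }
        List<String> lines = Files.readAllLines(invariantInput);
        CACHE.put(invariantInput, lines);      // cache on first access
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Path points = Path.of(args[0]);        // invariant dataset, e.g. k-means points

        // Each loop iteration stands in for one map/reduce job of the
        // data-iterative application; all of them reuse the cached data.
        for (int iteration = 0; iteration < 5; iteration++) {
            List<String> data = readCached(points);
            System.out.printf("iteration %d processed %d records%n",
                    iteration, data.size());
        }
    }
}

In the actual system, as the abstract states, the cache lives in shared memory on the cluster nodes and the scheduler places map tasks where the needed data is already cached; this sketch only illustrates the cache-on-first-access behavior within a single process.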
Indexed In: EI
Proceedings: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
Language: English
ISBN: 9780769545646
Source URL: http://ir.iscas.ac.cn/handle/311060/16322
Collection: Institute of Software_Library of the Institute of Software_Conference Papers
Recommended Citation
GB/T 7714
Liang Yi, Li Guangrui, Wang Lei, et al. Dacoop: accelerating data-iterative applications on map/reduce cluster[C]. In: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011. Gwangju, Korea, Republic of. October 20, 2011 - October 22, 2011.

Deposit Method: OAI harvesting

Source: Institute of Software
