Chinese Academy of Sciences Institutional Repositories Grid: Configuring In-memory Cluster Computing Using Random Forest

Chinese Academy of Sciences Institutional Repositories Grid

Configuring In-memory Cluster Computing Using Random Forest

Document type: Journal article


Authors	Zhendong Bei; Zhibin Yu; Ni Luo; Chuntao Jiang; Chengzhong Xu; Shengzhong Feng
Journal	FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
Publication date	2018
Document subtype	Journal article
Abstract	Recently, in-memory cluster computing (IMC) has gained momentum because it accelerates traditional on-disk cluster computing (ODC) by up to several tens of times for iterative and interactive applications. The most popular IMC framework is Spark, which has more than 100 configuration parameters. However, because IMC is a relatively new computing paradigm, it is unclear how significantly these parameters affect system performance, and no study has yet addressed how to optimally configure IMC frameworks. In this paper, we first investigate how significantly the configuration parameters affect the performance of Spark workloads. We find that configuration-induced performance variation can be as large as 20.7, indicating that configuration is critical to the performance of Spark workloads. However, manually configuring Spark workloads is notoriously difficult because the many configuration parameters may interact with each other in complex ways. To address this issue, we propose ACS, an approach to Automatically Configure Spark workloads. ACS first constructs performance models as functions of Spark configuration parameters using random forest, an ensemble learning algorithm. It then uses a genetic algorithm to search for the optimal configuration, taking configurations and the corresponding performance predicted by the performance models as inputs. We use six Spark programs, each with five input data sets, to evaluate the performance improvements. The results show that ACS speeds up the 30 program-input pairs by 2.2× on average and by up to 8.2×. In addition, the performance improvement obtained by ACS grows as the input data set size of Spark workloads increases, which is a desirable property for big data analytics.
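Language	English
WOS accession number	WOS:000418968400001
Source URL	http://ir.siat.ac.cn:8080/handle/172644/12536
Collection	Shenzhen Institutes of Advanced Technology_Institute of Advanced Computing and Digital Engineering
Author affiliation	FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE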
Recommended citation (GB/T 7714)	Zhendong Bei, Zhibin Yu, Ni Luo, et al. Configuring In-memory Cluster Computing Using Random Forest[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018.
APA	Zhendong Bei, Zhibin Yu, Ni Luo, Chuntao Jiang, Chengzhong Xu, & Shengzhong Feng. (2018). Configuring In-memory Cluster Computing Using Random Forest. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE.
MLA	Zhendong Bei, et al. "Configuring In-memory Cluster Computing Using Random Forest". FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2018).
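
The abstract describes ACS as two stages: a random-forest model that maps configuration parameters to performance, followed by a genetic algorithm that searches that model for a good configuration. The Python sketch below illustrates that general pipeline using only the standard library. It is not the authors' implementation: the three knob names, their normalization to [0, 1], the `synthetic_runtime` function standing in for real Spark measurements, the "default" configuration, and every hyperparameter (tree depth, forest size, population size, mutation rate) are hypothetical choices made for illustration.

```python
import random
import statistics

# Hypothetical tuning knobs, each normalized to [0, 1]; illustrative only,
# not the parameter set used by ACS.
PARAMS = ["executor_memory", "executor_cores", "shuffle_partitions"]

def synthetic_runtime(cfg):
    """Invented stand-in for measuring a Spark job's runtime in seconds."""
    mem, cores, parts = cfg
    return 10 + 40 * (mem - 0.8) ** 2 + 30 * (cores - 0.2) ** 2 + 20 * (parts - 0.7) ** 2

def sse(vals):
    """Sum of squared errors around the mean (the regression split criterion)."""
    m = statistics.fmean(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, rng, depth=0, max_depth=6, min_leaf=3, n_try=2):
    """Grow a regression tree; each split considers a random subset of n_try features."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return statistics.fmean(y)  # leaf: mean runtime of the samples that reach it
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({x[f] for x in X}):
            left = [v for x, v in zip(X, y) if x[f] <= t]
            right = [v for x, v in zip(X, y) if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return statistics.fmean(y)
    _, f, t = best
    go_left = [x[f] <= t for x in X]
    def child(side):
        Xs = [x for x, g in zip(X, go_left) if g == side]
        ys = [v for v, g in zip(y, go_left) if g == side]
        return build_tree(Xs, ys, rng, depth + 1, max_depth, min_leaf, n_try)
    return (f, t, child(True), child(False))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=20, seed=0):
    """Random forest: each tree is trained on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, x):
    """Ensemble prediction: average of the individual tree predictions."""
    return statistics.fmean([predict_tree(tree, x) for tree in forest])

def genetic_search(fitness, dim, rng, pop_size=30, generations=40, mut_rate=0.2):
    """Minimize fitness over [0, 1]^dim with truncation selection, uniform crossover
    and clipped Gaussian mutation; the best half survives each generation (elitism)."""
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

def acs_sketch(n_samples=60, seed=1):
    rng = random.Random(seed)
    # 1) "Measure" randomly sampled configurations (synthetic here).
    X = [[rng.random() for _ in PARAMS] for _ in range(n_samples)]
    y = [synthetic_runtime(x) for x in X]
    # 2) Fit the random-forest performance model once.
    forest = fit_forest(X, y, seed=seed)
    # 3) The GA queries the model instead of running the job for every candidate.
    best = genetic_search(lambda c: predict_forest(forest, c), len(PARAMS), rng)
    default = [0.5] * len(PARAMS)  # hypothetical "default" configuration
    return best, synthetic_runtime(default), synthetic_runtime(best)

if __name__ == "__main__":
    best, default_rt, tuned_rt = acs_sketch()
    print("best config:", {p: round(g, 2) for p, g in zip(PARAMS, best)})
    print(f"default runtime {default_rt:.1f} s, tuned runtime {tuned_rt:.1f} s")
```

The design point the sketch is meant to show is that the surrogate is fitted once from measured samples, after which every candidate the genetic algorithm evaluates costs only a model prediction rather than a cluster run. How closely the tuned result matches the true optimum depends on how well the forest approximates the real performance surface.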

Ingestion method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.