Configuring In-memory Cluster Computing Using Random Forest
Document type: Journal article
Authors | Zhendong Bei; Zhibin Yu; Ni Luo; Chuntao Jiang; Chengzhong Xu; Shengzhong Feng |
Journal | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |
Publication date | 2018 |
Document subtype | Journal article |
Abstract | In-memory cluster computing (IMC) has recently gained momentum because it accelerates traditional on-disk cluster computing (ODC) by up to several tens of times for iterative and interactive applications. The most popular IMC framework is Spark, which has more than 100 configuration parameters. However, because IMC is a fairly new computing paradigm, it is unclear how significantly these parameters affect system performance. Consequently, no study has yet addressed how to configure IMC frameworks optimally. In this paper, we first investigate how significantly the configuration parameters affect the performance of Spark workloads. We find that the configuration-induced performance variation can be as large as 20.7, indicating that configuring Spark workloads is extremely important to their performance. However, manually configuring Spark workloads is notoriously difficult because there are so many configuration parameters, which may interfere with each other in complex ways. To address this issue, we propose an approach to Automatically Configure Spark workloads, named ACS. It first constructs performance models as functions of Spark configuration parameters by using random forest, an ensemble learning algorithm. Subsequently, ACS leverages a genetic algorithm to search for the optimum configuration, taking configurations and the corresponding performance predicted by the performance models as inputs. We employ six Spark programs, each with five input data sets, to evaluate the performance improvements. The results show that ACS speeds up the 30 program-input pairs by 2.2× on average and by up to 8.2×. In addition, the performance improvements obtained by ACS increase with the input data set sizes of Spark workloads, which is a desirable property for big data analytics. |
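URL | View original |
Language | English |
WOS accession number | WOS:000418968400001 |
Source URL | [http://ir.siat.ac.cn:8080/handle/172644/12536] |
Collection | Shenzhen Institute of Advanced Technology_Digital Institute (深圳先进技术研究院_数字所) |
Author affiliation | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |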
Recommended citation GB/T 7714 | Zhendong Bei, Zhibin Yu, Ni Luo, et al. Configuring In-memory Cluster Computing Using Random Forest[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018. |
APA | Zhendong Bei, Zhibin Yu, Ni Luo, Chuntao Jiang, Chengzhong Xu, & Shengzhong Feng. (2018). Configuring In-memory Cluster Computing Using Random Forest. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE. |
MLA | Zhendong Bei, et al. "Configuring In-memory Cluster Computing Using Random Forest". FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2018). |
Ingest method: OAI harvesting
Source: Shenzhen Institute of Advanced Technology (深圳先进技术研究院)
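
The two-stage idea described in the abstract (fit a random forest that predicts performance from configuration parameters, then let a genetic algorithm search the model for a good configuration) can be illustrated with a toy sketch. This is not the authors' ACS implementation: `measure_runtime` is a hypothetical synthetic stand-in for real Spark measurements, the two parameters are assumed to be normalized to [0, 1], and the forest and genetic operators are deliberately simplified, standard-library-only versions.

```python
import random
import statistics

def measure_runtime(cfg):
    """Hypothetical stand-in for a measured Spark run time (no Spark involved).
    Two normalized parameters; the fastest setting is near (0.7, 0.3)."""
    a, b = cfg
    return 10.0 + 50.0 * (a - 0.7) ** 2 + 30.0 * (b - 0.3) ** 2

def sse(rows):
    """Sum of squared errors of the run times around their mean."""
    mean = statistics.fmean(y for _, y in rows)
    return sum((y - mean) ** 2 for _, y in rows)

def build_tree(rows, depth, rng, min_size=4):
    """Regression tree over (cfg, runtime) rows; one random feature per node."""
    if depth == 0 or len(rows) < 2 * min_size:
        return statistics.fmean(y for _, y in rows)
    feat = rng.randrange(len(rows[0][0]))
    best = None
    for cfg, _ in rows:  # try every observed value as a split threshold
        thr = cfg[feat]
        left = [r for r in rows if r[0][feat] <= thr]
        right = [r for r in rows if r[0][feat] > thr]
        if len(left) < min_size or len(right) < min_size:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, thr, left, right)
    if best is None:  # no valid split: make a leaf
        return statistics.fmean(y for _, y in rows)
    _, thr, left, right = best
    return {"feat": feat, "thr": thr,
            "left": build_tree(left, depth - 1, rng, min_size),
            "right": build_tree(right, depth - 1, rng, min_size)}

def predict_tree(node, cfg):
    while isinstance(node, dict):
        side = "left" if cfg[node["feat"]] <= node["thr"] else "right"
        node = node[side]
    return node

def fit_forest(rows, n_trees, rng, depth=6):
    """Bagging: each tree is trained on a bootstrap resample of the rows."""
    return [build_tree(rng.choices(rows, k=len(rows)), depth, rng)
            for _ in range(n_trees)]

def predict(forest, cfg):
    """Forest prediction is the mean of the individual tree predictions."""
    return statistics.fmean(predict_tree(t, cfg) for t in forest)

def genetic_search(forest, rng, dims=2, pop_size=30, generations=40):
    """Minimize the predicted run time; the model replaces real executions."""
    pop = [tuple(rng.random() for _ in range(dims)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: predict(forest, c))
        parents = pop[:pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05)))
                     if rng.random() < 0.3 else g for g in child]  # mutation
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda c: predict(forest, c))

rng = random.Random(0)
samples = [tuple(rng.random() for _ in range(2)) for _ in range(200)]
rows = [(c, measure_runtime(c)) for c in samples]  # "training runs"
forest = fit_forest(rows, n_trees=20, rng=rng)
best = genetic_search(forest, rng)
default = (0.0, 0.0)  # assumed untuned baseline configuration
print("best configuration found:", best)
print("run time at best vs. default:", measure_runtime(best), measure_runtime(default))
```

The design point this sketch tries to show is that the genetic algorithm only queries the learned model, so the search itself costs no additional cluster runs beyond those used to collect the training samples.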