Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing
Document type: Conference paper
Authors | Zhibin Yu; Zhendong Bei; Xuehai Qian |
Publication date | 2018 |
Conference date | 2018 |
Conference location | Williamsburg, VA, USA |
English abstract | In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over the traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC, IMC frameworks typically run the same given programs repeatedly on a given cluster with similar input dataset size each time. It is challenging to build a performance model for an IMC program because: 1) the performance of IMC programs is more sensitive to the size of the input dataset, which is known to be difficult to incorporate into a performance model due to its complex effects on performance; 2) the number of performance-critical configuration parameters in IMC is much larger than in ODC (more than 40 vs. around 10), and the high dimensionality requires more sophisticated models to achieve high accuracy. To address this challenge, we propose DAC, a datasize-aware auto-tuning approach to efficiently identify the high dimensional configuration for a given IMC program to achieve optimal performance on a given cluster. DAC is a significant advance over the state-of-the-art because it can take the size of the input dataset and 41 configuration parameters as the parameters of the performance model for a given IMC program, which is unprecedented in previous work. It is made possible by two key techniques: 1) Hierarchical Modeling (HM), which combines a number of individual sub-models in a hierarchical manner; 2) a Genetic Algorithm (GA), which is employed to search for the optimal configuration. To evaluate DAC, we use six typical Spark programs, each with five different input dataset sizes. The evaluation results show that DAC improves the performance of these programs compared to default configurations by a factor of 30.4× on average and up to 89×. We also report that the geometric mean speedups of DAC over the default, expert, and RFHOC configurations are 15.4×, 2.3×, and 1.5×, respectively. |
Language | English |
Source URL | http://ir.siat.ac.cn:8080/handle/172644/14158 |
Collection | Shenzhen Institutes of Advanced Technology_Digital Institute (数字所) |
Recommended citation (GB/T 7714) | Zhibin Yu, Zhendong Bei, Xuehai Qian. Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing[C]. In: . Williamsburg, VA, USA. 2018. |
Ingest method: OAI harvesting
Source: Shenzhen Institutes of Advanced Technology
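
The abstract describes a genetic algorithm that searches for a configuration over a performance model taking the input dataset size as one of its inputs. The following is a minimal sketch of that general idea, not the authors' implementation. The three parameters in `SPACE`, the formula in `predicted_runtime`, and all function names are hypothetical stand-ins chosen for illustration. The paper's model covers 41 parameters and is built with Hierarchical Modeling, which this sketch does not attempt to reproduce.

```python
import random

# Hypothetical search space: three illustrative parameters with integer ranges.
# These are assumptions for the sketch, not the 41 parameters used by DAC.
SPACE = {
    "executor_memory_gb": (1, 16),
    "executor_cores": (1, 8),
    "shuffle_partitions": (8, 512),
}


def predicted_runtime(config, datasize_gb):
    """Stand-in for a trained performance model (an assumption, not DAC's model).

    Takes a configuration and the input dataset size, returns a runtime estimate.
    """
    mem = config["executor_memory_gb"]
    cores = config["executor_cores"]
    parts = config["shuffle_partitions"]
    spill = max(0.0, datasize_gb - mem) * 5.0           # penalty when data exceeds memory
    compute = datasize_gb * 10.0 / cores                 # work shrinks with more cores
    overhead = abs(parts - 16 * datasize_gb) * 0.05      # partition count tied to datasize
    return spill + compute + overhead


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    out = dict(config)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def ga_search(datasize_gb, generations=40, pop_size=30, elite=5, seed=0):
    """Search for the configuration with the lowest predicted runtime."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: predicted_runtime(c, datasize_gb))
        parents = pop[:elite]  # elitism: the best configurations survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = parents + children
    return min(pop, key=lambda c: predicted_runtime(c, datasize_gb))


if __name__ == "__main__":
    best = ga_search(datasize_gb=8.0)
    print(best, predicted_runtime(best, 8.0))
```

Because the dataset size is an argument of the model, the search can be rerun for each input size and may settle on a different configuration each time, which is the sense in which such a tuner would be datasize-aware.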