Chinese Academy of Sciences Institutional Repositories Grid

623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores

Document type: Journal article


Authors	Liu, YQ ; Yang, C ; Liu, FF ; Zhang, XY ; Lu, YT ; Du, YF ; Yang, CQ ; Xie, M ; Liao, XK
Journal	INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS
Publication date	2016
Volume	30 Issue:1 Pages:39-54
Keywords	Tianhe-2; HPCG; conjugate gradients; MIC; heterogeneous computing
ISSN	1094-3420
Abstract	In this article, we present a new hybrid algorithm to enable and scale the high-performance conjugate gradients (HPCG) benchmark on large-scale heterogeneous systems such as the Tianhe-2. Based on an inner-outer subdomain partitioning strategy, the data distribution between host and device can be balanced adaptively. The overhead of data movement from both the MPI communication and the PCI-E transfer can be significantly reduced by carefully rearranging and fusing operations. A variety of parallelization and optimization techniques for performance-critical kernels are exploited and analyzed to maximize the performance gain on both host and device. We carry out experiments on both a small heterogeneous computer and the world's largest one, the Tianhe-2. On the small system, a thorough comparison and analysis is presented to select among the different optimization choices. On Tianhe-2, the optimized implementation scales to the full-system level of 3.12 million heterogeneous cores, with an aggregated performance of 623 Tflop/s and a parallel efficiency of 81.2%.
Indexed in	SCI
Language	English
WOS accession number	WOS:000371326000004
Release date	2016-12-09
Source URL	[http://ir.iscas.ac.cn/handle/311060/17346]
Collection	Institute of Software_Institute of Software Library_Journal Articles
Recommended citation GB/T 7714	Liu, YQ,Yang, C,Liu, FF,et al. 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS,2016,30(1):39-54.
APA	Liu, YQ.,Yang, C.,Liu, FF.,Zhang, XY.,Lu, YT.,...&Liao, XK.(2016).623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores.INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS,30(1),39-54.
MLA	Liu, YQ,et al."623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores".INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 30.1(2016):39-54.

Ingest method: OAI harvesting

Source: Institute of Software

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.