Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs
Document type: Journal article
Authors | Ma, WJ ; Gao, K ; Long, GP |
Journal | JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Publication date | 2016 |
Volume | 31, Issue: 6, Pages: 1262-1274 |
Keywords | GPGPU; OpenCL; stencil; code generation; computation reuse |
ISSN | 1000-9000 |
Abstract | Computation reuse is known as an effective optimization technique. However, due to the complexity of modern GPU architectures, there is not yet enough understanding regarding the intriguing implications of the interplay of computation reuse and hardware specifics on application performance. In this paper, we propose an automatic code generator for a class of stencil codes with inherent computation reuse on GPUs. For such applications, the proper reuse of intermediate results, combined with careful register and on-chip local memory usage, has profound implications on performance. Current state of the art does not address this problem in depth, partially due to the lack of a good program representation that can expose all potential computation reuse. In this paper, we leverage the computation overlap graph (COG), a simple representation of data dependence and data reuse with "element view", to expose potential reuse opportunities. Using COG, we propose a portable code generation and tuning framework for GPUs. Compared with current state-of-the-art code generators, our experimental results show up to 56.7% performance improvement on modern GPUs such as NVIDIA C2050. |
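Indexed in | SCI |
Language | English |
WOS accession number | WOS:000387335600015 |
Date made available | 2016-12-09 |
Source URL | [http://ir.iscas.ac.cn/handle/311060/17294] |
Collection | Institute of Software_Institute of Software Library_Journal Articles |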
Recommended citation (GB/T 7714) | Ma, WJ, Gao, K, Long, GP. Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2016, 31(6): 1262-1274. |
APA | Ma, WJ, Gao, K, & Long, GP. (2016). Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 31(6), 1262-1274. |
MLA | Ma, WJ, et al. "Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs." JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 31.6 (2016): 1262-1274. |
Ingestion method: OAI harvesting
Source: Institute of Software
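As a rough illustration of what "computation reuse" in a stencil code can mean, the sketch below compares a direct 3x3 box-sum stencil with a variant that computes each vertical three-point sum once and shares it among the three horizontally adjacent outputs that need it. This is an assumption-laden toy in Python, not the paper's code generator and not its computation overlap graph (COG); the function names `box3x3_naive`, `box3x3_reuse`, and `demo` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: NOT the paper's generator or its COG representation.
# It shows one simple form of reusing intermediate results in a stencil.


def box3x3_naive(grid):
    """Sum each interior point's 3x3 neighbourhood directly (9 reads per point)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = sum(
                grid[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
            )
    return out


def box3x3_reuse(grid):
    """Same result, but each vertical 3-point sum is computed once per row band
    and shared by the three horizontally adjacent outputs that use it."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        # Intermediate results: one column sum per column of this row band.
        col_sum = [grid[i - 1][j] + grid[i][j] + grid[i + 1][j] for j in range(cols)]
        for j in range(1, cols - 1):
            out[i][j] = col_sum[j - 1] + col_sum[j] + col_sum[j + 1]
    return out


def demo():
    """Check that both variants agree on a small 4x5 test grid."""
    grid = [[r * 5 + c for c in range(5)] for r in range(4)]
    return box3x3_naive(grid) == box3x3_reuse(grid)
```

Calling `demo()` checks that the two variants agree on a small grid. On a GPU, the shared column sums are the kind of intermediate result that, according to the abstract, would need careful placement in registers or on-chip local memory; how the paper's framework actually does this is not shown here.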