Automatic FFT Performance Tuning on OpenCL GPUs
Document type: Conference paper
Authors | Li Yan; Zhang Yunquan; Jia Haipeng; Long Guoping; Wang Ke |
Publication date | 2011 |
Conference name | 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011 |
Conference dates | December 7, 2011 - December 9, 2011 |
Conference location | Tainan, Taiwan |
Keywords | Algorithms; Computer systems; Discrete Fourier transforms; Fast Fourier transforms; Medical imaging |
Pages | 228-235 |
Abstract | School of Information Science and Engineering, Ocean University of China, Qingdao, China. Many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, have been revolutionized by Fourier methods. The fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. The emerging class of high-performance computing architectures, such as GPUs, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to programmers. However, the complexity of GPU programming poses a significant challenge for programmers. In this paper, based on the Kronecker product form of multi-dimensional FFTs, we propose an automatic performance tuning framework for various OpenCL GPUs. Several key techniques of GPU programming on AMD and NVIDIA GPUs are also identified. Our OpenCL FFT library achieves up to 1.5 to 4 times, 1.5 to 40 times and 1.4 times the performance of clAmdFft 1.0 for 1D, 2D and 3D FFT respectively on an AMD GPU, and the overall performance is within 90% of CUFFT 4.0 on two NVIDIA GPUs. © 2011 IEEE. |
Indexed by | EI |
Conference sponsors | National Cheng Kung University; National Science Council; Ministry of Education; Academia Sinica; National Center for High Performance Computing |
Proceedings | Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS |
Language | English |
ISSN | 1521-9097 |
ISBN | 9780769545769 |
Source URL | [http://ir.iscas.ac.cn/handle/311060/16294] |
Collection | Institute of Software_Institute of Software Library_Conference Papers |
Recommended citation (GB/T 7714) | Li Yan, Zhang Yunquan, Jia Haipeng, et al. Automatic FFT performance tuning on OpenCL GPUs[C]. In: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011. Tainan, Taiwan. December 7, 2011 - December 9, 2011. |
Ingestion method: OAI harvesting
Source: Institute of Software