Bayer image parallel decoding based on GPU
Document type: Conference paper
Authors | Hu, Rihui1,2; Xu, Zhiyong1; Wei, Yuxing1; Sun, Shaohua3 |
Publication date | 2012 |
Conference name | Proceedings of SPIE: Optoelectronic Imaging and Multimedia Technology II |
Conference date | 2012 |
Volume | 8558 |
Pages | 85581T |
Corresponding author | Hu, R. (ustc_hui@126.com) |
Abstract | In the photoelectrical tracking system, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example 2Kx2Kx16bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP. The data-parallel part gains efficiency by running on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT significantly speeds up the IDWT. In experiments with a 1Kx1Kx16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speedup over the CPU serial method. © Copyright SPIE. |
Indexed by | EI |
Language | English |
ISSN | 0277-786X |
Source URL | [http://ir.ioe.ac.cn/handle/181551/7695] |
Collection | Institute of Optics and Electronics_Photoelectric Detection and Signal Processing Laboratory (Lab 5) |
Author affiliations | 1.Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, Sichuan, 610209, China 2.Graduate School, Chinese Academy of Sciences, Beijing, 100039, China 3.750 Test Field of China Shipbuilding Industry Corporation, Kunming, Yunnan, 650051, China |
Recommended citation (GB/T 7714) | Hu, Rihui,Xu, Zhiyong,Wei, Yuxing,et al. Bayer image parallel decoding based on GPU[C]. In: Proceedings of SPIE: Optoelectronic Imaging and Multimedia Technology II. 2012. |
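
The abstract credits much of the IDWT speedup to recasting the 2D transform as 1D passes. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes a one-level, unnormalized Haar wavelet (the record does not name the wavelet), and the function names `dwt_1d`, `idwt_1d`, `dwt_2d`, and `idwt_2d` are hypothetical. Each row or column pass is independent of the others, which is what would let a GPU assign them to separate threads or blocks.

```python
def dwt_1d(signal):
    """Forward one-level Haar step (assumed wavelet): returns [approx | detail]."""
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def idwt_1d(coeffs):
    """Inverse of dwt_1d for one line laid out as [approx | detail]."""
    half = len(coeffs) // 2
    out = [0] * len(coeffs)
    for i in range(half):
        a, d = coeffs[i], coeffs[half + i]
        out[2 * i] = a + d      # even sample
        out[2 * i + 1] = a - d  # odd sample
    return out


def dwt_2d(image):
    """Forward 2D transform as 1D passes: rows first, then columns."""
    rows = [dwt_1d(row) for row in image]
    cols = [dwt_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def idwt_2d(coeffs):
    """Inverse 2D transform as independent 1D passes: columns, then rows.

    No pass reads another pass's output within the same stage, so on a GPU
    each line could be handled by its own thread or block (an assumption
    about the mapping, not a detail taken from the paper).
    """
    cols = [idwt_1d(list(col)) for col in zip(*coeffs)]
    rows = [list(r) for r in zip(*cols)]
    return [idwt_1d(row) for row in rows]
```

A round trip such as `idwt_2d(dwt_2d(image))` on a small even-sized image should reproduce the input, which is a quick way to check the pass ordering.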
Ingest method: OAI harvesting
Source: Institute of Optics and Electronics