Research on a GPU-Based Parallel Algorithm for Block-Matching and 3D Filtering (BM3D) Denoising
Document type: Thesis
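Author | 袁龙杰 |
Degree type | Master of Engineering |
Defense date | 2014-05-21 |
Degree-granting institution | University of Chinese Academy of Sciences |
Place of conferral | Institute of Automation, Chinese Academy of Sciences |
Advisor | 彭思龙 |
Keywords | image processing; BM3D (Block Matching and 3D Filtering); Graphics Processing Units (GPU); Compute Unified Device Architecture (CUDA); parallel computing |
Alternative title | A Study of Block Matching and 3D Filtering Parallel Algorithm Based on GPU |
Degree discipline | Control Engineering |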
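Abstract (translated from Chinese) | BM3D is currently the best-performing image denoising algorithm. It exploits not only the spatial relationships among pixels within each image block but also the similarity between image blocks. As a result, it preserves image detail well and introduces few artifacts. However, BM3D has very high computational complexity and a very large computational load, which makes it too costly for many real-time applications. This thesis uses the strong numerical computing power and parallelism of the graphics processing unit (GPU) to parallelize BM3D, aiming to shorten its execution time to the millisecond range and meet real-time processing requirements. The main work covers three aspects: (1) Investigating how GPU parallel computing works and the key points of CUDA programming. When the parallel algorithm is implemented in CUDA, the limits of GPU hardware, memory, and other resources are fully taken into account to achieve the best parallel performance. (2) Based on the characteristics and flow of BM3D, the block-matching step in the second stage is omitted as an approximation and replaced with the block-matching results of the first stage. The algorithm is divided into functionally independent processing modules, and the strategy of each module and the processing order of certain modules are revised to reduce serially executed computation and suit parallel implementation. (3) Building on BM3D and the CUDA programming model, the parallel algorithm CUDA_BM3D is implemented. Each parallel module of CUDA_BM3D is optimized in three respects (parallelism, memory, and instructions) to improve computational efficiency and shorten execution time. Experimental results show that, although the block-matching approximation slightly lowers the peak signal-to-noise ratio (PSNR) of CUDA_BM3D relative to BM3D, it still outperforms 3DDCT and Wiener filtering. CUDA_BM3D is at least 36 times faster than serial BM3D and meets real-time processing requirements on 352×288 images. GPU acceleration can therefore meet the demand for real-time BM3D in image processing. The GPU parallel computing approach and optimization strategies can also be applied to other complex image algorithms, making their real-time execution possible. |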
English abstract | The BM3D algorithm is the best image denoising method at present. It makes use not only of the spatial relationships among pixels within an image block but also of the similarity between image blocks. The algorithm can therefore better preserve image details and introduce fewer artifacts. However, the computational complexity of BM3D is very high and its computational load is very large, so it is too complex for many real-time applications and cannot meet real-time computing requirements. This study uses the powerful numerical computation ability and parallel computing characteristics of Graphics Processing Units (GPUs) to parallelize the BM3D algorithm, with the aim of reducing its execution time and meeting the requirements of real-time processing. The major work covers the following three aspects. (1) Exploring the working characteristics of GPU parallel computing and parallel programming. When the parallel algorithm is implemented with CUDA, the limits of GPU hardware, memory, and other resources are taken into account to achieve the best parallel performance. (2) Analyzing the algorithm flow according to the characteristics of BM3D, omitting the block-matching procedure in the second step as an approximation, and using the result of the first step in its place. The algorithm is divided into relatively independent functional processing modules, and the strategy of each module and the processing order of individual modules are modified to reduce the amount of serial computation and suit parallel computing. (3) Implementing the parallel algorithm CUDA_BM3D based on BM3D and the CUDA programming model. Each parallel module is optimized in three aspects (parallelism, memory, and instructions) to improve computational efficiency and shorten the execution time of the algorithm. Experimental results show that the PSNR of CUDA_BM3D is slightly lower than that of the BM3D algorithm but still better than that of 3DDCT and the Wiener filtering algorithm. The speedup ratio is at least 36, and the algorithm meets the requirement of real-time processing on images of size 352×288. With GPU acceleration, the proposed algorithm can satisfy the demand for real-time BM3D in the image processing area. The GPU parallel computing ideas and optimization strategies can also be applied to other complex image algorithms, offering the possibility of running them in real time. |
Language | Chinese |
Other identifier | 2011E8014661090 |
Source URL | [http://ir.ia.ac.cn/handle/173211/7717] |
Collection | Graduates_Master's Theses |
Recommended citation (GB/T 7714) | 袁龙杰. 基于GPU的三维块匹配去噪并行算法研究[D]. 中国科学院自动化研究所. 中国科学院大学. 2014. |
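
The abstract refers to block matching (grouping image blocks that resemble a reference block) and reports quality as PSNR. As a rough illustration only, the serial Python sketch below shows both ideas on a tiny image. It is not the thesis's CUDA_BM3D code: the function names `match_blocks` and `psnr`, the sum-of-squared-differences distance, and the parameters `n`, `search`, and `max_matches` are all assumptions made for this example.

```python
import math


def block(img, r, c, n):
    """Return the n-by-n block with top-left corner (r, c), flattened row by row."""
    return [img[r + i][c + j] for i in range(n) for j in range(n)]


def match_blocks(img, ref_r, ref_c, n=2, search=2, max_matches=4):
    """Return top-left corners of the blocks most similar to the reference block.

    Similarity is the sum of squared differences (SSD, an assumed distance);
    smaller means more similar. Ties are broken by (row, column) order.
    """
    h, w = len(img), len(img[0])
    ref = block(img, ref_r, ref_c, n)
    candidates = []
    # Scan every block position inside the search window, clipped to the image.
    for r in range(max(0, ref_r - search), min(h - n, ref_r + search) + 1):
        for c in range(max(0, ref_c - search), min(w - n, ref_c + search) + 1):
            cand = block(img, r, c, n)
            ssd = sum((a - b) ** 2 for a, b in zip(ref, cand))
            candidates.append((ssd, r, c))
    candidates.sort()
    return [(r, c) for _, r, c in candidates[:max_matches]]


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq_err = [(a - b) ** 2
              for row_a, row_b in zip(reference, test)
              for a, b in zip(row_a, row_b)]
    mse = sum(sq_err) / len(sq_err)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4x6 test image with a repeated flat region on the left and right.
image = [
    [10, 10, 50, 50, 10, 10],
    [10, 10, 50, 50, 10, 10],
    [10, 10, 50, 50, 10, 10],
    [10, 10, 50, 50, 10, 10],
]
matches = match_blocks(image, 0, 0, n=2, search=4, max_matches=3)
```

The search for each reference block is independent of every other reference block, which is why this step lends itself to GPU parallelization. How CUDA_BM3D actually maps the work onto threads and memory is not stated in the abstract.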
Deposit method: OAI harvesting
Source: Institute of Automation