Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA
Document type | Journal article
Author | Li, Gang (2,3) |
Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Publication date | 2021-05-21 |
Issue | 2021.5, pages 1-1 |
Keywords | block convolution; memory-efficient; off-chip transfer; FPGA; CNN accelerator |
Abstract | Deep convolutional neural networks have achieved remarkable progress in recent years. However, the large volume of intermediate results generated during inference poses a significant challenge to accelerator design for resource-constrained FPGAs. Due to the limited on-chip storage, partial results of intermediate layers are frequently transferred back and forth between on-chip memory and off-chip DRAM, leading to a non-negligible increase in latency and energy consumption. In this paper, we propose block convolution, a hardware-friendly, simple, yet efficient convolution operation that can completely avoid the off-chip transfer of intermediate feature maps at run time. The fundamental idea of block convolution is to eliminate the dependency of feature map tiles in the spatial dimension when spatial tiling is used, which is realized by splitting a feature map into independent blocks so that convolution can be performed separately on individual blocks. We conduct extensive experiments to demonstrate the efficacy of the proposed block convolution on both the algorithm side and the hardware side. Specifically, we evaluate block convolution on 1) VGG-16, ResNet-18, ResNet-50, and MobileNet-V1 for the ImageNet classification task; 2) SSD and FPN for the COCO object detection task; and 3) VDSR for the Set5 single-image super-resolution task. Experimental results demonstrate that comparable or higher accuracy can be achieved with block convolution. We also showcase two CNN accelerators via algorithm/hardware co-design based on block convolution on memory-limited FPGAs, and evaluation shows that both accelerators substantially outperform the baseline without off-chip transfer of intermediate feature maps. |
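The core idea stated in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the feature map is split into independent spatial blocks, each block is zero-padded and convolved on its own, and the per-block outputs are stitched back together, so no halo pixels ever cross a block boundary. Function names, the 3x3 kernel, and the block size are all assumptions made for the example.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 3x3 'same' convolution with zero padding on a 2-D array."""
    h, w = x.shape
    xp = np.pad(x, 1)  # one-pixel zero border
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def block_conv2d(x, k, block=4):
    """Block convolution sketch: convolve each block x block tile
    independently with its own zero padding, then stitch the results.
    Assumes block evenly divides both spatial dimensions."""
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            tile = x[bi:bi + block, bj:bj + block]
            out[bi:bi + block, bj:bj + block] = conv2d_same(tile, k)
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
k = np.ones((3, 3)) / 9.0
y_block = block_conv2d(x, k, block=4)
y_full = conv2d_same(x, k)

# Interior pixels of each block match ordinary convolution; only pixels
# adjacent to a block boundary differ, because they see padding zeros
# instead of values from the neighboring block. That boundary effect is
# the approximation the paper shows can be absorbed by training.
assert np.allclose(y_block[1, 1], y_full[1, 1])
assert not np.allclose(y_block, y_full)
```

Because each tile depends only on its own pixels, a tile can be loaded once, convolved through many layers on-chip, and written back once, which is what removes the off-chip transfer of intermediate feature maps.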
Language | English |
Source URL | [http://ir.ia.ac.cn/handle/173211/47034] |
Collection | Brain-inspired Chips and Systems Research |
Corresponding author | Cheng, Jian |
Affiliations | 1. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences; 3. Institute of Automation, Chinese Academy of Sciences; 4. School of Future Technology, University of Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Li, Gang, Liu, Zejian, Li, Fanrong, et al. Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021(2021.5): 1-1. |
APA | Li, Gang, Liu, Zejian, Li, Fanrong, & Cheng, Jian. (2021). Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems(2021.5), 1-1. |
MLA | Li, Gang, et al. "Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021.5 (2021): 1-1. |
Ingest method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.