Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA
Document type | Journal article
Author | Li, Gang (2,3) |
Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Publication date | 2021-05-21 |
Issue | 2021.5, pages 1-1 |
Keywords | block convolution; memory-efficient; off-chip transfer; FPGA; CNN accelerator |
Abstract | Deep convolutional neural networks have achieved remarkable progress in recent years. However, the large volume of intermediate results generated during inference poses a significant challenge to accelerator design for resource-constrained FPGAs. Due to the limited on-chip storage, partial results of intermediate layers are frequently transferred back and forth between on-chip memory and off-chip DRAM, leading to a non-negligible increase in latency and energy consumption. In this paper, we propose block convolution, a hardware-friendly, simple, yet efficient convolution operation that can completely avoid the off-chip transfer of intermediate feature maps at run time. The fundamental idea of block convolution is to eliminate the dependency of feature map tiles in the spatial dimension when spatial tiling is used, which is realized by splitting a feature map into independent blocks so that convolution can be performed separately on individual blocks. We conduct extensive experiments to demonstrate the efficacy of the proposed block convolution on both the algorithm side and the hardware side. Specifically, we evaluate block convolution on 1) VGG-16, ResNet-18, ResNet-50, and MobileNet-V1 for the ImageNet classification task; 2) SSD and FPN for the COCO object detection task; and 3) VDSR for the Set5 single-image super-resolution task. Experimental results demonstrate that comparable or higher accuracy can be achieved with block convolution. We also showcase two CNN accelerators via algorithm/hardware co-design based on block convolution on memory-limited FPGAs, and evaluation shows that both accelerators substantially outperform the baseline without off-chip transfer of intermediate feature maps. |
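The core idea stated in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the feature map is split into independent spatial blocks, each block is zero-padded and convolved on its own, and the per-block outputs are stitched back together, so no halo pixels ever cross a block boundary. Function names, the 3x3 kernel, and the block size are all assumptions made for the example.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 3x3 'same' convolution with zero padding on a 2-D array."""
    h, w = x.shape
    xp = np.pad(x, 1)  # one-pixel zero border
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def block_conv2d(x, k, block=4):
    """Block convolution sketch: convolve each block x block tile
    independently with its own zero padding, then stitch the results.
    Assumes block evenly divides both spatial dimensions."""
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            tile = x[bi:bi + block, bj:bj + block]
            out[bi:bi + block, bj:bj + block] = conv2d_same(tile, k)
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
k = np.ones((3, 3)) / 9.0
y_block = block_conv2d(x, k, block=4)
y_full = conv2d_same(x, k)

# Interior pixels of each block match ordinary convolution; only pixels
# adjacent to a block boundary differ, because they see padding zeros
# instead of values from the neighboring block. That boundary effect is
# the approximation the paper shows can be absorbed by training.
assert np.allclose(y_block[1, 1], y_full[1, 1])
assert not np.allclose(y_block, y_full)
```

Because each tile depends only on its own pixels, a tile can be loaded once, convolved through many layers on-chip, and written back once, which is what removes the off-chip transfer of intermediate feature maps.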
Language | English |
Source URL | [http://ir.ia.ac.cn/handle/173211/47034] |
Collection | Brain-inspired Chips and Systems Research |
Corresponding author | Cheng, Jian |
Affiliations | 1. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences; 3. Institute of Automation, Chinese Academy of Sciences; 4. School of Future Technology, University of Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Li, Gang, Liu, Zejian, Li, Fanrong, et al. Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021(2021.5): 1-1. |
APA | Li, Gang, Liu, Zejian, Li, Fanrong, & Cheng, Jian. (2021). Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems(2021.5), 1-1. |
MLA | Li, Gang, et al. "Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021.5 (2021): 1-1. |
Ingest method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.