Chinese Academy of Sciences Institutional Repositories Grid
CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures

Document type: Journal article

Authors: Zou, Kaiwei1,2; Wang, Ying1,2; Cheng, Long3; Qu, Songyun2,4; Li, Huawei1,2,5; Li, Xiaowei1,2
Journal: IEEE TRANSACTIONS ON COMPUTERS
Publication date: 2022-07-01
Volume: 71; Issue: 7; Pages: 1626-1639
Keywords: Kernel; Computer architecture; Multicore processing; Deep learning; System-on-chip; Parallel processing; Real-time systems; Neural networks; parallel processing; real-time and embedded systems; single-chip multiprocessors; reinforcement learning; structured sparsity
ISSN: 0018-9340
DOI: 10.1109/TC.2021.3099688
Abstract: Real-time inference of deep learning models on embedded, energy-efficient devices is increasingly desirable with the rapid growth of artificial intelligence at the edge. Specifically, to achieve high energy efficiency and scalability, efficient parallelization of single-pass deep neural network (DNN) inference on chip multiprocessor (CMP) architectures is urgently required by many time-sensitive applications. However, as the number of processing cores scales up and the performance of individual cores grows rapidly, on-chip inter-core data movement tends to become a performance bottleneck for computation. To remedy this problem and further improve the performance of network inference, in this work we introduce a communication-aware DNN parallelization technique called CAP, which exploits the elasticity and noise tolerance of deep learning algorithms on CMPs. Moreover, in the hope that these studies can provide new design insights for real-time neural network inference on embedded chips, we have also evaluated the proposed approach on both multi-core neural network accelerator (NNA) chips and general-purpose chip multiprocessors. Our experimental results show that CAP achieves 1.12x-1.65x system speedups and 1.14x-2.70x energy efficiency for different neural networks while maintaining inference accuracy, compared to baseline approaches.
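The core trade-off behind a communication-aware parallelizer like the one the abstract describes is that adding cores shrinks per-core compute but grows inter-core traffic. The toy cost model below is an illustrative sketch only: the function names, the even channel split, and the all-gather traffic term are assumptions made for this example, not the paper's actual CAP algorithm (which additionally uses reinforcement learning and structured sparsity).

```python
# Hypothetical illustration: pick a degree of parallelism for one DNN
# layer by balancing per-core compute against inter-core communication.
# The cost model is an assumption for this sketch, not the paper's method.

def partition_cost(channels, cores, comm_cost_per_elem,
                   macs_per_channel, fmap_elems):
    """Modeled latency when `channels` output channels are split evenly
    across `cores`: per-core compute plus the all-gather traffic needed
    to reassemble the full output feature map on every core."""
    per_core_channels = -(-channels // cores)       # ceiling division
    compute = per_core_channels * macs_per_channel  # compute term
    # Each core must receive the feature-map slices it did not produce.
    comm = (cores - 1) / cores * channels * fmap_elems * comm_cost_per_elem
    return compute + comm

def best_core_count(channels, max_cores, comm_cost_per_elem,
                    macs_per_channel, fmap_elems):
    """Degree of parallelism that minimizes the modeled latency."""
    return min(range(1, max_cores + 1),
               key=lambda c: partition_cost(channels, c, comm_cost_per_elem,
                                            macs_per_channel, fmap_elems))
```

Under this model, free communication drives the optimum to the maximum core count, while expensive communication collapses it toward a single core; a communication-aware parallelizer navigates the region in between, layer by layer.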
Funding: National Key Research and Development Program of China[2020YFB1600201] ; National Natural Science Foundation of China (NSFC)[62090024] ; National Natural Science Foundation of China (NSFC)[61874124] ; National Natural Science Foundation of China (NSFC)[61876173] ; Fundamental Research Funds for the Central Universities[2021MS017]
WOS research areas: Computer Science ; Engineering
Language: English
WOS accession number: WOS:000808068000011
Publisher: IEEE COMPUTER SOC
Source URL: [http://119.78.100.204/handle/2XEOYT63/19590]
Collection: Institute of Computing Technology, Chinese Academy of Sciences (Journal Papers, English)
Corresponding author: Wang, Ying
Author affiliations:
1.Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
2.Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3.North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China
4.Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
5.Peng Cheng Lab, Shenzhen 518066, Peoples R China
Recommended citation formats:
GB/T 7714
Zou, Kaiwei, Wang, Ying, Cheng, Long, et al. CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures[J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71(7): 1626-1639.
APA Zou, Kaiwei, Wang, Ying, Cheng, Long, Qu, Songyun, Li, Huawei, & Li, Xiaowei. (2022). CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures. IEEE TRANSACTIONS ON COMPUTERS, 71(7), 1626-1639.
MLA Zou, Kaiwei, et al. "CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures". IEEE TRANSACTIONS ON COMPUTERS 71.7 (2022): 1626-1639.

Deposit method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.