CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures
Document Type | Journal Article
Authors | Zou, Kaiwei1,2; Wang, Ying1,2; Cheng, Long3; Qu, Songyun2,4; Li, Huawei1,2,5; Li, Xiaowei1,2 |
Journal | IEEE TRANSACTIONS ON COMPUTERS |
Publication Date | 2022-07-01 |
Volume / Issue / Pages | 71(7): 1626-1639 |
Keywords | Kernel; Computer architecture; Multicore processing; Deep learning; System-on-chip; Parallel processing; Real-time systems; Neural networks; parallel processing; real-time and embedded systems; single-chip multiprocessors; reinforcement learning; structured sparsity |
ISSN | 0018-9340 |
DOI | 10.1109/TC.2021.3099688 |
Abstract | Real-time inference of deep learning models on embedded and energy-efficient devices becomes increasingly desirable with the rapid growth of artificial intelligence on the edge. Specifically, to achieve superb energy efficiency and scalability, efficient parallelization of single-pass deep neural network (DNN) inference on chip multiprocessor (CMP) architectures is urgently required by many time-sensitive applications. However, as the number of processing cores scales up and the performance of cores grows much faster, on-chip inter-core data movement is prone to become a performance bottleneck for computation. To remedy this problem and further improve the performance of network inference, in this work we introduce a communication-aware DNN parallelization technique called CAP, which exploits the elasticity and noise-tolerance of deep learning algorithms on CMPs. Moreover, in the hope that the conducted studies can provide new design insights for real-time neural network inference on embedded chips, we have also evaluated the proposed approach on both multi-core Neural Network Accelerator (NNA) chips and general-purpose chip multiprocessors. Our experimental results show that the proposed CAP can achieve 1.12x-1.65x system speedups and 1.14x-2.70x energy efficiency for different neural networks while maintaining the inference accuracy, compared to baseline approaches. |
Funding | National Key Research and Development Program of China [2020YFB1600201]; National Natural Science Foundation of China (NSFC) [62090024]; National Natural Science Foundation of China (NSFC) [61874124]; National Natural Science Foundation of China (NSFC) [61876173]; Fundamental Research Funds for the Central Universities [2021MS017] |
WOS Research Areas | Computer Science; Engineering |
Language | English |
WOS Accession Number | WOS:000808068000011 |
Publisher | IEEE COMPUTER SOC |
Source URL | http://119.78.100.204/handle/2XEOYT63/19590 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences — Journal Papers (English) |
Corresponding Author | Wang, Ying |
Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China; 3. North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China; 4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 5. Peng Cheng Lab, Shenzhen 518066, Peoples R China |
Recommended Citation (GB/T 7714) | Zou, Kaiwei, Wang, Ying, Cheng, Long, et al. CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures[J]. IEEE TRANSACTIONS ON COMPUTERS, 2022, 71(7): 1626-1639. |
APA | Zou, Kaiwei, Wang, Ying, Cheng, Long, Qu, Songyun, Li, Huawei, & Li, Xiaowei. (2022). CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures. IEEE TRANSACTIONS ON COMPUTERS, 71(7), 1626-1639. |
MLA | Zou, Kaiwei, et al. "CAP: Communication-Aware Automated Parallelization for Deep Learning Inference on CMP Architectures". IEEE TRANSACTIONS ON COMPUTERS 71.7 (2022): 1626-1639. |
Ingest Method: OAI Harvesting
Source: Institute of Computing Technology
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.