Improving Utilization of Dataflow Unit for Multi-Batch Processing
Document type: Journal article
Authors | Fan, Zhihua2; Li, Wenming; Wang, Zhen; Yang, Yu; Ye, Xiaochun; Fan, Dongrui; Sun, Ninghui; An, Xuejun |
Journal | ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION |
Publication date | 2024-03-01 |
Volume | 21; Issue: 1; Pages: 26 |
Keywords | Utilization; network-on-chip; decoupled architecture; batch processing |
ISSN | 1544-3566 |
DOI | 10.1145/3637906 |
Abstract | Dataflow architectures can achieve much better performance and higher efficiency than general-purpose cores, approaching the performance of a specialized design while retaining programmability. However, advanced application scenarios place higher demands on the hardware in terms of cross-domain and multi-batch processing. In this article, we propose a unified scale-vector architecture that can work in multiple modes and adapt to diverse algorithms and requirements efficiently. First, a novel reconfigurable interconnection structure is proposed, which can organize execution units into different cluster topologies to accommodate different degrees of data-level parallelism. Second, we decouple threads within each DFG node into consecutive pipeline stages and provide architectural support. By time-multiplexing during these stages, dataflow hardware can achieve much higher utilization and performance. In addition, the task-based program model can exploit multi-level parallelism and deploy applications efficiently. Evaluated on a wide range of benchmarks, including digital signal processing algorithms, CNNs, and scientific computing algorithms, our design attains up to 11.95x energy efficiency (performance-per-watt) improvement over a GPU (V100), and 2.01x energy efficiency improvement over state-of-the-art dataflow architectures. |
Funding | National Key R&D Program of China[2022YFB4501404] ; Beijing Nova Program[20230484420] ; Beijing Nova Program[20220484054] ; CAS Project for Young Scientists in Basic Research[YSBR-029] ; CAS Project for Youth Innovation Promotion Association ; Open Research Projects of Zhejiang Lab[2022PB0AB01] |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001193465400017 |
Publisher | ASSOC COMPUTING MACHINERY |
Source URL | [http://119.78.100.204/handle/2XEOYT63/38772] |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Fan, Zhihua |
Author affiliations | 1. Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, 6 South Sci Acad Rd, Beijing, Peoples R China; 3. Univ Chinese Acad Sci, Beijing, Peoples R China; 4. Chinese Acad Sci, State Key Lab Processors Inst Comp Technol, Beijing, Peoples R China |
Recommended citation: GB/T 7714 | Fan, Zhihua,Li, Wenming,Wang, Zhen,et al. Improving Utilization of Dataflow Unit for Multi-Batch Processing[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2024,21(1):26. |
APA | Fan, Zhihua.,Li, Wenming.,Wang, Zhen.,Yang, Yu.,Ye, Xiaochun.,...&An, Xuejun.(2024).Improving Utilization of Dataflow Unit for Multi-Batch Processing.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,21(1),26. |
MLA | Fan, Zhihua,et al."Improving Utilization of Dataflow Unit for Multi-Batch Processing".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 21.1(2024):26. |
Ingest method: OAI harvesting
Source: Institute of Computing Technology