Chinese Academy of Sciences Institutional Repositories Grid
Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization

Document type: Journal article

Authors: Di, Zhanyuan [2,3]; Wang, Leping [2]; Ma, Zhaojia [2,3]; Shao, En [3]; Zhao, Jie [1]; Ren, Ziyi [2]; Feng, Siyuan [4]; Tao, Dingwen [2]; Tan, Guangming [2]; Sun, Ninghui [2]
Journal: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Publication date: 2025-09-01
Volume: 22; Issue: 3; Pages: 26
Keywords: Deep learning; tensor compiler; inference optimization; code generation; GPU
ISSN: 1544-3566
DOI: 10.1145/3744906
Abstract: Parallel structures have become a key pattern in deep neural networks (DNNs), offering improved efficiency and scalability. However, existing machine learning compilers (MLCs) face challenges in optimizing these structures due to limited parallel fusion scope and insufficient analysis of intra-operator characteristics. This article introduces Magneto, a framework designed to accelerate DNN inference by co-optimizing parallel operators. Magneto broadens the fusion scope and incorporates a specialized co-tuning algorithm to optimize operators jointly. Our approach addresses the unique challenges inherent in optimizing parallel structures, enabling significant performance improvements across various hardware platforms. Experimental results show that Magneto outperforms state-of-the-art NVIDIA TensorRT and AMD MIGraphX, achieving geometric mean speedups of 2.27x and 2.88x, respectively.
Funding: NKRDP [2021YFB0300202]; NSFC [62032023]; NSFC [T2125013]; NSFC [T2422007]; NSFC [62225205]; NSFC [U24A20235]; Youth Innovation Promotion Association of CAS [2021099]; Innovation Funding of ICT, CAS [E461030]; Tianjin Science and Technology Plan Project [24ZXKJGX00060]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:001606025500010
Publisher: ASSOC COMPUTING MACHINERY
Source URL: http://119.78.100.204/handle/2XEOYT63/41582
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding authors: Shao, En; Tan, Guangming
Author affiliations:
1. Hunan Univ, Changsha, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China
3. Univ Chinese Acad Sci, Beijing, Peoples R China
4. Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Recommended citation
GB/T 7714: Di, Zhanyuan; Wang, Leping; Ma, Zhaojia; et al. Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2025, 22(3): 26.
APA: Di, Zhanyuan; Wang, Leping; Ma, Zhaojia; Shao, En; Zhao, Jie; ... & Sun, Ninghui. (2025). Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 22(3), 26.
MLA: Di, Zhanyuan, et al. "Accelerating Parallel Structures in DNNs via Parallel Fusion and Operator Co-Optimization." ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 22.3 (2025): 26.

Ingest method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.