Chinese Academy of Sciences Institutional Repositories Grid
OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs

Document type: Journal article

Authors: Wang, Xueying1; Li, Shigang1; Qian, Hao2; Luo, Fan3,4; Hao, Zhaoyang3,4; Wu, Tong1; Xu, Ruiyuan3,4; Cui, Huimin3,4; Feng, Xiaobing3,4; Li, Guangli2,3,4
Journal: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Publication date: 2025-06-01
Volume: 22; Issue: 2; Pages: 27
Keywords: Deep learning systems; convolutional neural networks; operator fusion
ISSN: 1544-3566
DOI: 10.1145/3716876
Abstract: Convolutional Neural Networks (CNNs) are fundamental to advancing computer vision technologies. As CNNs become larger and more complex, optimizing model inference remains a critical challenge in both industry and academia. On modern GPU platforms, CNN operators are typically memory-bound, leading to significant performance degradation due to memory wall effects. While recent advances have used operator fusion (merging multiple operators into one) to enhance inference performance, the fusion of multiple region-based operators such as convolution is seldom addressed. This article introduces AFusioN, a novel operator fusion technique aimed at improving inference performance, and OptiFX, an automatic optimization framework based on this approach. OptiFX employs a cost-based backtracking search to identify optimal sub-graphs for fusion and uses template-based code generation to create efficient kernels for these fused sub-graphs. We evaluate OptiFX on seven prominent CNN architectures (GoogLeNet, ResNet, DenseNet, MobileNet, SqueezeNet, NasNet, and UNet) on the Nvidia A6000 Ada, RTX 4090, and Jetson AGX Orin platforms. Our results demonstrate that OptiFX significantly outperforms existing methods, achieving average inference speedups of 2.91×, 3.30×, and 2.09× on these platforms, respectively.
Funding: National Science and Technology Major Project[2023ZD0120502] ; National Natural Science Foundation of China[62302479] ; National Natural Science Foundation of China[62232015] ; National Natural Science Foundation of China[62090024] ; National Natural Science Foundation of China[62372055] ; Fund of Laboratory for Advanced Computing and Intelligence Engineering, the China Postdoctoral Science Foundation[2024M750258] ; Fund of Laboratory for Advanced Computing and Intelligence Engineering, the China Postdoctoral Science Foundation[2023M733566] ; CCF-Tencent Rhino-Bird Open Research Fund ; State Key Lab of Processors, Institute of Computing Technology, CAS[CLQ202411] ; Innovation Funding of ICT, CAS[E361010] ; Innovation Funding of ICT, CAS[E261110] ; Australian Research Council (ARC) Grant[DP250104934]
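WoS research area: Computer Science
Language: English
WoS accession number: WOS:001532815500004
Publisher: ASSOC COMPUTING MACHINERY
Source URL: http://119.78.100.204/handle/2XEOYT63/42096
Collection: Institute of Computing Technology, CAS, journal articles (English)
Corresponding authors: Li, Shigang; Li, Guangli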
Affiliations:
1. Beijing Univ Posts & Telecommun, Beijing, Peoples R China
2. Univ New South Wales, Sydney, NSW, Australia
3. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
4. Univ Chinese Acad Sci, Beijing, Peoples R China
Recommended citation
GB/T 7714
Wang, Xueying,Li, Shigang,Qian, Hao,et al. OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2025,22(2):27.
APA Wang, Xueying.,Li, Shigang.,Qian, Hao.,Luo, Fan.,Hao, Zhaoyang.,...&Li, Guangli.(2025).OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,22(2),27.
MLA Wang, Xueying,et al."OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 22.2(2025):27.

Ingest method: OAI harvesting

Source: Institute of Computing Technology

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.