Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs
Document type: Journal article
Authors | Wang, Xueying4,8; Li, Guangli2,3,6,7; Jia, Zhen1,5; Feng, Xiaobing2,3,6,7; Wang, Yida1,5 |
Journal | ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION |
Publication date | 2024-03-01 |
Volume | 21; Issue: 1; Pages: 26 |
Keywords | Deep learning; Winograd convolution; low-precision computation |
ISSN | 1544-3566 |
DOI | 10.1145/3632956 |
Abstract | Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, low-precision computation has not been commonly applied to fast convolutions, such as the Winograd algorithm, due to numerical issues. In this article, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach can achieve an average of 1.84x and 1.91x operator speedups over state-of-the-art implementations in the vendor library while preserving accuracy loss at a reasonable level. |
Funding | National Key R&D Program of China[2021ZD0110101] ; National Natural Science Foundation of China[62090024] ; National Natural Science Foundation of China[62232015] ; National Natural Science Foundation of China[62302479] ; China Postdoctoral Science Foundation[2023M733566] ; Innovation Funding of ICT, CAS[E361010] |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:001193465400005 |
Publisher | ASSOC COMPUTING MACHINERY |
Source URL | [http://119.78.100.204/handle/2XEOYT63/38764] |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Li, Guangli |
Author affiliations | 1.Amazon Web Serv, 2795 Augustine Dr, Santa Clara, CA 95054 USA 2.Univ Chinese Acad Sci, 6 Kexueyuan South Rd, Beijing 100190, Peoples R China 3.Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd, Beijing 100190, Peoples R China 4.Beijing Univ Posts & Telecommun, 10 Xitucheng Rd, Beijing 100876, Peoples R China 5.Amazon Web Serv, Seattle, WA USA 6.Univ Chinese Acad Sci, Beijing, Peoples R China 7.Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China 8.Beijing Univ Posts & Telecommun, Beijing, Peoples R China |
Recommended citation GB/T 7714 | Wang, Xueying,Li, Guangli,Jia, Zhen,et al. Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,2024,21(1):26. |
APA | Wang, Xueying,Li, Guangli,Jia, Zhen,Feng, Xiaobing,&Wang, Yida.(2024).Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs.ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION,21(1),26. |
MLA | Wang, Xueying,et al."Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs".ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 21.1(2024):26. |
Ingest method: OAI harvesting
Source: Institute of Computing Technology