Chinese Academy of Sciences Institutional Repositories Grid
Towards Fully Sparse Training: Information Restoration with Spatial Similarity

Document type: Conference paper

Authors: Xu WX (许伟翔)1,2; Wang PS (王培松)1,2; Cheng J (程健)1,2
Publication date: 2021-11
Conference date: 2022-04
Conference venue: Vancouver, British Columbia, Canada
Abstract

The 2:4 structured sparsity pattern introduced by the NVIDIA Ampere architecture, which requires that every group of four consecutive values contain at least two zeros, enables doubled math throughput for matrix multiplications. Recent works mainly exploit 2:4 sparsity for inference speedup, while training acceleration has been largely overlooked, even though backpropagation consumes around 70% of the training time. Unlike inference, however, training speedup with structured pruning is nontrivial: the fidelity of gradients must be maintained, and the overhead of applying 2:4 sparsity online must be kept low. For the first time, this paper proposes fully sparse training (FST), where "fully" indicates that ALL matrix multiplications in forward and backward propagation are structurally pruned while accuracy is maintained. To this end, we begin with a saliency analysis, investigating the sensitivity of different sparse objects to structured pruning. Based on the observation of spatial similarity among activations, we propose pruning activations with fixed 2:4 masks. Moreover, an Information Restoration block is proposed to retrieve the lost information, which can be implemented by an efficient gradient-shift operation. Evaluations of accuracy and efficiency show that we achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks.
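To illustrate the 2:4 pattern the abstract describes, the sketch below applies magnitude-based 2:4 pruning to a tensor: in every group of four consecutive values, the two smallest-magnitude entries are zeroed. This is a minimal NumPy illustration of the pattern only; the function name `prune_2_4` and the magnitude criterion are assumptions for illustration, not the paper's actual FST implementation (which uses fixed masks and an Information Restoration block).

```python
import numpy as np

def prune_2_4(x: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in every group of 4
    consecutive elements, yielding the 2:4 structured sparsity pattern.
    Assumes x.size is a multiple of 4."""
    groups = x.reshape(-1, 4)
    # indices of the 2 smallest |values| in each group of 4
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(x.shape)

x = np.array([0.1, -3.0, 2.0, 0.05, 1.0, -0.2, 0.3, 4.0])
print(prune_2_4(x))  # every group of 4 keeps at most 2 nonzeros
```

Hardware such as Ampere's sparse tensor cores can skip the zeroed positions, which is what enables the doubled math throughput mentioned above.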

Source URL: [http://ir.ia.ac.cn/handle/173211/52074]
Collection: Brain-Inspired Chips and Systems Research
Author affiliations: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Xu WX, Wang PS, Cheng J. Towards Fully Sparse Training: Information Restoration with Spatial Similarity[C]. Vancouver, British Columbia, Canada, 2022-04.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.