Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Document type | Conference paper |
Authors | Weihan Chen; Peisong Wang; Jian Cheng |
Publication date | 2021-10 |
Conference date | 2021-10-11 |
Conference venue | Held online |
Abstract | Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often incurs significant accuracy degradation in the ultra-low precision regime and ignores the fact that emerging hardware accelerators have begun to support mixed-precision computation. Consequently, we present a novel and principled framework to solve the mixed-precision quantization problem in this paper. Briefly, we first formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to compute its Hessian matrix. Finally, based on the above simplification, we show that the original problem can be reformulated as a Multiple-Choice Knapsack Problem (MCKP) and propose a greedy search algorithm to solve it efficiently. Compared with existing mixed-precision quantization works, our method is derived in a principled way and is much more computationally efficient. Moreover, extensive experiments conducted on the |
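The abstract's final step, greedy search over a Multiple-Choice Knapsack formulation, can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the per-layer `sizes`/`losses` tables, and the start-high-then-downgrade strategy are illustrative assumptions. Each layer must pick exactly one bit-width option; the greedy rule repeatedly downgrades the layer with the smallest estimated loss increase per unit of size saved until a size budget is met.

```python
# Hedged sketch of greedy bit-width allocation for an MCKP-style
# mixed-precision problem. All names and the cost model are
# illustrative assumptions, not the paper's actual algorithm.

def greedy_mixed_precision(sizes, losses, budget):
    """
    sizes[l][b]  : size cost of layer l under bit-width option b
    losses[l][b] : estimated loss increase of layer l under option b
                   (options ordered from highest to lowest precision)
    budget       : total model-size budget
    Returns one chosen option index per layer.
    """
    n = len(sizes)
    choice = [0] * n                      # start every layer at highest precision
    total = sum(sizes[l][0] for l in range(n))

    while total > budget:
        best, best_ratio = None, None
        for l in range(n):
            b = choice[l]
            if b + 1 >= len(sizes[l]):
                continue                  # layer already at lowest precision
            saved = sizes[l][b] - sizes[l][b + 1]
            hurt = losses[l][b + 1] - losses[l][b]
            ratio = hurt / max(saved, 1e-12)   # loss increase per unit saved
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = l, ratio
        if best is None:
            break                         # budget infeasible; no moves left
        b = choice[best]
        total -= sizes[best][b] - sizes[best][b + 1]
        choice[best] += 1
    return choice


# Toy usage: two layers, options roughly corresponding to 8/4/2 bits.
sizes = [[8, 4, 2], [8, 4, 2]]
losses = [[0.0, 0.1, 0.5], [0.0, 0.3, 0.9]]
print(greedy_mixed_precision(sizes, losses, budget=10))  # → [1, 1]
```

In the toy run the first downgrade hits layer 0 (ratio 0.025 vs. 0.075), the second hits layer 1, and the loop stops once the total size (8) fits the budget of 10.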
Source URL | http://ir.ia.ac.cn/handle/173211/52065 |
Research area | Brain-inspired Chips and Systems Research |
Corresponding author | Jian Cheng |
Affiliations | 1. NLPR & AIRIA, Institute of Automation, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Weihan Chen, Peisong Wang, Jian Cheng. Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization[C]. In: . Held online. 2021-10-11. |
Deposit method: OAI harvesting
Source: Institute of Automation