HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization
Document type: Journal article
Author | Li, Zhikai1,2 |
Journal | PATTERN RECOGNITION |
Publication date | 2024-12-01 |
Volume | 156; Pages: 8 |
Keywords | Model compression; Quantized neural networks; Mixed-precision |
ISSN | 0031-3203 |
DOI | 10.1016/j.patcog.2024.110788 |
Corresponding author | Gu, Qingyi (qingyi.gu@ia.ac.cn) |
Abstract | Mixed-precision quantization, where more sensitive layers are kept at higher precision, can achieve a trade-off between the accuracy and complexity of neural networks. However, the search space for mixed-precision grows exponentially with the number of layers, making the brute force approach infeasible on deep networks. To reduce this exponential search space, recent efforts use Pareto frontier or integer linear programming to select the bit-precision of each layer. Unfortunately, we find that these prior works rely on a single constraint. In practice, model complexity includes space complexity and time complexity, and the two are weakly correlated, so using only one as a constraint leads to sub-optimal results. Besides this, they require manually set constraints, making them only pseudo-automatic. To address the above issues, we propose High-dimensional Trade-off Quantization (HTQ), which automatically determines the bit-precision in the high-dimensional space of model accuracy, space complexity, and time complexity without any manual intervention. Specifically, we use the saliency criterion based on connection sensitivity to indicate the accuracy perturbation after quantization, which performs similarly to Hessian information but can be calculated quickly (more than 1000x speedup). The bit-precision is then automatically selected according to the three-dimensional (3D) Pareto frontier of the total perturbation, model size, and bit operations (BOPs) without manual constraints. Moreover, HTQ allows for the joint optimization of weights and activations, and thus the bit-precisions of both can be computed concurrently. Compared to state-of-the-art methods, HTQ achieves higher accuracy and lower space/time complexity on various model architectures for image classification and object detection tasks. Code is available at: https://github.com/zkkli/HTQ. |
WOS keywords | BIT ALLOCATION |
Funding projects | National Natural Science Foundation of China[62276255] ; National Science and Technology Major Project[2022ZD0119402] |
WOS research areas | Computer Science ; Engineering |
Language | English |
WOS accession number | WOS:001277107000001 |
Publisher | ELSEVIER SCI LTD |
Funding organizations | National Natural Science Foundation of China ; National Science and Technology Major Project |
Source URL | [http://ir.ia.ac.cn/handle/173211/59361] |
Collection | Research Center of Precision Sensing and Control_Precision Sensing and Control |
Corresponding author | Gu, Qingyi |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, 95 East Zhongguancun Rd Haidian Dist, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Jingjia Rd, Beijing, Peoples R China |
Recommended citation: GB/T 7714 | Li, Zhikai, Long, Xianlei, Xiao, Junrui, et al. HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization[J]. PATTERN RECOGNITION, 2024, 156: 8. |
APA | Li, Zhikai, Long, Xianlei, Xiao, Junrui, & Gu, Qingyi. (2024). HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization. PATTERN RECOGNITION, 156, 8. |
MLA | Li, Zhikai, et al. "HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization". PATTERN RECOGNITION 156 (2024): 8. |
Ingestion method: OAI harvesting
Source: Institute of Automation