Chinese Academy of Sciences Institutional Repositories Grid
HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization

Document type: Journal article

Authors: Li, Zhikai (1,2); Long, Xianlei (1,2); Xiao, Junrui (1,2); Gu, Qingyi (1)
Journal: PATTERN RECOGNITION
Publication date: 2024-12-01
Volume: 156; Pages: 8
Keywords: Model compression; Quantized neural networks; Mixed-precision
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.110788
Corresponding author: Gu, Qingyi (qingyi.gu@ia.ac.cn)
Abstract: Mixed-precision quantization, where more sensitive layers are kept at higher precision, can achieve the tradeoff between accuracy and complexity of neural networks. However, the search space for mixed-precision grows exponentially with the number of layers, making the brute force approach infeasible on deep networks. To reduce this exponential search space, recent efforts use Pareto frontier or integer linear programming to select the bit-precision of each layer. Unfortunately, we find that these prior works rely on a single constraint. In practice, model complexity includes space complexity and time complexity, and the two are weakly correlated, thus using simply one as a constraint leads to sub-optimal results. Besides this, they require manually set constraints, making them only pseudo-automatic. To address the above issues, we propose High-dimensional Trade-off Quantization (HTQ), which automatically determines the bit-precision in the high-dimensional space of model accuracy, space complexity, and time complexity without any manual intervention. Specifically, we use the saliency criterion based on connection sensitivity to indicate the accuracy perturbation after quantization, which performs similarly to Hessian information but can be calculated quickly (more than 1000x speedup). The bit-precision is then automatically selected according to the three-dimensional (3D) Pareto frontier of the total perturbation, model size, and bit operations (BOPs) without manual constraints. Moreover, HTQ allows for the joint optimization of weights and activations, and thus the bit-precisions of both can be computed concurrently. Compared to state-of-the-art methods, HTQ achieves higher accuracy and lower space/time complexity on various model architectures for image classification and object detection tasks. Code is available at: https://github.com/zkkli/HTQ.
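The three-objective trade-off described in the abstract can be illustrated with a small sketch. It is not taken from the HTQ code base. It assumes a toy two-layer network, a shared bit-width for weights and activations in each layer, the commonly used convention BOPs = MACs x weight bits x activation bits, and a made-up per-layer sensitivity table (`sens`). All names and numbers are hypothetical. The brute-force enumeration is only feasible because the toy network is tiny, and the rule HTQ uses to pick a single point on the frontier is not given in the abstract, so it is left out.

```python
from itertools import product

# Hypothetical toy network: parameter counts and MAC counts per layer.
layer_params = [1000, 4000]
layer_macs = [20000, 50000]
# Hypothetical per-layer perturbation for each candidate bit-width
# (stand-in for a saliency score; these values are made up).
sens = [{4: 0.30, 8: 0.02}, {4: 0.05, 8: 0.01}]
bit_choices = [4, 8]


def total_perturbation(bits):
    """Sum of the per-layer perturbation scores for a bit assignment."""
    return sum(sens[i][b] for i, b in enumerate(bits))


def model_size_bits(bits):
    """Space complexity: weight storage in bits."""
    return sum(n * b for n, b in zip(layer_params, bits))


def bops(bits):
    """Time complexity, assuming BOPs = MACs * w_bits * a_bits with w_bits == a_bits."""
    return sum(m * b * b for m, b in zip(layer_macs, bits))


def dominates(q, p):
    """True if q is no worse than p in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(q, p)) and any(x < y for x, y in zip(q, p))


def pareto_front(points):
    """Indices of points not dominated by any other point (all objectives minimized)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Enumerate every bit assignment (exponential in depth; fine only for a toy case).
configs = list(product(bit_choices, repeat=len(layer_params)))
objectives = [(total_perturbation(c), model_size_bits(c), bops(c)) for c in configs]
front_configs = [configs[i] for i in pareto_front(objectives)]

for c in front_configs:
    print(c, (total_perturbation(c), model_size_bits(c), bops(c)))
```

In this toy setup the assignment (4, 8) is dominated by (8, 4), which has lower perturbation, smaller size, and fewer BOPs, so it drops off the frontier. The other three assignments each win on at least one objective and remain.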
WOS keywords: BIT ALLOCATION
Funding projects: National Natural Science Foundation of China [62276255]; National Science and Technology Major Project [2022ZD0119402]
WOS research areas: Computer Science; Engineering
Language: English
WOS accession number: WOS:001277107000001
Publisher: ELSEVIER SCI LTD
Funding organizations: National Natural Science Foundation of China; National Science and Technology Major Project
Source URL: http://ir.ia.ac.cn/handle/173211/59361
Collection: Research Center of Precision Sensing and Control_Precision Sensing and Control
Corresponding author: Gu, Qingyi
Author affiliations:
1. Chinese Acad Sci, Inst Automat, 95 East Zhongguancun Rd Haidian Dist, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Jingjia Rd, Beijing, Peoples R China
Recommended citation
GB/T 7714: Li, Zhikai, Long, Xianlei, Xiao, Junrui, et al. HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization[J]. PATTERN RECOGNITION, 2024, 156: 8.
APA: Li, Zhikai, Long, Xianlei, Xiao, Junrui, & Gu, Qingyi. (2024). HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization. PATTERN RECOGNITION, 156, 8.
MLA: Li, Zhikai, et al. "HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization". PATTERN RECOGNITION 156 (2024): 8.

Ingest method: OAI harvesting

Source: Institute of Automation

