Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization

HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization

Document type: Journal article


Authors	Li, Zhikai1,2 ; Long, Xianlei1,2 ; Xiao, Junrui1,2 ; Gu, Qingyi1
Journal	PATTERN RECOGNITION
Publication date	2024-12-01
Volume	156 Pages: 8
Keywords	Model compression ; Quantized neural networks ; Mixed-precision
ISSN	0031-3203
DOI	10.1016/j.patcog.2024.110788
Corresponding author	Gu, Qingyi(qingyi.gu@ia.ac.cn)
Abstract	Mixed-precision quantization, where more sensitive layers are kept at higher precision, can achieve a trade-off between the accuracy and complexity of neural networks. However, the search space for mixed-precision quantization grows exponentially with the number of layers, making the brute-force approach infeasible on deep networks. To reduce this exponential search space, recent efforts use the Pareto frontier or integer linear programming to select the bit-precision of each layer. Unfortunately, we find that these prior works rely on a single constraint. In practice, model complexity includes space complexity and time complexity, and the two are weakly correlated, so using only one as a constraint leads to sub-optimal results. In addition, they require manually set constraints, making them only pseudo-automatic. To address the above issues, we propose High-dimensional Trade-off Quantization (HTQ), which automatically determines the bit-precision in the high-dimensional space of model accuracy, space complexity, and time complexity without any manual intervention. Specifically, we use the saliency criterion based on connection sensitivity to indicate the accuracy perturbation after quantization, which performs similarly to Hessian information but can be calculated quickly (more than 1000x speedup). The bit-precision is then automatically selected according to the three-dimensional (3D) Pareto frontier of the total perturbation, model size, and bit operations (BOPs) without manual constraints. Moreover, HTQ allows for the joint optimization of weights and activations, and thus the bit-precisions of both can be computed concurrently. Compared to state-of-the-art methods, HTQ achieves higher accuracy and lower space/time complexity on various model architectures for image classification and object detection tasks. Code is available at: https://github.com/zkkli/HTQ.
WOS keywords	BIT ALLOCATION
Funding projects	National Natural Science Foundation of China[62276255] ; National Science and Technology Major Project[2022ZD0119402]
WOS research areas	Computer Science ; Engineering
Language	English
WOS accession number	WOS:001277107000001
Publisher	ELSEVIER SCI LTD
Funding organizations	National Natural Science Foundation of China ; National Science and Technology Major Project
Source URL	[http://ir.ia.ac.cn/handle/173211/59361]
Collection	Research Center of Precision Sensing and Control_Precision Sensing and Control
Corresponding author	Gu, Qingyi
Author affiliations	1. Chinese Acad Sci, Inst Automat, 95 East Zhongguancun Rd Haidian Dist, Beijing 100190, Peoples R China ; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Jingjia Rd, Beijing, Peoples R China
Recommended citation GB/T 7714	Li, Zhikai,Long, Xianlei,Xiao, Junrui,et al. HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization[J]. PATTERN RECOGNITION,2024,156:8.
APA	Li, Zhikai,Long, Xianlei,Xiao, Junrui,&Gu, Qingyi.(2024).HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization.PATTERN RECOGNITION,156,8.
MLA	Li, Zhikai,et al."HTQ: Exploring the High-Dimensional Trade-Off of mixed-precision quantization".PATTERN RECOGNITION 156(2024):8.

Ingest method: OAI harvesting

Source: Institute of Automation
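
The abstract describes selecting bit-precisions from a 3D Pareto frontier over total perturbation, model size, and BOPs. The following is a minimal sketch of that idea on a toy network, not the authors' implementation (their code is at the GitHub URL given in the abstract). Everything in it is an assumption made for illustration: the layer statistics in `LAYERS` are invented, the perturbation proxy (sensitivity scaled by 4^-bits) merely stands in for the connection-sensitivity criterion, BOPs are taken by the common convention of multiply-accumulates times weight bits times activation bits, and the exhaustive enumeration is only workable because the toy network has three layers.

```python
from itertools import product

# Hypothetical layer statistics (illustrative values, not from the paper).
LAYERS = [
    {"params": 1_000, "macs": 50_000, "sensitivity": 4.0},
    {"params": 4_000, "macs": 20_000, "sensitivity": 1.0},
    {"params": 2_000, "macs": 80_000, "sensitivity": 2.0},
]
BIT_CHOICES = (2, 4, 8)


def evaluate(config):
    """Map per-layer (weight_bits, act_bits) pairs to the three objectives."""
    perturbation = size_bits = bops = 0.0
    for layer, (w_bits, a_bits) in zip(LAYERS, config):
        # Assumed proxy: error falls as 4^-bits, scaled by layer sensitivity.
        perturbation += layer["sensitivity"] * (4.0 ** -w_bits + 4.0 ** -a_bits)
        size_bits += layer["params"] * w_bits          # weight storage in bits
        bops += layer["macs"] * w_bits * a_bits        # assumed BOPs convention
    return (perturbation, size_bits, bops)


def dominates(p, q):
    """True if p is no worse than q in every objective and differs from q."""
    return p != q and all(a <= b for a, b in zip(p, q))


def pareto_front(scored):
    """Keep the configs whose objective vectors no other config dominates."""
    objectives = list(scored.values())
    return {config: obj for config, obj in scored.items()
            if not any(dominates(other, obj) for other in objectives)}


per_layer = list(product(BIT_CHOICES, repeat=2))    # (w_bits, a_bits) pairs
configs = product(per_layer, repeat=len(LAYERS))    # 9^3 = 729 candidates
scored = {config: evaluate(config) for config in configs}
front = pareto_front(scored)
print(f"{len(front)} of {len(scored)} configurations are Pareto-optimal")
```

Because all three objectives are minimized jointly, the frontier keeps both the all-8-bit configuration (lowest perturbation) and the all-2-bit configuration (lowest size and BOPs), along with the mixed configurations between them that no other candidate beats on every axis.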
