Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence

Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence

Document Type: Journal Article


Authors	Tao, Wei (2); Pan, Zhisong (2); Wu, Gaowei (1); Tao, Qing (1,3)
Journal	IEEE TRANSACTIONS ON CYBERNETICS
Publication Date	2020-02-01
Volume	50 Issue: 2 Pages: 835-845
Keywords	Convergence; Convex functions; Machine learning; Optimization methods; Linear programming; Cybernetics; Individual convergence; machine learning; mirror descent (MD) methods; regularized learning problems; stochastic gradient descent (SGD); stochastic optimization
ISSN	2168-2267
DOI	10.1109/TCYB.2018.2874332
Corresponding Author	Tao, Qing(qing.tao@ia.ac.cn)
English Abstract	Many well-known first-order gradient methods have been extended to cope with large-scale composite problems, which often arise as regularized empirical risk minimization in machine learning. However, their optimal convergence is attained only in terms of the weighted average of past iterative solutions. How to make the individual convergence of stochastic gradient descent (SGD) optimal, especially for strongly convex problems, has become a challenging problem in the machine learning community. On the other hand, Nesterov's recent weighted averaging strategy succeeds in achieving the optimal individual convergence of dual averaging (DA), but it fails for basic mirror descent (MD). In this paper, a new primal averaging (PA) gradient operation step is presented, in which the gradient evaluation is imposed on the weighted average of all past iterative solutions. We prove that simply modifying the gradient operation step in MD with the PA strategy suffices to recover the optimal individual rate for general convex problems. Along this line, the optimal individual rate of convergence for strongly convex problems can also be achieved by imposing the strong convexity on the gradient operation step. Furthermore, we extend PA-MD to solve regularized nonsmooth learning problems in the stochastic setting, which reveals that the PA strategy is a simple yet effective extra step toward the optimal individual convergence of SGD. Several real experiments on sparse learning and SVM problems verify the correctness of our theoretical analysis.
WOS Keywords	NEURAL-NETWORK ; OPTIMIZATION ; PERFORMANCE ; ALGORITHMS
Funding Project	NSFC[61673394]
WOS Research Area	Automation & Control Systems ; Computer Science
Language	English
WOS Accession Number	WOS:000506849800036
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding Organization	NSFC
Source URL	[http://ir.ia.ac.cn/handle/173211/29521]
Collection	Research Center of Precision Sensing and Control_Artificial Intelligence and Machine Learning
Corresponding Author	Tao, Qing
Author Affiliations	1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 2. Army Engn Univ PLA, Command & Control Engn Coll, Nanjing 210007, Peoples R China; 3. Army Acad Artillery & Air Def, Dept Comp Sci, Hefei 230031, Peoples R China
Recommended Citation (GB/T 7714)	Tao, Wei, Pan, Zhisong, Wu, Gaowei, et al. Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50(2): 835-845.
APA	Tao, Wei, Pan, Zhisong, Wu, Gaowei, & Tao, Qing. (2020). Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence. IEEE TRANSACTIONS ON CYBERNETICS, 50(2), 835-845.
MLA	Tao, Wei, et al. "Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence". IEEE TRANSACTIONS ON CYBERNETICS 50.2 (2020): 835-845.
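
The primal averaging step described in the abstract (evaluate the gradient at an average of past iterates, then take an ordinary mirror descent step) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Euclidean mirror map (so the MD step reduces to projected subgradient descent), uniform averaging weights, the step size c / sqrt(t), the L2-ball constraint, and the names `project_l2_ball` and `pa_md_euclidean`. The sketch is not claimed to reproduce the authors' algorithm or its convergence rates.

```python
import numpy as np


def project_l2_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def pa_md_euclidean(subgrad, x0, steps, radius=1.0, c=1.0):
    """Projected subgradient descent with the subgradient taken at the
    running average of all iterates (an illustrative PA-style step).

    subgrad: hypothetical callable returning a subgradient at a point.
    Returns the last iterate and the running average.
    """
    x = np.asarray(x0, dtype=float)
    avg = x.copy()  # uniform average of x_0, ..., x_t (assumed weighting)
    for t in range(1, steps + 1):
        g = subgrad(avg)  # PA step: gradient evaluated at the average, not at x
        x = project_l2_ball(x - (c / np.sqrt(t)) * g, radius)
        avg += (x - avg) / (t + 1)  # incremental update of the running mean
    return x, avg


if __name__ == "__main__":
    # Toy nonsmooth objective f(x) = |x - 0.5| on the interval [-1, 1].
    x_last, x_avg = pa_md_euclidean(
        lambda x: np.sign(x - 0.5), np.zeros(1), steps=2000
    )
    print("last iterate:", x_last, "average:", x_avg)
```

The only change relative to plain projected subgradient descent is the argument passed to `subgrad`; replacing `avg` with `x` on that line recovers the standard method, which makes the two easy to compare on a toy problem.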

Ingestion Method: OAI harvesting

Source: Institute of Automation
