Chinese Academy of Sciences Institutional Repositories Grid
Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence

Document type: Journal article

Authors: Tao, Wei (2); Pan, Zhisong (2); Wu, Gaowei (1); Tao, Qing (1,3)
Journal: IEEE TRANSACTIONS ON CYBERNETICS
Publication date: 2020-02-01
Volume: 50; Issue: 2; Pages: 835-845
Keywords: Convergence; Convex functions; Machine learning; Optimization methods; Linear programming; Cybernetics; Individual convergence; machine learning; mirror descent (MD) methods; regularized learning problems; stochastic gradient descent (SGD); stochastic optimization
ISSN: 2168-2267
DOI: 10.1109/TCYB.2018.2874332
Corresponding author: Tao, Qing (qing.tao@ia.ac.cn)
Abstract: Many well-known first-order gradient methods have been extended to cope with large-scale composite problems, which often arise as a regularized empirical risk minimization in machine learning. However, their optimal convergence is attained only in terms of the weighted average of past iterative solutions. How to make the individual convergence of stochastic gradient descent (SGD) optimal, especially for strongly convex problems, has now become a challenging problem in the machine learning community. On the other hand, Nesterov's recent weighted averaging strategy succeeds in achieving the optimal individual convergence of dual averaging (DA), but it fails in the basic mirror descent (MD). In this paper, a new primal averaging (PA) gradient operation step is presented, in which the gradient evaluation is imposed on the weighted average of all past iterative solutions. We prove that simply modifying the gradient operation step in MD by the PA strategy suffices to recover the optimal individual rate for general convex problems. Along this line, the optimal individual rate of convergence for strongly convex problems can also be achieved by imposing the strong convexity on the gradient operation step. Furthermore, we extend PA-MD to solve regularized nonsmooth learning problems in the stochastic setting, which reveals that the PA strategy is a simple yet effective extra step toward the optimal individual convergence of SGD. Several real experiments on sparse learning and SVM problems verify the correctness of our theoretical analysis.
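The primal averaging step described in the abstract can be illustrated with a minimal sketch. It assumes the Euclidean case (where the mirror step reduces to a plain subgradient step), no constraint set, and uniform averaging weights by default. The function name `pa_subgradient_descent`, the `weight` and `step_size` parameters, and the schedules used here are hypothetical and are not the weights or step sizes analyzed in the paper.

```python
import numpy as np

def pa_subgradient_descent(subgrad, w0, steps, step_size, weight=lambda t: 1.0):
    """Sketch of a primal-averaging step in the Euclidean setting.

    The subgradient is evaluated at the weighted average of all past
    iterates instead of at the current iterate. Weights and step sizes
    are illustrative assumptions, not the paper's schedules.
    """
    w = np.asarray(w0, dtype=float).copy()   # current iterate w_t
    avg = w.copy()                           # weighted average of w_1..w_t
    weight_sum = weight(1)
    for t in range(1, steps + 1):
        g = subgrad(avg)                     # gradient evaluated at the average
        w = w - step_size(t) * g             # plain (unprojected) subgradient step
        wt = weight(t + 1)                   # weight of the new iterate w_{t+1}
        weight_sum += wt
        avg = avg + (wt / weight_sum) * (w - avg)  # incremental weighted mean
    return w, avg

# Toy usage: f(w) = |w - 3|, whose subgradient is sign(w - 3).
w_last, w_avg = pa_subgradient_descent(
    lambda x: np.sign(x - 3.0), [0.0], steps=2, step_size=lambda t: 1.0
)
```

In this toy run the iterates are 0, 1, 2 and the running average after two steps is their mean. The only change relative to an ordinary subgradient step is the point at which `subgrad` is called.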
WOS keywords: NEURAL-NETWORK; OPTIMIZATION; PERFORMANCE; ALGORITHMS
Funding project: NSFC [61673394]
WOS research areas: Automation & Control Systems; Computer Science
Language: English
WOS accession number: WOS:000506849800036
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization: NSFC
Source URL: http://ir.ia.ac.cn/handle/173211/29521
Collection: Research Center for Precision Sensing and Control_Artificial Intelligence and Machine Learning
Corresponding author: Tao, Qing
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
2. Army Engn Univ PLA, Command & Control Engn Coll, Nanjing 210007, Peoples R China
3. Army Acad Artillery & Air Def, Dept Comp Sci, Hefei 230031, Peoples R China
Recommended citation
GB/T 7714
Tao, Wei,Pan, Zhisong,Wu, Gaowei,et al. Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence[J]. IEEE TRANSACTIONS ON CYBERNETICS,2020,50(2):835-845.
APA Tao, Wei,Pan, Zhisong,Wu, Gaowei,&Tao, Qing.(2020).Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence.IEEE TRANSACTIONS ON CYBERNETICS,50(2),835-845.
MLA Tao, Wei,et al."Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence".IEEE TRANSACTIONS ON CYBERNETICS 50.2(2020):835-845.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.