Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence
Document type: Journal article
Authors | Tao, Wei2; Pan, Zhisong2; Wu, Gaowei1 |
Journal | IEEE TRANSACTIONS ON CYBERNETICS |
Publication date | 2020-02-01 |
Volume | 50; Issue: 2; Pages: 835-845 |
Keywords | Convergence; Convex functions; Machine learning; Optimization methods; Linear programming; Cybernetics; Individual convergence; machine learning; mirror descent (MD) methods; regularized learning problems; stochastic gradient descent (SGD); stochastic optimization |
ISSN | 2168-2267 |
DOI | 10.1109/TCYB.2018.2874332 |
Corresponding author | Tao, Qing (qing.tao@ia.ac.cn) |
Abstract | Many well-known first-order gradient methods have been extended to cope with large-scale composite problems, which often arise as regularized empirical risk minimization in machine learning. However, their optimal convergence is attained only in terms of the weighted average of past iterative solutions. How to make the individual convergence of stochastic gradient descent (SGD) optimal, especially for strongly convex problems, has now become a challenging problem in the machine learning community. On the other hand, Nesterov's recent weighted averaging strategy succeeds in achieving the optimal individual convergence of dual averaging (DA), but it fails for basic mirror descent (MD). In this paper, a new primal averaging (PA) gradient operation step is presented, in which the gradient evaluation is imposed on the weighted average of all past iterative solutions. We prove that simply modifying the gradient operation step in MD with the PA strategy suffices to recover the optimal individual rate for general convex problems. Along this line, the optimal individual rate of convergence for strongly convex problems can also be achieved by imposing the strong convexity on the gradient operation step. Furthermore, we extend PA-MD to solve regularized nonsmooth learning problems in the stochastic setting, which reveals that the PA strategy is a simple yet effective extra step toward the optimal individual convergence of SGD. Several real experiments on sparse learning and SVM problems verify the correctness of our theoretical analysis. |
WOS keywords | NEURAL-NETWORK ; OPTIMIZATION ; PERFORMANCE ; ALGORITHMS |
Funding project | NSFC[61673394] |
WOS research area | Automation & Control Systems ; Computer Science |
Language | English |
WOS accession number | WOS:000506849800036 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding organization | NSFC |
Source URL | [http://ir.ia.ac.cn/handle/173211/29521] |
Collection | Research Center for Precision Sensing and Control_Artificial Intelligence and Machine Learning |
Corresponding author | Tao, Qing |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 2. Army Engn Univ PLA, Command & Control Engn Coll, Nanjing 210007, Peoples R China; 3. Army Acad Artillery & Air Def, Dept Comp Sci, Hefei 230031, Peoples R China |
Recommended citation (GB/T 7714) | Tao, Wei,Pan, Zhisong,Wu, Gaowei,et al. Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence[J]. IEEE TRANSACTIONS ON CYBERNETICS,2020,50(2):835-845. |
APA | Tao, Wei,Pan, Zhisong,Wu, Gaowei,&Tao, Qing.(2020).Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence.IEEE TRANSACTIONS ON CYBERNETICS,50(2),835-845. |
MLA | Tao, Wei,et al."Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence".IEEE TRANSACTIONS ON CYBERNETICS 50.2(2020):835-845. |
Ingest method: OAI harvesting
Source: Institute of Automation
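
The abstract describes the primal averaging step only in words: the gradient is evaluated at a weighted average of all past iterates rather than at the latest iterate. A minimal sketch of that idea follows. It is not the authors' PA-MD algorithm: it assumes plain Euclidean geometry (no Bregman distance), no constraint set, uniform averaging weights, and a constant step size, and the function name `primal_averaging_subgradient` and the L1 test objective are hypothetical choices made for illustration.

```python
import numpy as np

def primal_averaging_subgradient(subgrad, w0, steps, step_size):
    """Subgradient descent whose subgradient is evaluated at the running average.

    Assumptions (not taken from the paper): Euclidean geometry, no constraint
    set, uniform averaging weights, and a constant step size.
    """
    w = np.asarray(w0, dtype=float).copy()   # latest iterate
    avg = w.copy()                           # uniform average of w_0, ..., w_t
    for t in range(1, steps + 1):
        g = subgrad(avg)                     # subgradient taken at the average, not at w
        w = w - step_size * g                # plain Euclidean descent step
        avg = avg + (w - avg) / (t + 1)      # fold the new iterate into the average
    return w, avg

# Hypothetical nonsmooth test problem: f(w) = ||w - target||_1.
target = np.array([3.0, -2.0])
objective = lambda w: float(np.abs(w - target).sum())
subgrad = lambda w: np.sign(w - target)      # a valid subgradient of f at w

steps = 10_000
w_last, w_avg = primal_averaging_subgradient(
    subgrad, w0=np.zeros(2), steps=steps, step_size=1.0 / np.sqrt(steps)
)
final_obj = objective(w_avg)
print(final_obj)
```

In this simplified form the objective is reported at the averaged point `w_avg`, since that is where every subgradient was evaluated. The weighting scheme, step-size schedule, and strongly convex variant analyzed in the paper may differ from these choices.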