Chinese Academy of Sciences Institutional Repositories Grid
2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning

Document type: Journal article

Authors: Jiang, Youhe (1); Gu, Huaxi (1); Lu, Yunfeng (1); Yu, Xiaoshan (1, 2)
Journal: IEEE ACCESS
Publication date: 2020
Volume: 8; Pages: 183488-183494
Keywords: Distributed machine learning; large-scale cluster; topology; communication overhead; all-reduce
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3028367
Abstract: Gradient synchronization, a process of communication among machines in large-scale distributed machine learning (DML), plays a crucial role in improving DML performance. Since the scale of distributed clusters is continuously expanding, state-of-the-art DML synchronization algorithms suffer from high latency when scaled to thousands of GPUs. In this article, we propose 2D-HRA, a two-dimensional hierarchical ring-based all-reduce algorithm for large-scale DML. 2D-HRA combines the ring with more latency-optimal hierarchical methods, and synchronizes parameters along two dimensions to make full use of the bandwidth. Simulation results show that 2D-HRA can efficiently alleviate the high latency and accelerate the synchronization process in large-scale clusters. Compared with traditional ring-based algorithms, 2D-HRA achieves up to a 76.9% reduction in gradient synchronization time in clusters of different scales.
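The abstract's central idea, replacing one long ring with shorter rings along two dimensions, can be illustrated with a small simulation. The sketch below is a generic two-dimensional ring all-reduce (a ring all-reduce along each row of a worker grid, then along each column); it is not the authors' 2D-HRA. The function names `ring_allreduce` and `two_d_ring_allreduce`, the row-major grid layout, and the use of "communication steps" as a latency proxy are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]


def ring_allreduce(buffers: List[Vector]) -> int:
    """Simulated in-place ring all-reduce (reduce-scatter, then all-gather).

    Hypothetical helper: buffers[r] is worker r's gradient vector. Returns the
    number of communication steps, a rough proxy for latency cost.
    """
    p = len(buffers)
    if p == 1:
        return 0
    n = len(buffers[0])
    # Split every vector into p contiguous chunks.
    bounds = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    steps = 0
    # Reduce-scatter: after p - 1 steps, worker r owns the full sum of chunk (r + 1) % p.
    for s in range(p - 1):
        # Snapshot outgoing chunks so all sends in this step happen "simultaneously".
        sends = [((r - s) % p, buffers[r][bounds[(r - s) % p][0]:bounds[(r - s) % p][1]])
                 for r in range(p)]
        for r in range(p):
            c, data = sends[(r - 1) % p]  # receive from the left neighbour
            lo, _ = bounds[c]
            for k, v in enumerate(data):
                buffers[r][lo + k] += v
        steps += 1
    # All-gather: pass the fully reduced chunks around the ring, overwriting.
    for s in range(p - 1):
        sends = [((r + 1 - s) % p, buffers[r][bounds[(r + 1 - s) % p][0]:bounds[(r + 1 - s) % p][1]])
                 for r in range(p)]
        for r in range(p):
            c, data = sends[(r - 1) % p]
            lo, hi = bounds[c]
            buffers[r][lo:hi] = data
        steps += 1
    return steps


def two_d_ring_allreduce(buffers: List[Vector], rows: int, cols: int) -> int:
    """Hypothetical 2D scheme: ring all-reduce within each row, then each column.

    Worker (i, j) is buffers[i * cols + j]. Rows (and then columns) are assumed
    to run in parallel, so each phase's step count is counted once.
    """
    assert len(buffers) == rows * cols
    row_steps = col_steps = 0
    for i in range(rows):
        # Lists are shared by reference, so the ring updates buffers in place.
        row_steps = ring_allreduce([buffers[i * cols + j] for j in range(cols)])
    for j in range(cols):
        col_steps = ring_allreduce([buffers[i * cols + j] for i in range(rows)])
    return row_steps + col_steps


if __name__ == "__main__":
    rows, cols = 4, 4
    data = [[float(w * 10 + k) for k in range(8)] for w in range(rows * cols)]
    expected = [sum(col) for col in zip(*data)]
    flat = [list(v) for v in data]
    grid = [list(v) for v in data]
    print(ring_allreduce(flat))                    # prints 30
    print(two_d_ring_allreduce(grid, rows, cols))  # prints 12
    assert all(b == expected for b in flat) and all(b == expected for b in grid)
```

On a 4 x 4 grid of 16 workers, the flat ring needs 2(16 - 1) = 30 latency-bound steps, while the row-then-column version needs 2(4 - 1) + 2(4 - 1) = 12. This naive variant moves the full vector in both phases, so it trades extra bandwidth for fewer steps; a bandwidth-aware variant would reduce-scatter within rows, all-reduce only each worker's owned chunk across columns, and then all-gather within rows.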
Funding: National Key Research and Development Program of China [2018YFE0202800]; National Natural Science Foundation of China [61634004]; National Natural Science Foundation of China [61934002]; Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars [2020JC-26]; Fundamental Research Funds for the Central Universities [JB190105]; State Key Laboratory of Computer Architecture (ICT, CAS) [CARCH201919]; China Postdoctoral Science Foundation [2018M633465]
WOS research areas: Computer Science; Engineering; Telecommunications
Language: English
WOS accession number: WOS:000585641700001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Source URL: http://119.78.100.204/handle/2XEOYT63/15987
Collection: Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English)
Corresponding author: Gu, Huaxi
Affiliations:
1. Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China
2. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714: Jiang, Youhe, Gu, Huaxi, Lu, Yunfeng, et al. 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning[J]. IEEE ACCESS, 2020, 8: 183488-183494.
APA: Jiang, Youhe, Gu, Huaxi, Lu, Yunfeng, & Yu, Xiaoshan. (2020). 2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning. IEEE ACCESS, 8, 183488-183494.
MLA: Jiang, Youhe, et al. "2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning". IEEE ACCESS 8 (2020): 183488-183494.

Ingest method: OAI harvesting

Source: Institute of Computing Technology

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.