DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training
Document type: Journal article
| Authors | Meng, Lin1,2,3; Sun, Yuzhong1,3; Zhu, Jie3,4 |
| Journal | FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE |
| Publication date | 2026-02-01 |
| Volume | 175; Pages: 15 |
| Keywords | Distributed deep learning; Communication scheduling; Data parallelism |
| ISSN | 0167-739X |
| DOI | 10.1016/j.future.2025.108103 |
| Abstract | Communication scheduling aims to reduce communication bottlenecks in data parallel training (DP) by maximizing the overlap between computation and communication. However, existing schemes fall short due to three main issues: (1) hard data dependencies break some overlapping between communication and computation; (2) high coverage rates impair further improvement on performance; (3) imbalanced communication/computation times of tensors caused by partitioning/fusion strategies cause more bubbles. Therefore, we propose a new communication scheduling scheme DeFT, whose key insight is to relax data dependencies and support flexible scheduling in distributed training without reordering bucket communications. DeFT uncovers new overlapping chances in training by transforming the scheduling problem into multiple knapsack problems. Specifically, DeFT eliminates hard dependencies with delayed updates, reducing the coverage rate by adjusting update frequency and utilizing heterogeneous communication links, merging the computation times of backward or forward as the knapsack capacity to avoid the negative impact of unbalanced tensors. Additionally, DeFT preserves training accuracy by adjusting its scheduling strategy via convergence loss quantification. Extensive experiments with 16 A100 GPUs showed that DeFT achieved speedups of 29% to 115% on three representative benchmarks compared to US-Byte and Bytescheduler with no loss of accuracy. |
| Funding | Science and Technology Innovation 2030-Major Project[2022ZD0119104] |
| WOS research area | Computer Science |
| Language | English |
| WOS accession number | WOS:001565585500003 |
| Publisher | ELSEVIER |
| Source URL | [http://119.78.100.204/handle/2XEOYT63/41722] |
| Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
| Corresponding author | Sun, Yuzhong |
| Author affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Beijing 101408, Peoples R China; 3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Chinese Comp Architecture, Beijing 100864, Peoples R China; 4. Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing 210023, Peoples R China |
| Recommended citation (GB/T 7714) | Meng, Lin,Sun, Yuzhong,Zhu, Jie. DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE,2026,175:15. |
| APA | Meng, Lin,Sun, Yuzhong,&Zhu, Jie.(2026).DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training.FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE,175,15. |
| MLA | Meng, Lin,et al."DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training".FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE 175(2026):15. |
Ingestion method: OAI harvesting
Source: Institute of Computing Technology
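
The abstract describes casting communication scheduling as multiple knapsack problems, with the merged forward or backward computation time acting as the knapsack capacity. The sketch below illustrates only that general idea with a textbook 0/1 knapsack; it is not DeFT's actual algorithm, which this record does not specify. The function names `select_buckets` and `schedule`, the integer millisecond timings, and the greedy window-by-window assignment are all assumptions made for illustration.

```python
from typing import List, Tuple


def select_buckets(comm_times_ms: List[int], capacity_ms: int) -> Tuple[int, List[int]]:
    """Textbook 0/1 knapsack (illustrative, not the paper's method).

    Picks the subset of gradient buckets whose total communication time
    fits inside one computation window and hides as much communication
    as possible. Returns (hidden_time_ms, chosen_bucket_indices).
    """
    n = len(comm_times_ms)
    # best[c] = largest total communication time that fits in capacity c
    best = [0] * (capacity_ms + 1)
    # keep[i][c] = True if bucket i is part of the best choice at capacity c
    keep = [[False] * (capacity_ms + 1) for _ in range(n)]
    for i, t in enumerate(comm_times_ms):
        # Iterate capacities downwards so each bucket is used at most once.
        for c in range(capacity_ms, t - 1, -1):
            if best[c - t] + t > best[c]:
                best[c] = best[c - t] + t
                keep[i][c] = True
    # Backtrack from the last bucket to recover the chosen set.
    chosen: List[int] = []
    c = capacity_ms
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= comm_times_ms[i]
    chosen.reverse()
    return best[capacity_ms], chosen


def schedule(comm_times_ms: List[int],
             window_capacities_ms: List[int]) -> Tuple[List[List[int]], List[int]]:
    """Assumed simplification: solve one knapsack per computation window,
    in order, over the buckets that are still unscheduled.

    Returns (per-window bucket lists, buckets left unscheduled).
    """
    remaining = list(range(len(comm_times_ms)))
    plan: List[List[int]] = []
    for cap in window_capacities_ms:
        times = [comm_times_ms[i] for i in remaining]
        _, picked = select_buckets(times, cap)
        picked_ids = [remaining[j] for j in picked]
        plan.append(picked_ids)
        remaining = [i for i in remaining if i not in picked_ids]
    return plan, remaining


if __name__ == "__main__":
    # Hypothetical timings: four buckets, two computation windows.
    plan, leftover = schedule([4, 3, 5, 2], [9, 5])
    print(plan, leftover)
```

With these hypothetical timings, every bucket fits into one of the two windows, so nothing is left unscheduled. Solving the windows one at a time keeps the sketch short but is not guaranteed to be optimal across windows; a joint multiple-knapsack formulation would be needed for that.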

