Chinese Academy of Sciences Institutional Repositories Grid
Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading

Document type: Journal article

Authors: Chen, Xiang [2]; Lu, Bing [3,4]; Long, Haoquan [4,5]; Luo, Huizhang [3]; Ma, Yili [4]; Tan, Guangming [4]; Tao, Dingwen [4]; Wu, Fei [2]; Lu, Tao [1]
Journal: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Publication date: 2026-02-01
Volume: 37; Issue: 2; Pages: 518-532
Keywords: Hardware; Computer architecture; File systems; Nonvolatile memory; Bandwidth; Engines; Prototypes; Data compression; Software; Flash memories; high performance computing; solid state drives
ISSN: 1045-9219
DOI: 10.1109/TPDS.2025.3643175
Abstract: Burst buffers (BBs) act as an intermediate storage layer between compute nodes and parallel file systems (PFS), effectively alleviating the I/O performance gap in high-performance computing (HPC). As scientific simulations and AI workloads generate larger checkpoints and analysis outputs, BB capacity shortages and PFS bandwidth bottlenecks are emerging, and CPU-based compression is not an effective solution due to its high overhead. We introduce Computational Burst Buffers (CBBs), a storage paradigm that embeds hardware compression engines such as application-specific integrated circuits (ASICs) inside computational storage drives (CSDs) at the BB tier. CBB transparently offloads both lossless and error-bounded lossy compression from CPUs to CSDs, thereby (i) expanding effective SSD-backed BB capacity, (ii) reducing BB-PFS traffic, and (iii) eliminating contention and energy overheads of CPU-based compression. Unlike prior CSD-based compression designs targeting databases or flash caching, CBB co-designs the burst-buffer layer and CSD hardware for HPC and quantitatively evaluates compression offload in BB-PFS hierarchies. We prototype CBB using a PCIe 5.0 CSD with an ASIC Zstd-like compressor and an FPGA prototype of an SZ entropy encoder, and evaluate CBB on a 16-node cluster. Experiments with four representative HPC applications and a large-scale workflow simulator show up to 61% lower application runtime, 8-12x higher cache hit ratios, and substantially reduced compute-node CPU utilization compared to software compression and conventional BBs. These results demonstrate that compression-aware BBs with CSDs provide a practical, scalable path to next-generation HPC storage.
Funding: National Key R&D Program of China [2023YFB4502901]; Shenzhen Key RD Program [KJZD20240903102459001]; National Natural Science Foundation of China [62372197]; National Natural Science Foundation of China [U2001203]; National Natural Science Foundation of China [U22A2071]; National Natural Science Foundation of China [62102155]; National Natural Science Foundation of China [62032023]; National Natural Science Foundation of China [T2125013]; Innovation Funding of ICT, CAS [E461050]
WOS research areas: Computer Science; Engineering
Language: English
WOS accession number: WOS:001655675200001
Publisher: IEEE COMPUTER SOC
Source URL: [http://119.78.100.204/handle/2XEOYT63/42917]
Collection: Institute of Computing Technology, Chinese Academy of Sciences
Corresponding authors: Luo, Huizhang; Tao, Dingwen; Lu, Tao
Author affiliations:
1. DapuStor Corp, Shenzhen 518100, Peoples R China
2. Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China
3. Hunan Univ, Changsha 410008, Peoples R China
4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
5. Univ Sci & Technol China, Hefei 230026, Peoples R China
Recommended citation
GB/T 7714
Chen, Xiang,Lu, Bing,Long, Haoquan,et al. Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2026,37(2):518-532.
APA Chen, Xiang.,Lu, Bing.,Long, Haoquan.,Luo, Huizhang.,Ma, Yili.,...&Lu, Tao.(2026).Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,37(2),518-532.
MLA Chen, Xiang,et al."Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 37.2(2026):518-532.

Ingest method: OAI harvesting

Source: Institute of Computing Technology

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.