Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid)

Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading

Document type: Journal article


Authors	Chen, Xiang 2; Lu, Bing 3,4; Long, Haoquan 4,5; Luo, Huizhang 3; Ma, Yili 4; Tan, Guangming 4; Tao, Dingwen 4; Wu, Fei 2; Lu, Tao 1
Journal	IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Publication date	2026-02-01
Volume	37, Issue 2, Pages 518-532
Keywords	Hardware; Computer architecture; File systems; Nonvolatile memory; Bandwidth; Engines; Prototypes; Data compression; Software; Flash memories; high performance computing; solid state drives
ISSN	1045-9219
DOI	10.1109/TPDS.2025.3643175
Abstract	Burst buffers (BBs) act as an intermediate storage layer between compute nodes and parallel file systems (PFS), effectively alleviating the I/O performance gap in high-performance computing (HPC). As scientific simulations and AI workloads generate larger checkpoints and analysis outputs, BB capacity shortages and PFS bandwidth bottlenecks are emerging, and CPU-based compression is not an effective solution due to its high overhead. We introduce Computational Burst Buffers (CBBs), a storage paradigm that embeds hardware compression engines such as application-specific integrated circuits (ASICs) inside computational storage drives (CSDs) at the BB tier. CBB transparently offloads both lossless and error-bounded lossy compression from CPUs to CSDs, thereby (i) expanding effective SSD-backed BB capacity, (ii) reducing BB-PFS traffic, and (iii) eliminating contention and energy overheads of CPU-based compression. Unlike prior CSD-based compression designs targeting databases or flash caching, CBB co-designs the burst-buffer layer and CSD hardware for HPC and quantitatively evaluates compression offload in BB-PFS hierarchies. We prototype CBB using a PCIe 5.0 CSD with an ASIC Zstd-like compressor and an FPGA prototype of an SZ entropy encoder, and evaluate CBB on a 16-node cluster. Experiments with four representative HPC applications and a large-scale workflow simulator show up to 61% lower application runtime, 8-12x higher cache hit ratios, and substantially reduced compute-node CPU utilization compared to software compression and conventional BBs. These results demonstrate that compression-aware BBs with CSDs provide a practical, scalable path to next-generation HPC storage.
Funding	National Key R&D Program of China[2023YFB4502901] ; Shenzhen Key RD Program[KJZD20240903102459001] ; National Natural Science Foundation of China[62372197] ; National Natural Science Foundation of China[U2001203] ; National Natural Science Foundation of China[U22A2071] ; National Natural Science Foundation of China[62102155] ; National Natural Science Foundation of China[62032023] ; National Natural Science Foundation of China[T2125013] ; Innovation Funding of ICT, CAS[E461050]
WOS research areas	Computer Science ; Engineering
Language	English
WOS accession number	WOS:001655675200001
Publisher	IEEE COMPUTER SOC
Source URL	[http://119.78.100.204/handle/2XEOYT63/42917]
Collection	Institute of Computing Technology, Chinese Academy of Sciences
Corresponding authors	Luo, Huizhang; Tao, Dingwen; Lu, Tao
Author affiliations	1. DapuStor Corp, Shenzhen 518100, Peoples R China; 2. Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China; 3. Hunan Univ, Changsha 410008, Peoples R China; 4. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; 5. Univ Sci & Technol China, Hefei 230026, Peoples R China
Recommended citation (GB/T 7714)	Chen, Xiang,Lu, Bing,Long, Haoquan,et al. Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2026,37(2):518-532.
APA	Chen, Xiang.,Lu, Bing.,Long, Haoquan.,Luo, Huizhang.,Ma, Yili.,...&Lu, Tao.(2026).Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,37(2),518-532.
MLA	Chen, Xiang,et al."Computational Burst Buffers: Accelerating HPC I/O via In-Storage Compression Offloading".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 37.2(2026):518-532.

Ingest method: OAI harvesting

Source: Institute of Computing Technology
