Chinese Academy of Sciences Institutional Repositories Grid
SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries

Document type: Conference paper

Authors: Chen Zhezhe; Li Xinyu; Chen Jau-Yuan; Zhong Hua; Qin Feng
Publication date: 2012
Conference name: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Conference dates: May 21, 2012 - May 25, 2012
Conference location: Shanghai, China
Keywords: Communication; Computer operating systems; Distributed parameter networks; Experiments; Libraries; Software testing
Pages: 342-353
Abstract: Although it improves performance, nonblocking communication is prone to synchronization errors between MPI applications and the underlying MPI libraries. Such an error occurs as follows: after initiating nonblocking communication and performing overlapped computation, the MPI application reuses the message buffer before the MPI library has finished using the same buffer, which may lead to sending out corrupted message data or reading undefined message data. This paper presents a new method called SyncChecker to detect synchronization errors in MPI nonblocking communication. To examine whether the use of message buffers is well synchronized between the MPI application and the MPI library, SyncChecker first tracks relevant memory accesses in the MPI application and the corresponding message send/receive operations in the MPI library. It then checks whether the correct execution order between the MPI application and the MPI library is enforced by the MPI completion check routines. If not, SyncChecker reports the error with diagnostic information. To reduce runtime overhead, we propose three dynamic optimizations. We have implemented a prototype of SyncChecker on Linux and evaluated it with seven bug cases (five introduced by the original developers and two injected) in four different MPI applications. Our experiments show that SyncChecker detects all the evaluated synchronization errors and provides helpful diagnostic information. Moreover, our experiments with seven NAS Parallel Benchmarks demonstrate that SyncChecker incurs moderate runtime overhead, 1.3-9.5 times with an average of 5.2 times, making it suitable for software testing. © 2012 IEEE.
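Illustrative note: the kind of ordering check the abstract describes can be sketched as a toy model in Python. This is not the paper's implementation. The class SyncTracker, the function run_trace, and the event names are hypothetical, and the sketch replays a hand-written event trace instead of instrumenting a real MPI program. It assumes that writing to a buffer with a pending nonblocking send is an error, that any access to a buffer with a pending nonblocking receive is an error, and that reading a pending send buffer is safe.

```python
class SyncTracker:
    """Toy model (hypothetical): tracks buffers a simulated MPI library still uses."""

    def __init__(self):
        self.pending = {}   # request id -> (buffer name, "send" or "recv")
        self.errors = []    # diagnostic messages for unsynchronized accesses

    def isend(self, req, buf):
        # Nonblocking send started: the library may still read from buf.
        self.pending[req] = (buf, "send")

    def irecv(self, req, buf):
        # Nonblocking receive started: the library may still write to buf.
        self.pending[req] = (buf, "recv")

    def wait(self, req):
        # Completion check: the library is done with the buffer of this request.
        self.pending.pop(req, None)

    def access(self, buf, kind):
        # Application touches buf; kind is "read" or "write".
        for req, (pending_buf, op) in self.pending.items():
            if pending_buf != buf:
                continue
            # Assumption: reads of a pending send buffer are treated as safe.
            unsafe = (op == "send" and kind == "write") or op == "recv"
            if unsafe:
                self.errors.append(
                    f"{kind} of {buf!r} before request {req!r} ({op}) completed"
                )


def run_trace(trace):
    """Replay a list of (event_name, *args) tuples and return the diagnostics."""
    tracker = SyncTracker()
    for name, *args in trace:
        getattr(tracker, name)(*args)
    return tracker.errors


# Buffer is overwritten before the completion check: flagged.
buggy_send = [("isend", "r1", "sendbuf"), ("access", "sendbuf", "write"), ("wait", "r1")]
# Completion check comes first: not flagged.
fixed_send = [("isend", "r1", "sendbuf"), ("wait", "r1"), ("access", "sendbuf", "write")]

if __name__ == "__main__":
    print(run_trace(buggy_send))
    print(run_trace(fixed_send))
```

In this model the only difference between the flagged and the clean trace is whether the completion check precedes the buffer reuse, which is the ordering the abstract says SyncChecker verifies.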
Indexed by: EI
Proceedings: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Language: English
ISBN: 9780769546759
Source URL: [http://ir.iscas.ac.cn/handle/311060/15748]
Collection: Institute of Software_Institute of Software Library_Conference Papers
Recommended citation
GB/T 7714
Chen Zhezhe, Li Xinyu, Chen Jau-Yuan, et al. SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries[C]. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. Shanghai, China. May 21, 2012 - May 25, 2012.

Ingest method: OAI harvesting

Source: Institute of Software
