中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
SF-Sketch: A Two-Stage Sketch for Data Streams

文献类型:期刊论文

作者Liu, Lingtong1,2; Shen, Yulong1,2; Yan, Yibo3; Yang, Tong3; Shahzad, Muhammad4; Cui, Bin3; Xie, Gaogang5
刊名IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
出版日期2020-10-01
卷号31期号:10页码:2263-2276
关键词Distributed databases Monitoring Bars Frequency measurement Registers Fats Hash functions Network measurements sketch distributed monitoring multiset frequent items
ISSN号1045-9219
DOI10.1109/TPDS.2020.2987609
英文摘要Sketches are probabilistic data structures designed for recording frequencies of items in a multi-set. They are widely used in various fields, especially for gathering Internet statistics from distributed data streams in network measurements. In a distributed streaming application with high data rates, a sketch in each monitoring node "fills up" very quickly and then its content is transferred to a remote collector responsible for answering queries. Thus, the size of the contents transferred must be kept as small as possible while meeting the desired accuracy requirement. To obtain significantly higher accuracy while keeping the same update and query speed as the best prior sketches, in this article, we propose a new sketch - the Slim-Fat (SF) sketch. The key idea behind the SF-sketch is to maintain two separate sketches: a larger sketch, the Fat-subsketch, and a smaller sketch, the Slim-subsketch. The Fat-subsketch is used for updating and periodically producing the Slim-subsketch, which is then transferred to the remote collector for answering queries quickly and accurately. We also present the error bound as well as an accurate model of the correct rate of the SF-sketch, and verify their correctness through experiments. We implemented and extensively evaluated the SF-sketch along with several prior sketches. Our results show that when the size of our Slim-subsketch and of the widely used Count-Min (CM) sketch are kept the same, our SF-sketch outperforms the CM-sketch by up to 33.1 times in terms of accuracy (when the ratio of the sizes of the Fat-subsketch and the Slim-subsketch is 16:1). We have made all source codes publicly available at Github ["Source code of SF sketches," [Online]. Available: https://github.com/paper2017/SF-sketch].
资助项目National Key Research and Development Program of China[2018YFE0207600] ; National Key Research and Development Program of China[2018YFB2100403] ; NSFC[61672061] ; NSFC[U1736216] ; National Science Foundation of USA[CNS-1616317] ; National Science Foundation of USA[CNS-1616273]
WOS研究方向Computer Science ; Engineering
语种英语
WOS记录号WOS:000536291500003
出版者IEEE COMPUTER SOC
源URL[http://119.78.100.204/handle/2XEOYT63/15296]  
专题中国科学院计算技术研究所期刊论文_英文
通讯作者Shen, Yulong; Yang, Tong
作者单位1.Xidian Univ, Shaanxi Key Lab Network & Syst Secur, Xian 710071, Peoples R China
2.Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
3.Peking Univ, Dept Comp & Sci, Beijing 100871, Peoples R China
4.North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
5.Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
推荐引用方式
GB/T 7714
Liu, Lingtong,Shen, Yulong,Yan, Yibo,et al. SF-Sketch: A Two-Stage Sketch for Data Streams[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2020,31(10):2263-2276.
APA Liu, Lingtong.,Shen, Yulong.,Yan, Yibo.,Yang, Tong.,Shahzad, Muhammad.,...&Xie, Gaogang.(2020).SF-Sketch: A Two-Stage Sketch for Data Streams.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,31(10),2263-2276.
MLA Liu, Lingtong,et al."SF-Sketch: A Two-Stage Sketch for Data Streams".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.10(2020):2263-2276.

入库方式: OAI收割

来源:计算技术研究所

浏览0
下载0
收藏0
其他版本

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。