Chinese Academy of Sciences Institutional Repositories Grid
Parallel Incremental Frequent Itemset Mining for Large Data

Document type: Journal article

Authors: Song, Yu-Geng (1,2); Cui, Hui-Min (1,2); Feng, Xiao-Bing (1)
Journal: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Publication date: 2017-03-01
Volume: 32; Issue: 2; Pages: 368-385
Keywords: incremental parallel FPGrowth data mining frequent itemset mining MapReduce
ISSN: 1000-9000
DOI: 10.1007/s11390-017-1726-y
Abstract: Frequent itemset mining (FIM) is a popular data mining task used in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, which makes them inefficient on huge databases. This paper presents two parallel incremental FIM algorithms, IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and uses them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncBuildingPFP preserves the CanTrees built in the original pass and adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP achieves significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) for most incremental input databases, and that IncBuildingPFP achieves such speedup in the remaining cases.
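The idea of keeping a CanTree and adding new transactions to it in later passes can be illustrated with a small single-machine sketch. This is not the paper's MapReduce implementation of IncBuildingPFP; it only assumes the usual CanTree property that items are stored in a fixed canonical order, so an insertion never forces the tree to be restructured. Lexicographic order stands in for the canonical order here, and the names `Node`, `CanTree`, `insert_all`, and `support` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Node:
    """One tree node: an item, its count, and children keyed by item."""
    item: str = ""
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption: lexicographic order is used as the canonical order.
    """

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, transaction: Iterable[str]) -> None:
        # Sort into canonical order, then walk or extend one path.
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def insert_all(self, transactions: Iterable[Iterable[str]]) -> None:
        for transaction in transactions:
            self.insert(transaction)

    def support(self, itemset: Iterable[str]) -> int:
        """Count stored transactions containing every item in itemset."""
        target: List[str] = sorted(set(itemset))
        if not target:
            raise ValueError("itemset must not be empty")

        def walk(node: Node, idx: int) -> int:
            total = 0
            for child in node.children.values():
                if child.item == target[idx]:
                    if idx + 1 == len(target):
                        # Every transaction through this node matches.
                        total += child.count
                    else:
                        total += walk(child, idx + 1)
                elif child.item < target[idx]:
                    # The wanted item may still appear deeper on this path.
                    total += walk(child, idx)
                # child.item > target[idx]: the wanted item cannot appear
                # below, because paths are in canonical order.
            return total

        return walk(self.root, 0)


# Original pass: build the tree once.
tree = CanTree()
tree.insert_all([["a", "b", "c"], ["a", "c"], ["b", "d"]])
before = tree.support(["a", "c"])

# Incremental pass: add new transactions to the preserved tree.
tree.insert_all([["c", "a", "d"], ["a", "b"]])
after = tree.support(["a", "c"])
```

Because the item order does not depend on item frequencies, the incremental pass only touches the paths of the new transactions; the tree built in the original pass is reused unchanged.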
Funding: National High Technology Research and Development 863 Program of China [2015AA011505, 2015AA015306, 2012AA010902]; National Natural Science Foundation of China [61202055, 61221062, 61521092, 61303053, 61432016, 61402445, 61672492]; National Key Research and Development Program of China [2016YFB1000402]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:000397835500014
Publisher: SCIENCE PRESS
Source URL: http://119.78.100.204/handle/2XEOYT63/7411
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Song, Yu-Geng
Author affiliations:
1. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended citation:
GB/T 7714: Song, Yu-Geng, Cui, Hui-Min, Feng, Xiao-Bing. Parallel Incremental Frequent Itemset Mining for Large Data[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32(2): 368-385.
APA: Song, Yu-Geng, Cui, Hui-Min, & Feng, Xiao-Bing. (2017). Parallel Incremental Frequent Itemset Mining for Large Data. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 32(2), 368-385.
MLA: Song, Yu-Geng, et al. "Parallel Incremental Frequent Itemset Mining for Large Data". JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 32.2 (2017): 368-385.

Ingest method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.