Chinese Academy of Sciences Institutional Repositories Grid System: Scalable random forests for massive data

Chinese Academy of Sciences Institutional Repositories Grid

Scalable random forests for massive data

Document type: Conference paper


Authors	Li bingguo; Chen xiaojun; Li Mark junjie; Huang Joshua zhexue; Feng shengzhong
Publication date	2012
Conference name	16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012
Conference location	Kuala Lumpur, Malaysia
Abstract	This paper proposes a scalable random forest algorithm, SRF, with a MapReduce implementation. A breadth-first approach is used to grow decision trees for a random forest model. At each level of the trees, a pair of map and reduce functions split the nodes. A mapper is dispatched to a local machine to compute the local histograms of subspace features of the nodes from a data block. The local histograms are submitted to reducers to compute the global histograms, from which the best split conditions of the nodes are calculated and sent to the controller on the master machine to update the random forest model. A random forest model is built with a sequence of map and reduce functions. Experiments on large synthetic data have shown that SRF is scalable to the number of trees and the number of examples. The SRF algorithm is able to build a random forest of 100 trees in a little more than 1 hour from 110 Gigabyte data with 1000 features and 10 million records. © 2012 Springer-Verlag. (18 refs)
Indexed by	EI
Language	English
Source URL	[http://ir.siat.ac.cn:8080/handle/172644/4219]
Collection	Shenzhen Institutes of Advanced Technology_Digital Institute
Author affiliation	2012
Recommended citation (GB/T 7714)	Li bingguo, Chen xiaojun, Li Mark junjie, et al. Scalable random forests for massive data[C]. In: 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012. Kuala Lumpur, Malaysia.
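
The per-level step described in the abstract (mappers build local histograms from data blocks, reducers merge them into global histograms, and a controller picks the best split per node) can be sketched roughly as follows. This is a single-machine illustration, not the authors' SRF implementation. Everything named here is an assumption made for the example: the functions map_block, reduce_histograms and best_splits, equal-width binning of features scaled to [0, 1), and Gini impurity as the split criterion (the abstract does not say which criterion SRF uses).

```python
from collections import Counter

N_BINS = 4  # assumption: features are scaled to [0, 1) and cut into equal-width bins


def bin_of(value):
    """Map a feature value in [0, 1) to a bin index."""
    return min(int(value * N_BINS), N_BINS - 1)


def map_block(block, node_of, subspace):
    """Mapper (hypothetical): count (node, feature, bin, label) in one data block."""
    hist = Counter()
    for x, y in block:
        node = node_of(x)
        for f in subspace[node]:  # only the node's sampled subspace features
            hist[(node, f, bin_of(x[f]), y)] += 1
    return hist


def reduce_histograms(local_hists):
    """Reducer (hypothetical): sum local histograms into a global histogram."""
    total = Counter()
    for h in local_hists:
        total.update(h)
    return total


def gini(counts):
    """Gini impurity of a label-count mapping (assumed split criterion)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_splits(global_hist):
    """Controller (hypothetical): per node, pick (score, feature, threshold)
    with the lowest weighted Gini impurity over all bin boundaries."""
    grouped = {}
    for (node, f, b, y), c in global_hist.items():
        bins = grouped.setdefault((node, f), [Counter() for _ in range(N_BINS)])
        bins[b][y] += c
    best = {}
    for (node, f), bins in sorted(grouped.items()):
        total = sum(bins, Counter())
        n = sum(total.values())
        left = Counter()
        for b in range(N_BINS - 1):
            left = left + bins[b]
            right = total - left
            n_left = sum(left.values())
            if n_left == 0 or n_left == n:
                continue  # a split must send examples to both children
            score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
            if node not in best or score < best[node][0]:
                best[node] = (score, f, (b + 1) / N_BINS)
    return best


# Toy usage: two data blocks, one root node (id 0), subspace features 0 and 1.
block1 = [([0.1, 0.9], 0), ([0.2, 0.1], 0), ([0.7, 0.8], 1)]
block2 = [([0.9, 0.2], 1), ([0.3, 0.6], 0), ([0.6, 0.3], 1)]
subspace = {0: [0, 1]}
local = [map_block(b, lambda x: 0, subspace) for b in (block1, block2)]
global_hist = reduce_histograms(local)
splits = best_splits(global_hist)
print(splits)
```

In the toy data, feature 0 separates the two classes at 0.5, so the controller step should select that feature and threshold for the root node. A real run would repeat this map/reduce pair once per tree level, for all trees at once.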

Ingestion method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology
