Chinese Academy of Sciences Institutional Repositories Grid
Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data

Document type: Conference paper

Authors: He Zhao; Xiaojun Chen; Tung Nguyen; Joshua Zhexue Huang; Graham Williams; Hui Chen
Publication date: 2016
Conference name: PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings
Conference location: New Zealand
Abstract: Imbalanced data presents a big challenge to random forests (RF). Over-sampling is a commonly used sampling method for imbalanced data, which increases the number of minority-class instances to balance the class distribution. However, such a method often produces sample data sets that are highly correlated if we only sample more minority-class instances, thus reducing the generalizability of RF. To solve this problem, we propose a stratified over-sampling (SOB) method to generate both balanced and diverse training data sets for RF. We first cluster the training data set multiple times to produce multiple clustering results. The small individual clusters are grouped according to their entropies. Then we sample a set of training data sets from the groups of clusters using a stratified sampling method. Finally, these training data sets are used to train RF. The data sets sampled with SOB are guaranteed to be balanced and diverse, which improves the performance of RF on imbalanced data. We have conducted a series of experiments, and the experimental results have shown that the proposed method is more effective than some existing sampling methods.
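Illustrative sketch (not part of the repository record, and not the authors' implementation): the abstract gives only the outline of SOB, so the code below fills in the details with labeled assumptions. It takes one precomputed clustering as input, sorts clusters by the Shannon entropy of their class labels, splits them into equal-sized groups that serve as strata, and then draws the same number of instances per class, with replacement, in round-robin order over the strata. The function names, the `n_groups` and `per_class` parameters, and the round-robin allocation are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one cluster."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def group_by_entropy(cluster_ids, y, n_groups=2):
    """Sort clusters by class entropy and split them into up to n_groups strata.

    Assumption: equal-sized groups of entropy-ordered clusters; the abstract
    does not specify how the grouping is done.
    Returns a list of strata, each a list of instance indices.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    ordered = sorted(
        members.values(),
        key=lambda idxs: class_entropy([y[i] for i in idxs]),
    )
    strata = [[] for _ in range(n_groups)]
    for rank, idxs in enumerate(ordered):
        strata[rank * n_groups // len(ordered)].extend(idxs)
    return [s for s in strata if s]


def stratified_balanced_sample(strata, y, per_class, rng):
    """Draw per_class indices for every class, spread across the strata.

    Sampling is with replacement, so minority-class instances are
    over-sampled. Assumption: round-robin allocation over the strata
    that contain the class.
    """
    sample = []
    for cls in sorted(set(y)):
        pools = [[i for i in s if y[i] == cls] for s in strata]
        pools = [p for p in pools if p]
        for k in range(per_class):
            sample.append(rng.choice(pools[k % len(pools)]))
    return sample
```

Under these assumptions, one balanced training set comes from a single call to `stratified_balanced_sample`. Repeating the call over several clusterings and random seeds would give the multiple training sets on which the individual trees of a forest could be trained.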
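Indexed by: EI
Language: English
Source URL: [http://ir.siat.ac.cn:8080/handle/172644/10306]
Collection: Shenzhen Institutes of Advanced Technology_Digital Institute
Author affiliation: 2016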
Recommended citation
GB/T 7714
He Zhao, Xiaojun Chen, Tung Nguyen, et al. Stratified Over-sampling Bagging Method for Random Forests on Imbalanced Data[C]. In: PAKDD 2016, Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings. New Zealand.

Ingest method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.