Stratified sampling for feature subspace selection in random forests for high dimensional data
Document type: Journal article
Authors | Yunming Ye; Qingyao Wu; Joshua Zhexue Huang; Michael K. Ng; Xutao Li |
Journal | PATTERN RECOGNITION
Publication date | 2013 |
Abstract | For high dimensional data, a large portion of features are often not informative of the class of the objects. Random forest algorithms tend to use simple random sampling of features in building their decision trees and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups: one group contains strongly informative features and the other weakly informative features. Then, for feature subspace selection, we randomly select features from each group proportionally. The advantage of stratified sampling is that we can ensure that each subspace contains enough informative features for classification in high dimensional data. Testing on both synthetic data and various real data sets in gene classification, image categorization and face recognition consistently demonstrates the effectiveness of this new method. The performance is shown to be better than that of state-of-the-art algorithms including SVM, four variants of random forests (RF, ERT, enrich-RF, and oblique-RF), and nearest neighbor (NN) algorithms. |
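Indexed in | SCI |
Original source | http://www.sciencedirect.com/science/article/pii/S0031320312003974 |
Language | English |
Source URL | [http://ir.siat.ac.cn:8080/handle/172644/4836] |
Collection | Shenzhen Institutes of Advanced Technology_Institute of Biomedical and Health Engineering |
Author affiliation | PATTERN RECOGNITION |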
Recommended citation GB/T 7714 | Yunming Ye, Qingyao Wu, Joshua Zhexue Huang, et al. Stratified sampling for feature subspace selection in random forests for high dimensional data[J]. PATTERN RECOGNITION, 2013. |
APA | Yunming Ye, Qingyao Wu, Joshua Zhexue Huang, Michael K. Ng, & Xutao Li. (2013). Stratified sampling for feature subspace selection in random forests for high dimensional data. PATTERN RECOGNITION. |
MLA | Yunming Ye, et al. "Stratified sampling for feature subspace selection in random forests for high dimensional data." PATTERN RECOGNITION (2013). |
Ingest method: OAI harvesting
Source: Shenzhen Institutes of Advanced Technology
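
The abstract describes splitting features into a strongly informative group and a weakly informative group, then drawing each tree's feature subspace from both groups proportionally. The following is a minimal sketch of that idea, not the authors' implementation. Several details are assumptions that the abstract does not specify: the function name `stratified_feature_subspace`, the use of precomputed per-feature scores as input, the mean score as the strong/weak cutoff, allocation proportional to group size, and the guarantee of at least one feature per group.

```python
import numpy as np


def stratified_feature_subspace(scores, subspace_size, rng):
    """Draw feature indices from a strong and a weak stratum (illustrative only).

    scores: 1-D array of per-feature informativeness scores (hypothetical input;
        the abstract does not say how informativeness is measured).
    subspace_size: number of features to draw, between 2 and len(scores).
    rng: a numpy.random.Generator.
    """
    scores = np.asarray(scores, dtype=float)
    n_features = scores.size
    if not 2 <= subspace_size <= n_features:
        raise ValueError("subspace_size must be between 2 and the number of features")

    # Assumption: features scoring above the mean form the "strong" stratum.
    cutoff = scores.mean()
    strong = np.flatnonzero(scores > cutoff)
    weak = np.flatnonzero(scores <= cutoff)
    if strong.size == 0:
        # All scores are equal, so fall back to simple random sampling.
        return np.sort(rng.choice(n_features, size=subspace_size, replace=False))

    # Assumption: allocate draws in proportion to stratum size,
    # while keeping at least one feature from each stratum.
    n_strong = int(round(subspace_size * strong.size / n_features))
    n_strong = max(1, min(n_strong, strong.size, subspace_size - 1))
    n_weak = subspace_size - n_strong
    if n_weak > weak.size:
        n_weak = weak.size
        n_strong = subspace_size - n_weak

    chosen = np.concatenate([
        rng.choice(strong, size=n_strong, replace=False),
        rng.choice(weak, size=n_weak, replace=False),
    ])
    return np.sort(chosen)


# Toy usage: 5 informative features among 100, with made-up scores.
rng = np.random.default_rng(0)
scores = np.concatenate([np.full(5, 0.9), np.full(95, 0.01)])
subspace = stratified_feature_subspace(scores, subspace_size=10, rng=rng)
```

Under simple random sampling, a 10-feature subspace drawn from this toy data could miss all five informative features; the stratified draw always includes at least one of them.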