Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests
Document type: Journal article
Authors | Thanh-Tung Nguyen; Huang, Joshua Zhexue; Wu, Qingyao; Thuy Thi Nguyen; Li, Mark Junjie |
Journal | BMC GENOMICS
Publication date | 2014 |
Abstract | Background: Selecting and identifying single-nucleotide polymorphisms (SNPs) are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high-dimensional and a large portion of the SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been used successfully in genome-wide association studies (GWAS) to identify genetic variants that have relatively large effects in some common, complex diseases. Among them, the most successful is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS to select informative SNPs and build accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The group of informative SNPs is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building the trees of the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces used to split a node of a tree always contain highly informative SNPs. |
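Indexed in | SCI |
Original source | http://www.biomedcentral.com/qc/1471-2164/16/S2/S5 |
Language | English |
Source URL | [http://ir.siat.ac.cn:8080/handle/172644/5964] |
Collection | Shenzhen Institutes of Advanced Technology_Institute of Advanced Computing and Digital Engineering (数字所) |
Author affiliation | BMC GENOMICS |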
Recommended citation GB/T 7714 | Thanh-Tung Nguyen,Huang, Joshua Zhexue,Wu, Qingyao,et al. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests[J]. BMC GENOMICS,2014. |
APA | Thanh-Tung Nguyen,Huang, Joshua Zhexue,Wu, Qingyao,Thuy Thi Nguyen,&Li, Mark Junjie.(2014).Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.BMC GENOMICS. |
MLA | Thanh-Tung Nguyen,et al."Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests".BMC GENOMICS (2014). |
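
The two-stage sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' ts-RF implementation: the abstract does not say how the p-values are computed, how the cut-off is found, or how the informative group is split, so the precomputed `p_values` list, the two thresholds (`cutoff`, `strong_cutoff`), the proportional allocation rule, and the function names `split_snps` and `sample_subspace` are all assumptions made for illustration.

```python
import random


def split_snps(p_values, cutoff=0.05, strong_cutoff=0.001):
    """Partition SNP indices by p-value (both thresholds are hypothetical).

    Returns (high, weak, irrelevant) lists of indices into p_values.
    """
    high, weak, irrelevant = [], [], []
    for idx, p in enumerate(p_values):
        if p >= cutoff:
            irrelevant.append(idx)   # stage 1: dropped from all subspaces
        elif p < strong_cutoff:
            high.append(idx)         # stage 2: highly informative
        else:
            weak.append(idx)         # stage 2: weakly informative
    return high, weak, irrelevant


def sample_subspace(high, weak, mtry, rng):
    """Draw up to mtry SNP indices from the two informative sub-groups.

    At least one highly informative SNP is always included; the split
    between the sub-groups is proportional to their sizes (an assumption).
    """
    if mtry < 1:
        raise ValueError("mtry must be at least 1")
    if not high:
        raise ValueError("no highly informative SNPs to sample from")
    total = len(high) + len(weak)
    mtry = min(mtry, total)
    n_high = min(len(high), max(1, round(mtry * len(high) / total)))
    n_weak = min(mtry - n_high, len(weak))
    n_high = mtry - n_weak  # top up from the high group if weak runs short
    return rng.sample(high, n_high) + rng.sample(weak, n_weak)


# Toy usage with made-up p-values for six SNPs.
p_values = [0.0001, 0.2, 0.03, 0.0005, 0.9, 0.01]
high, weak, irrelevant = split_snps(p_values)
subspace = sample_subspace(high, weak, mtry=3, rng=random.Random(0))
```

In a random forest, a subspace like `subspace` would be redrawn at each node and only those SNPs considered as split candidates; irrelevant SNPs never enter the draw.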
Ingest method: OAI harvesting
Source: Shenzhen Institutes of Advanced Technology