Chinese Academy of Sciences Institutional Repositories Grid
An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins

Document type: Journal article

Authors: Zheng, Cheng; Wang, Mingjun; Takemoto, Kazuhiro; Akutsu, Tatsuya; Zhang, Ziding; Song, Jiangning
Journal: PLOS ONE
Publication date: 2012-11-14
Volume: 7; Issue: 11; Pages: e49716
Abstract: Zinc-binding proteins are the most abundant metalloproteins in the Protein Data Bank, where the zinc ions usually have catalytic, regulatory or structural roles critical for the function of the protein. Accurate prediction of zinc-binding sites is not only useful for the inference of protein function but also important for the prediction of 3D structure. Here, we present a new integrative framework that combines multiple sequence and structural properties and graph-theoretic network features, followed by an efficient feature selection to improve prediction of zinc-binding sites. We investigate what information can be retrieved from the sequence, structure and network levels that is relevant to zinc-binding site prediction. We perform a two-step feature selection using random forest to remove redundant features and quantify the relative importance of the retrieved features. Benchmarking on a high-quality structural dataset containing 1,103 protein chains and 484 zinc-binding residues, our method achieved >80% recall at a precision of 75% for the zinc-binding residues Cys, His, Glu and Asp on 5-fold cross-validation tests, which is a 10%-28% higher recall at the same 75% precision compared to SitePredict and zincfinder at the residue level using the same dataset. The independent test also indicates that our method has achieved recall of 0.790 and 0.759 at residue and protein levels, respectively, which is better than the performance of the other two methods. Moreover, the AUC (the Area Under the Curve) and AURPC (the Area Under the Recall-Precision Curve) of our method are also better than those of the other two methods. Our method can not only be applied to large-scale identification of zinc-binding sites when structural information of the target is available, but also give valuable insights into important features arising from different levels that collectively characterize the zinc-binding sites.
The scripts and datasets are available at http://protein.cau.edu.cn/zincidentifier/.
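The abstract reports recall at a fixed precision of 75%. As a reading aid, the following is a minimal sketch of how such a figure can be computed from ranked classifier scores. It is not the authors' script: the function name `recall_at_precision` and the toy labels and scores are assumptions made up for illustration, and the paper's own evaluation code may differ.

```python
from typing import Sequence


def recall_at_precision(labels: Sequence[int], scores: Sequence[float],
                        min_precision: float = 0.75) -> float:
    """Return the highest recall reachable while precision >= min_precision.

    labels: 1 for a (hypothetical) zinc-binding residue, 0 otherwise.
    scores: classifier scores, higher means more likely binding.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank residues from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every residue tied at this score so that each
        # evaluated point corresponds to a real cut-off on the score.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy data, invented for illustration only (not from the paper).
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.10]
toy_recall = recall_at_precision(toy_labels, toy_scores, 0.75)
print(f"recall at precision >= 0.75: {toy_recall:.2f}")
```

Comparing methods at one shared precision level, as the abstract does, puts their recall values on a common footing instead of leaving each method at its own default threshold.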
WOS headings: Science & Technology
WOS category: Multidisciplinary Sciences
WOS research area: Science & Technology - Other Topics
WOS keywords: SEQUENCE-BASED PREDICTION ; SECONDARY STRUCTURE ; AMINO-ACID ; DISULFIDE CONNECTIVITY ; FEATURE-SELECTION ; NEURAL-NETWORKS ; RESIDUE DEPTH ; ACCURATE ; RECOGNITION ; INFORMATION
Indexed in: SCI
Language: English
WOS accession number: WOS:000311151900184
Release date: 2013-01-16
Source URL: http://124.16.173.210/handle/312001/302
Collection: Tianjin Institute of Industrial Biotechnology_Laboratory of Structural Bioinformatics and Integrative Systems Biology (Song Jiangning)_Journal articles
Author affiliation: Chinese Acad Sci, Tianjin Inst Ind Biotechnol, Natl Engn Lab Ind Enzymes, Tianjin, Peoples R China
Recommended citation
GB/T 7714
Zheng, Cheng, Wang, Mingjun, Takemoto, Kazuhiro, et al. An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins[J]. PLOS ONE, 2012, 7(11): e49716.
APA Zheng, Cheng, Wang, Mingjun, Takemoto, Kazuhiro, Akutsu, Tatsuya, Zhang, Ziding, & Song, Jiangning. (2012). An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins. PLOS ONE, 7(11), e49716.
MLA Zheng, Cheng, et al. "An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins". PLOS ONE 7.11 (2012): e49716.

Ingestion method: OAI harvesting

Source: Tianjin Institute of Industrial Biotechnology
