Chinese Academy of Sciences Institutional Repositories Grid
Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model

Document type: Journal article

Authors: Chen, ZH (Chen, Zhan-Heng)[ 1,2 ]; You, ZH (You, Zhu-Hong)[ 1,2 ]; Zhang, WB (Zhang, Wen-Bo)[ 1,2 ]; Wang, YB (Wang, Yan-Bin)[ 1 ]; Cheng, L (Cheng, Li)[ 1,2 ]; Alghazzawi, D (Alghazzawi, Daniyal)[ 3 ]
Journal: GENES
Publication date: 2019
Volume: 10; Issue: 11; Pages: 1-12
Keywords: self-interacting proteins; de novo; protein sequence; global vector representation; multi-grained cascade forest
ISSN: 2073-4425
DOI: 10.3390/genes10110924
Abstract

Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional biological experimental methods for predicting SIPs have been developed in the past few years. However, these methods are costly, time-consuming, and inefficient, which often limits their use for predicting SIPs. Computational methods are therefore needed. In this paper, we propose, for the first time, a novel deep learning model that incorporates a natural language processing (NLP) method to predict potential SIPs from protein sequence information. More specifically, each protein sequence is de novo assembled from k-mers. We then obtain the global vectors representation of each protein sequence using the NLP technique. Finally, based on known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments on yeast and human datasets yielded accuracies of 91.45% and 93.12%, respectively. Our evaluations show that amino acid semantic information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work has potential applications in various biological classification problems.
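
The first two steps of the abstract (splitting a sequence into k-mers, then collecting the co-occurrence statistics that a global-vectors model is fitted to) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example sequence, the k-mer length `k=3`, the stride, and the window size are all assumptions chosen for illustration, and the 1/distance weighting follows the convention of the original GloVe method rather than anything stated in this record.

```python
from collections import Counter


def kmers(sequence, k=3, stride=1):
    """Split a protein sequence into overlapping k-mers, treated as "words".

    k and stride are illustrative assumptions, not values from the paper.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def cooccurrence(tokens, window=2):
    """Count symmetric, distance-weighted co-occurrences within a window.

    Each pair of tokens d positions apart (d <= window) adds 1/d to the
    count in both directions, as in the original GloVe formulation.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), i):
            weight = 1.0 / (i - j)
            counts[(center, tokens[j])] += weight
            counts[(tokens[j], center)] += weight
    return counts


# Hypothetical toy sequence, used only to show the data flow.
tokens = kmers("MKVLAAGIV", k=3)
pairs = cooccurrence(tokens, window=2)
```

In a full pipeline, counts like these would be passed to a GloVe trainer to learn one vector per k-mer, and the per-protein vectors would then be fed to a multi-grained cascade forest classifier; both of those steps are outside the scope of this sketch.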

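WOS accession number: WOS:000502296000090
Source URL: http://ir.xjipc.cas.cn/handle/365002/7200
Collection: Xinjiang Technical Institute of Physics and Chemistry_Multilingual Information Technology Research Laboratory
Corresponding author: You, ZH (You, Zhu-Hong)[ 1,2 ]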
Author affiliations:
1. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
3. King Abdulaziz Univ, Dept Informat Syst, Jeddah 21589, Saudi Arabia
Recommended citation
GB/T 7714: Chen, ZH, You, ZH, Zhang, WB, et al. Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model[J]. GENES, 2019, 10(11): 1-12.
APA: Chen, ZH, You, ZH, Zhang, WB, Wang, YB, Cheng, L, & Alghazzawi, D. (2019). Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model. GENES, 10(11), 1-12.
MLA: Chen, ZH, et al. "Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model." GENES 10.11 (2019): 1-12.

Ingestion method: OAI harvesting

Source: Xinjiang Technical Institute of Physics and Chemistry

