Chinese Academy of Sciences Institutional Repositories Grid
A novel model to predict O-glycosylation sites using a highly unbalanced dataset

Document type: Journal article

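Authors: Zhou, Kun [1,2]; Ai, Chunzhi [1]; Dong, Peipei [3]; Fan, Xuran [1]; Yang, Ling [1]
Journal: Glycoconjugate Journal
Publication date: 2012-10-01
Volume: 29; Issue: 7; Pages: 551-564
Keywords: Protein glycosylation prediction; Amino acid index; Feature selection; PP-LDA
Property rights ranking: 1,1
Corresponding author: Yang, Ling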
Abstract: In silico approaches have become an alternative method for studying O-glycosylation. In this paper, we developed an interpretable linear model for O-glycosylation prediction based on an unbalanced dataset and used it to analyze the underlying biological knowledge of glycosylation. A training set of 4446 sites, comprising 468 positive sites and 3978 negative sites, was assembled during this research. The sites were encoded using the amino acid index (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to develop the interpretable model. The performance of the model was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model exhibited improved classification results, with an accuracy of 82.1% for cross-validation and 80.3% for external prediction. Further analysis of this linear model indicated that the properties at position R-1 and the properties related to hydrophobicity contributed more to the glycosylation prediction. However, the alpha and turn propensities at the C-terminal, together with physicochemical properties at the N-terminal, are also related to glycosylation activity. This model is not only capable of predicting the possibility of glycosylation using an unbalanced dataset, but is also helpful for understanding the underlying biological mechanisms of glycosylation. To make the prediction model publicly accessible, a downloadable program is provided in the supplementary materials.
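The abstract's central idea, linear discriminant analysis with equal prior probabilities evaluated by leave-one-out cross-validation on an unbalanced set, can be sketched as follows. This is a minimal illustration and not the authors' program: the data are synthetic two-dimensional points (not AAindex-encoded sites), feature selection is omitted, and all function names (`fit_lda`, `predict`, `loo`) are hypothetical. It only shows why fixing the priors at 0.5 keeps the minority (positive) class from being suppressed by the class imbalance.

```python
import math
import random

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_cov(pos, neg, mu_p, mu_n):
    # Pooled within-class covariance shared by both classes (the LDA assumption).
    d = len(mu_p)
    s = [[0.0] * d for _ in range(d)]
    for rows, mu in ((pos, mu_p), (neg, mu_n)):
        for r in rows:
            for i in range(d):
                for j in range(d):
                    s[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    dof = len(pos) + len(neg) - 2
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_lda(pos, neg, prior_pos=0.5):
    # Two-class LDA: w = S^-1 (mu_pos - mu_neg); classify positive if w.x > c.
    mu_p, mu_n = mean_vec(pos), mean_vec(neg)
    w = solve(pooled_cov(pos, neg, mu_p, mu_n), [p - q for p, q in zip(mu_p, mu_n)])
    mid = [(p + q) / 2 for p, q in zip(mu_p, mu_n)]
    # The log prior ratio vanishes when prior_pos = 0.5 (equal priors).
    c = sum(wi * mi for wi, mi in zip(w, mid)) + math.log((1 - prior_pos) / prior_pos)
    return w, c

def predict(model, x):
    w, c = model
    return sum(wi * xi for wi, xi in zip(w, x)) > c

def loo(pos, neg, equal_prior=True):
    # Leave-one-out cross-validation; returns (sensitivity, specificity).
    tp = tn = 0
    for i in range(len(pos)):
        train = pos[:i] + pos[i + 1:]
        prior = 0.5 if equal_prior else len(train) / (len(train) + len(neg))
        tp += predict(fit_lda(train, neg, prior), pos[i])
    for i in range(len(neg)):
        train = neg[:i] + neg[i + 1:]
        prior = 0.5 if equal_prior else len(pos) / (len(pos) + len(train))
        tn += not predict(fit_lda(pos, train, prior), neg[i])
    return tp / len(pos), tn / len(neg)

# Synthetic, unbalanced toy data (1 positive per 9 negatives); purely illustrative.
random.seed(0)
pos = [[random.gauss(1.5, 1.0), random.gauss(1.0, 1.0)] for _ in range(30)]
neg = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(270)]

sens_eq, spec_eq = loo(pos, neg, equal_prior=True)
sens_emp, spec_emp = loo(pos, neg, equal_prior=False)
print("equal priors:     sensitivity=%.3f specificity=%.3f" % (sens_eq, spec_eq))
print("empirical priors: sensitivity=%.3f specificity=%.3f" % (sens_emp, spec_emp))
```

With empirical priors the decision threshold is raised by the log ratio of negatives to positives, so fewer positive sites are recovered; equal priors trade some specificity for sensitivity on the minority class.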
WOS title words: Science & Technology; Life Sciences & Biomedicine
Subject: Physical chemistry
Category [WOS]: Biochemistry & Molecular Biology
Research area [WOS]: Biochemistry & Molecular Biology
Keywords [WOS]: polypeptide n-acetylgalactosaminyltransferase; amino-acid-sequence; mammalian proteins; galnac-transferase; posttranslational modifications; neural-network; udp-galnac; in-vitro; specificity; selection
Indexed in: SCI
Language: English
WOS accession number: WOS:000308356000009
Release date: 2013-10-11
Source URL: [http://159.226.238.44/handle/321008/118136]
Collection: Dalian Institute of Chemical Physics_Dalian Institute of Chemical Physics, Chinese Academy of Sciences
Author affiliations:
1. Chinese Acad Sci, Dalian Inst Chem Phys, Lab Pharmaceut Resource Discovery, Dalian 116023, Peoples R China
2. Chinese Acad Sci, Grad Sch, Beijing 100049, Peoples R China
3. Dalian Med Univ, Res Inst Integrated Tradit & Western Med, Dalian 116044, Peoples R China
Recommended citation formats
GB/T 7714: Zhou, Kun, Ai, Chunzhi, Dong, Peipei, et al. A novel model to predict O-glycosylation sites using a highly unbalanced dataset[J]. Glycoconjugate Journal, 2012, 29(7): 551-564.
APA: Zhou, Kun, Ai, Chunzhi, Dong, Peipei, Fan, Xuran, & Yang, Ling. (2012). A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconjugate Journal, 29(7), 551-564.
MLA: Zhou, Kun, et al. "A novel model to predict O-glycosylation sites using a highly unbalanced dataset." Glycoconjugate Journal 29.7 (2012): 551-564.

Ingest method: OAI harvesting

Source: Dalian Institute of Chemical Physics


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.