A novel model to predict O-glycosylation sites using a highly unbalanced dataset
Document type: Journal article
Authors | Zhou, Kun (1,2); Ai, Chunzhi (1); Dong, Peipei (3); Fan, Xuran (1); Yang, Ling (1) |
Journal | Glycoconjugate Journal |
Publication date | 2012-10-01 |
Volume | 29 |
Issue | 7 |
Pages | 551-564 |
Keywords | Protein glycosylation prediction; Amino acid index; Feature selection; PP-LDA |
Institutional ranking | 1,1 |
Corresponding author | Yang, Ling (杨凌) |
Abstract | In silico approaches have become an alternative method for studying O-glycosylation. In this paper, we developed a linear, interpretable model for O-glycosylation prediction based on an unbalanced dataset and used it to analyze the biological factors underlying glycosylation. A training set of 4446 sites, comprising 468 positive sites and 3978 negative sites, was constructed for this study. The sites were encoded using the amino acid index (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to develop the interpretable model. Model performance was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model exhibited improved classification results, with an accuracy of 82.1 % in cross-validation and 80.3 % in external prediction. Further analysis of this linear model indicated that the properties at position R-1 and the properties related to hydrophobicity contributed more to glycosylation prediction. However, the alpha and turn propensities on the C-terminal side, together with physicochemical properties on the N-terminal side, are also related to glycosylation activity. This model is not only capable of predicting the likelihood of glycosylation from an unbalanced dataset, but also helps in understanding the underlying biological mechanisms of glycosylation. To make the prediction model publicly accessible, a downloadable program is provided in our supplementary materials. |
WOS headings | Science & Technology; Life Sciences & Biomedicine |
Subject | Physical chemistry |
Category [WOS] | Biochemistry & Molecular Biology |
Research area [WOS] | Biochemistry & Molecular Biology |
Keywords [WOS] | polypeptide N-acetylgalactosaminyltransferase; amino-acid-sequence; mammalian proteins; GalNAc-transferase; posttranslational modifications; neural-network; UDP-GalNAc; in-vitro; specificity; selection |
Indexed in | SCI |
Language | English |
WOS accession number | WOS:000308356000009 |
Public release date | 2013-10-11 |
Source URL | http://159.226.238.44/handle/321008/118136 |
Collection | Dalian Institute of Chemical Physics / Dalian Institute of Chemical Physics, Chinese Academy of Sciences |
Affiliations | 1. Chinese Acad Sci, Dalian Inst Chem Phys, Lab Pharmaceut Resource Discovery, Dalian 116023, Peoples R China; 2. Chinese Acad Sci, Grad Sch, Beijing 100049, Peoples R China; 3. Dalian Med Univ, Res Inst Integrated Tradit & Western Med, Dalian 116044, Peoples R China |
Recommended citation (GB/T 7714) | Zhou, Kun, Ai, Chunzhi, Dong, Peipei, et al. A novel model to predict O-glycosylation sites using a highly unbalanced dataset[J]. Glycoconjugate Journal, 2012, 29(7): 551-564. |
APA | Zhou, K., Ai, C., Dong, P., Fan, X., & Yang, L. (2012). A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconjugate Journal, 29(7), 551-564. |
MLA | Zhou, Kun, et al. "A novel model to predict O-glycosylation sites using a highly unbalanced dataset." Glycoconjugate Journal 29.7 (2012): 551-564. |
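The abstract names linear discriminant analysis with equal a priori probabilities (PP-LDA) and leave-one-out cross-validation. The sketch below is a minimal illustration of where the prior enters a two-class LDA, assuming each site has already been encoded as a numeric feature vector (for example, the selected AAindex values); the AAindex encoding and the forward stepwise selection are not reproduced, and every function name here (`fit_lda`, `predict`, `loocv_accuracy`) is hypothetical, not taken from the authors' downloadable program. With the training set's own class frequencies (468 positive vs. 3978 negative), the term ln(p_pos / p_neg) = ln(468/3978) ≈ -2.14 would push the decision boundary toward predicting "negative"; equal priors set that term to zero.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (weight vector w, bias b)


def mean(rows: Sequence[Vector]) -> Vector:
    """Column-wise mean of a list of feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def pooled_covariance(pos: Sequence[Vector], neg: Sequence[Vector]) -> List[Vector]:
    """Within-class covariance pooled over both classes (LDA's shared covariance)."""
    d = len(pos[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (pos, neg):
        mu = mean(rows)
        for r in rows:
            diff = [r[j] - mu[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    cov[a][b] += diff[a] * diff[b]
    dof = len(pos) + len(neg) - 2
    return [[c / dof for c in row] for row in cov]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    Raises ZeroDivisionError if a is exactly singular (e.g. duplicated features)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(pos: Sequence[Vector], neg: Sequence[Vector], prior_pos: float = 0.5) -> Model:
    """Two-class LDA: score(x) = w.x + b; predict positive when score > 0.
    prior_pos = 0.5 corresponds to equal a priori probabilities (the PP-LDA setting)."""
    mu_p, mu_n = mean(pos), mean(neg)
    w = solve(pooled_covariance(pos, neg), [p - q for p, q in zip(mu_p, mu_n)])
    midpoint = sum(wi * (p + q) / 2 for wi, p, q in zip(w, mu_p, mu_n))
    # ln(prior_pos / prior_neg) shifts the boundary; it is 0 when the priors are equal.
    return w, -midpoint + math.log(prior_pos / (1 - prior_pos))


def predict(model: Model, x: Vector) -> int:
    """Return 1 (glycosylated) or 0 (not glycosylated)."""
    w, bias = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0


def loocv_accuracy(pos: Sequence[Vector], neg: Sequence[Vector], prior_pos: float = 0.5) -> float:
    """Leave-one-out: refit on all sites but one, then classify the held-out site."""
    data = [(x, 1) for x in pos] + [(x, 0) for x in neg]
    correct = 0
    for i, (x, y) in enumerate(data):
        train_pos = [r for j, (r, t) in enumerate(data) if j != i and t == 1]
        train_neg = [r for j, (r, t) in enumerate(data) if j != i and t == 0]
        correct += predict(fit_lda(train_pos, train_neg, prior_pos), x) == y
    return correct / len(data)


if __name__ == "__main__":
    # Toy 1-D data (not from the paper): 2 positives vs 6 negatives.
    pos = [[3.0], [5.0]]
    neg = [[-1.0], [1.0], [-1.0], [1.0], [0.0], [0.0]]
    equal = fit_lda(pos, neg)                        # boundary at x = 2
    empirical = fit_lda(pos, neg, prior_pos=2 / 8)   # boundary moves to x ~ 2.27
    print(predict(equal, [2.1]), predict(empirical, [2.1]))  # → 1 0
    print(loocv_accuracy(pos, neg))  # → 1.0
```

On this toy data the borderline point at 2.1 is called positive with equal priors but negative with the empirical prior of 0.25, which is the kind of minority-class bias that equal priors are meant to counter. Plain Python is used only to keep the sketch dependency-free; a production version would more likely use a numerical library.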
Ingest method: OAI harvesting
Source: Dalian Institute of Chemical Physics
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.