Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods
Document type: Journal article
Authors | Wang, ShaoPeng1; Li, JiaRui1; Cai, Yu-Dong1; Wang, Deling2; Huang, Tao3 |
Journal | MOLECULAR OMICS
Publication date | 2018 |
Volume | 14; Issue: 1; Pages: 64-73 |
ISSN | 2515-4184 |
DOI | 10.1039/c7mo00030h |
Document subtype | Article |
Abstract | The cleavage site of a signal peptide, located in the C-region, is recognized by the signal peptidase in eukaryotic and prokaryotic cells, and signal peptides are typically cleaved off during or after translocation of the target protein. The identification of cleavage sites remains challenging because of the diverse lengths of signal peptides and the weak conservation of the motif recognized by the signal peptidase. In this study, we applied a fast and accurate computational method to identify cleavage sites in signal peptides based on protein sequences. We collected 2683 protein sequences with experimentally validated N-terminal signal peptides from the newly released UniProt database. A 20 amino acid-length peptide segment flanking the cleavage site was extracted from each protein, and four types of features were used to encode the peptide segment. We applied the synthetic minority oversampling technique, maximum relevance minimum redundancy, and incremental feature selection, together with dagging and random forest algorithms, to identify the features that yield the best identification of the cleavage sites. The optimal dagging and random forest classifiers constructed on these features yielded Youden's indexes of 0.871 and 0.736, respectively. The sensitivity, specificity, and accuracy yielded by the optimal dagging classifier all exceeded 0.9, which demonstrated its high prediction ability. A literature review supported the crucial roles of the optimal features obtained with the dagging algorithm, predominantly the position-specific scoring matrix and the amino acid factor, in identifying the cleavage sites. The prediction method proposed in this study was confirmed to be a powerful tool for recognizing cleavage sites from protein sequences. |
Subject | Biochemistry & Molecular Biology |
WOS keywords | PREDICTING SUBCELLULAR-LOCALIZATION ; SUPPORT VECTOR MACHINE ; ENDOPLASMIC-RETICULUM ; PROTEIN-TRANSPORT ; SEQUENCES ; TRANSLOCATION ; RELEVANCE ; MEMBRANES ; MUTANTS ; PATHWAY |
Language | English |
WOS accession number | WOS:000450659200005 |
Publisher | ROYAL SOC CHEMISTRY |
Version | Published version |
Source URL | [http://202.127.25.144/handle/331004/588] |
Collection | Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences |
Author affiliations | 1.Shanghai Univ, Sch Life Sci, Shanghai 200444, Peoples R China; 2.Sun Yat Sen Univ, Collaborat Innovat Ctr Canc Med, State Key Lab Oncol South China, Dept Med Imaging,Canc Ctr, 651 Dong Feng Rd East, Guangzhou 510060, Guangdong, Peoples R China; 3.Chinese Acad Sci, Shanghai Inst Biol Sci, Inst Hlth Sci, Shanghai 200031, Peoples R China |
Recommended citation GB/T 7714 | Wang, ShaoPeng,Li, JiaRui,Cai, Yu-Dong,et al. Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods[J]. MOLECULAR OMICS,2018,14(1):64-73. |
APA | Wang, ShaoPeng,Li, JiaRui,Cai, Yu-Dong,Wang, Deling,&Huang, Tao.(2018).Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods.MOLECULAR OMICS,14(1),64-73. |
MLA | Wang, ShaoPeng,et al."Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods".MOLECULAR OMICS 14.1(2018):64-73. |
Ingest method: OAI harvesting
Source: Shanghai Institute of Nutrition and Health
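
The abstract reports sensitivity, specificity, accuracy, and Youden's index for the classifiers. As a minimal sketch of how these four measures relate, the Python below computes them from confusion-matrix counts, using the standard definition of Youden's index as sensitivity + specificity - 1. The `Confusion` class, the function names, and the example counts are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # cleavage sites predicted as cleavage sites
    fn: int  # cleavage sites predicted as non-sites
    tn: int  # non-sites predicted as non-sites
    fp: int  # non-sites predicted as cleavage sites


def sensitivity(c: Confusion) -> float:
    """Fraction of true positives among all actual positives."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of true negatives among all actual negatives."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Fraction of all samples that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def youden_index(c: Confusion) -> float:
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity(c) + specificity(c) - 1.0


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the study.
    example = Confusion(tp=90, fn=10, tn=95, fp=5)
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"accuracy    = {accuracy(example):.3f}")
    print(f"Youden's J  = {youden_index(example):.3f}")
```

Because Youden's index weights the positive and negative classes equally, it is less sensitive to class imbalance than accuracy. That property is relevant when non-cleavage segments greatly outnumber true cleavage sites.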