Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks
Document type: Journal article
Authors | Cao, Zhen1,2; Zhang, Shihua1,2,3 |
Journal | IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS |
Publication date | 2020-03-01 |
Volume | 17; Issue: 2; Pages: 657-667 |
Keywords | DNA; Bioinformatics; Kernel; Feature extraction; Support vector machines; Genomics; Task analysis; machine learning; gapped k-mer; deep neural network; transcription factor binding site prediction |
ISSN | 1545-5963 |
DOI | 10.1109/TCBB.2018.2868071 |
Abstract | Gapped k-mer frequency vectors (gkm-fvs) have been proposed for extracting sequence features. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to achieve effective sequence-based predictions. However, the heavy computation of a large kernel matrix prevents gkm-SVM from using large amounts of data, and it is unclear how to combine gkm-fvs with other data sources in the context of a string kernel. On the other hand, the high dimensionality, collinearity, and sparsity of gkm-fvs hinder the use of many traditional machine learning methods without a kernel trick. Therefore, we proposed a flexible and scalable framework, gkm-DNN, to achieve feature representation from high-dimensional gkm-fvs using deep neural networks (DNNs). We first proposed a more concise version of gkm-fvs, which significantly reduces their dimension. Then, for the first time, we implemented an efficient method to calculate the gkm-fv of a given sequence. Finally, we adopted a DNN model with gkm-fvs as inputs to achieve efficient feature representation and perform a prediction task. Here, we took transcription factor binding site prediction as an illustrative application, applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance, and compared it with the state-of-the-art method gkm-SVM. |
Funding | National Natural Science Foundation of China[61621003] ; National Natural Science Foundation of China[11661141019] ; National Natural Science Foundation of China[61422309] ; National Natural Science Foundation of China[61379092] ; Strategic Priority Research Program of the Chinese Academy of Sciences (CAS)[XDB13040600] ; Ten Thousand Talent Program for Young Top-notch Talent ; Key Research Program of the Chinese Academy of Sciences[KFZD-SW-219] ; CAS Frontier Science Research Key Project for Top Young Scientist[QYZDB-SSW-SYS008] |
WOS research areas | Biochemistry & Molecular Biology ; Computer Science ; Mathematics |
Language | English |
WOS accession number | WOS:000524236800025 |
Publisher | IEEE COMPUTER SOC |
Source URL | [http://ir.amss.ac.cn/handle/2S8OKBNM/51124] |
Collection | Institute of Applied Mathematics |
Corresponding author | Zhang, Shihua |
Author affiliations | 1. Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China; 2. Chinese Acad Sci, NCMIS, CEMS, RCSDS, Acad Math & Syst Sci, Beijing 100190, Peoples R China; 3. Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China |
Recommended citation (GB/T 7714) | Cao, Zhen, Zhang, Shihua. Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks[J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS,2020,17(2):657-667. |
APA | Cao, Zhen,&Zhang, Shihua.(2020).Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks.IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS,17(2),657-667. |
MLA | Cao, Zhen,et al."Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks".IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 17.2(2020):657-667. |
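
The abstract refers to gapped k-mer frequency vectors without defining them. The following is a minimal Python sketch of the commonly used definition (every window of length `l`, with `k` informative positions kept and the rest masked), not the paper's concise variant or its efficient algorithm, neither of which is described in this record. The function name `gapped_kmer_counts`, the `-` gap symbol, and the parameter values are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers in a DNA sequence (illustrative sketch only).

    Each window of length l contributes one count to every pattern obtained
    by keeping k of its positions and masking the other l - k with '-'.
    """
    seq = seq.upper()
    masks = list(combinations(range(l), k))  # all choices of informative positions
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Skip windows containing ambiguous bases such as 'N'.
        if any(base not in "ACGT" for base in window):
            continue
        for keep in masks:
            pattern = "".join(
                window[i] if i in keep else "-" for i in range(l)
            )
            counts[pattern] += 1
    return counts


counts = gapped_kmer_counts("ACGACG", l=3, k=2)
print(counts["AC-"], counts["CG-"])
```

Under this definition the full feature space has C(l, k) * 4^k entries (48 for `l=3, k=2`), while a short sequence populates only a few of them, which illustrates the dimensionality and sparsity the abstract mentions.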
Ingest method: OAI harvesting
Source: Academy of Mathematics and Systems Science