Chinese Academy of Sciences Institutional Repositories Grid
Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics

Document type: Journal article

Authors: Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H.; Sun, Fengzhu
Journal: BIOINFORMATICS
Publication date: 2016
Volume: 32; Issue: 7; Pages: 993-1000
Keywords: DNA-SEQUENCES; STATISTICAL-INFERENCE; CHAIN; ANALYSIS; ALIGNMENT; METAGENOMICS; FREQUENCIES; PREDICTION; ENHANCERS; BROWSER; WORDS
Abstract: Motivation: Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential.

A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. 
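As a rough illustration of the modeling step described above, the following sketch pools overlapping word counts over a set of reads and turns them into plug-in estimates of the transition probabilities of an order-k Markov chain. The function names (`count_words`, `estimate_transitions`) and the toy reads are hypothetical and are not taken from the paper. The sketch only shows the counting; it does not address the point raised in the abstract that single-sequence inference results do not carry over directly to short read data.

```python
from collections import Counter, defaultdict

ALPHABET = set("ACGT")


def count_words(reads, w):
    """Count overlapping words of length w, pooled over all reads.

    Words containing letters outside A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - w + 1):
            word = read[i:i + w]
            if set(word) <= ALPHABET:
                counts[word] += 1
    return counts


def estimate_transitions(reads, k):
    """Plug-in estimate of P(next letter | previous k letters).

    Returns a dict keyed by the (k+1)-letter word: the first k letters
    are the context and the last letter is the next state.
    """
    counts = count_words(reads, k + 1)
    context_totals = defaultdict(int)
    for word, n in counts.items():
        context_totals[word[:k]] += n
    return {word: n / context_totals[word[:k]] for word, n in counts.items()}


# Toy usage with two made-up reads and an order-1 chain.
probs = estimate_transitions(["AAC", "AAG"], 1)
for word in sorted(probs):
    print(word[:-1], "->", word[-1], probs[word])
```

Each context total is computed from the (k+1)-word counts themselves, so the estimated probabilities for every observed context sum to one even though reads are short and words cannot span read boundaries.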

Results: Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate those using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use an MC of the estimated order give a plausible clustering of the species.
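To make the Chi-square statistic mentioned above more concrete, here is a minimal sketch of a textbook Pearson-type statistic for checking whether pooled word counts are compatible with an order-k Markov chain. It relies on the fact that, under an order-k chain, the letter before and the letter after any k-letter context are conditionally independent. This is an assumed, generic form and not necessarily the exact statistic used in the paper. The function name `chi_square_order_statistic` is hypothetical. Because the abstract states that the statistic is approximately gamma distributed under NGS sampling, no reference distribution or p-value is computed here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def chi_square_order_statistic(reads, k):
    """Pearson-type statistic on (k+2)-word counts pooled over reads.

    For each k-letter middle context, the observed count of the word
    a + middle + b is compared with prefix * suffix / middle, the count
    expected if a and b were independent given the context.
    """
    if k < 1:
        raise ValueError("this sketch only handles k >= 1")
    w = k + 2
    full = Counter()
    for read in reads:
        for i in range(len(read) - w + 1):
            word = read[i:i + w]
            if set(word) <= set(ALPHABET):
                full[word] += 1

    # Marginal counts derived from the (k+2)-word counts themselves,
    # so that they are consistent despite read boundaries.
    prefix, suffix, middle = Counter(), Counter(), Counter()
    for word, n in full.items():
        prefix[word[:-1]] += n
        suffix[word[1:]] += n
        middle[word[1:-1]] += n

    stat = 0.0
    for mid, n_mid in middle.items():
        for a, b in product(ALPHABET, repeat=2):
            expected = prefix[a + mid] * suffix[mid + b] / n_mid
            if expected > 0:
                observed = full[a + mid + b]
                stat += (observed - expected) ** 2 / expected
    return stat


# Toy usage: made-up three-letter reads, testing order k = 1.
print(chi_square_order_statistic(["AAA", "AAC", "CAA", "CAC"], 1))
print(chi_square_order_statistic(["AAA", "CAC"], 1))
```

Larger values indicate stronger departure from the order-k model. One could compute the statistic for increasing k and look for the smallest order at which it stops being large, but any threshold would need the read-aware distribution described in the abstract.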
Date available: 2016-06-06
Source URL: [http://ir.xtbg.org.cn/handle/353005/9883]
Collection: Xishuangbanna Tropical Botanical Garden_Other
Recommended citation
GB/T 7714
Ren, Jie, Song, Kai, Deng, Minghua, et al. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics[J]. BIOINFORMATICS, 2016, 32(7): 993-1000.
APA Ren, Jie, Song, Kai, Deng, Minghua, Reinert, Gesine, Cannon, Charles H., & Sun, Fengzhu. (2016). Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics. BIOINFORMATICS, 32(7), 993-1000.
MLA Ren, Jie, et al. "Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics". BIOINFORMATICS 32.7 (2016): 993-1000.

Deposit method: OAI harvesting

Source: Xishuangbanna Tropical Botanical Garden


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.