Chinese Academy of Sciences Institutional Repositories Grid
DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Document type: Journal article

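Authors: Ye CX[*] (1, 2); Hill CM (1); Wu SG (3); Ruan J (3); Ma ZS[*] (2)
Journal: SCIENTIFIC REPORTS
Publication date: 2016
Volume: 6; Issue: X; Pages: e31900
Corresponding authors: cxy@umd.edu; samma@uidaho.edu
Collaboration status: Other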
Abstract: The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces the sequencing requirements of both. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost.
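To make design principles (i) and (ii) more concrete, the sketch below shows one simplified way a long read could be compressed into an ordered list of preassembled NGS contig IDs ("anchors"), and how two compressed reads could then be overlapped cheaply at the anchor level. This is an illustrative assumption, not the DBG2OLC implementation: the function names `build_kmer_index`, `compress_read`, and `anchor_overlap`, the exact-match unique k-mer anchoring, the `min_hits` threshold, and the forward-strand-only overlap are all hypothetical simplifications. k-mers broken by base-level errors simply fail to match and are skipped, echoing principle (ii).

```python
from typing import Dict, List


def build_kmer_index(contigs: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the single contig ID that contains it.

    Hypothetical simplification: k-mers found in more than one contig are
    dropped so that every remaining k-mer points to a unique contig.
    """
    index: Dict[str, str] = {}
    repeated = set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != cid:
                repeated.add(kmer)  # shared between contigs: not a unique anchor
            index[kmer] = cid
    for kmer in repeated:
        del index[kmer]
    return index


def compress_read(read: str, index: Dict[str, str], k: int, min_hits: int = 2) -> List[str]:
    """Replace a long read by the ordered list of contig IDs it hits."""
    anchors: List[str] = []
    counts: List[int] = []
    for i in range(len(read) - k + 1):
        cid = index.get(read[i:i + k])
        if cid is None:
            continue  # k-mer disrupted by an error or absent from contigs: skip it
        if anchors and anchors[-1] == cid:
            counts[-1] += 1
        else:
            anchors.append(cid)
            counts.append(1)
    # Keep only anchors supported by enough k-mer hits (assumed threshold).
    kept = [cid for cid, n in zip(anchors, counts) if n >= min_hits]
    # Collapse repeats that became adjacent after filtering.
    out: List[str] = []
    for cid in kept:
        if not out or out[-1] != cid:
            out.append(cid)
    return out


def anchor_overlap(a: List[str], b: List[str]) -> int:
    """Longest suffix of compressed read a equal to a prefix of b (in anchors)."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


if __name__ == "__main__":
    contigs = {"c1": "ACGTACGGA", "c2": "TTGCAGCTT", "c3": "GGCATCCAT"}
    idx = build_kmer_index(contigs, k=4)
    # A read spanning all three contigs, with one substitution inside c2.
    read = "ACGTACGGA" + "TTGCTGCTT" + "GGCATCCAT"
    print(compress_read(read, idx, k=4))  # ['c1', 'c2', 'c3']
```

Because overlaps are computed on short lists of contig IDs rather than on thousands of error-prone bases, the comparison in this toy setting is far cheaper than base-level alignment; a real assembler would additionally need to handle reverse complements, repeats, and structurally chimeric reads.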
Indexed in: SCI
Funding: This research received funding from the following sources: National Science Foundation of China (Grants No. 61175071, 71473243), the Exceptional Scientists Program and Top Oversea Scholars Program of Yunnan Province, and Yunling Industrial Innovation Grant.
Language: English
Source URL: http://159.226.149.26:8080/handle/152453/10424
Collections: Kunming Institute of Zoology, Computational Biology and Bioinformatics
Kunming Institute of Zoology, State Key Laboratory of Genetic Resources and Evolution
Author affiliations:
1. Department of Computer Science, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
2. Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
3. Agricultural Genome Institute, Chinese Academy of Agricultural Sciences, No.7 Pengfei Road, Dapeng New District, Shenzhen, Guangdong 518120, China
Recommended citation
GB/T 7714: Ye CX[*], Hill CM, Wu SG, et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies[J]. SCIENTIFIC REPORTS, 2016, 6(X): e31900.
APA: Ye CX[*], Hill CM, Wu SG, Ruan J, & Ma ZS[*]. (2016). DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. SCIENTIFIC REPORTS, 6(X), e31900.
MLA: Ye CX[*], et al. "DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies". SCIENTIFIC REPORTS 6.X (2016): e31900.

Deposit method: OAI harvesting

Source: Kunming Institute of Zoology

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.