DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies
Document type: Journal article
Authors | Ye CX[*]1,2; Hill CM1; Wu SG3; Ruan J3; Ma ZS[*]2 |
Journal | SCIENTIFIC REPORTS
Publication date | 2016 |
Volume | 6; Issue: X; Pages: e31900 |
Corresponding authors | cxy@umd.edu ; samma@uidaho.edu |
Collaboration status | Other |
Abstract | The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. |
Indexed in | SCI |
Funding | This research received funding from the following sources: National Science Foundation of China (Grants No. 61175071, 71473243), the Exceptional Scientists Program and Top Oversea Scholars Program of Yunnan Province, and Yunling Industrial Innovation Grant. |
Language | English |
Source URL | [http://159.226.149.26:8080/handle/152453/10424] |
Collections | Kunming Institute of Zoology_Computational Biology and Bioinformatics; Kunming Institute of Zoology_State Key Laboratory of Genetic Resources and Evolution |
Author affiliations | 1. Department of Computer Science, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA 2. Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China 3. Agricultural Genome Institute, Chinese Academy of Agricultural Sciences, No.7 Pengfei Road, Dapeng New District, Shenzhen, Guangdong 518120, China |
Recommended citation GB/T 7714 | Ye CX[*],Hill CM,Wu SG,et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies[J]. SCIENTIFIC REPORTS,2016,6(X):e31900. |
APA | Ye CX[*],Hill CM,Wu SG,Ruan J,&Ma ZS[*].(2016).DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies.SCIENTIFIC REPORTS,6(X),e31900. |
MLA | Ye CX[*],et al."DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies".SCIENTIFIC REPORTS 6.X(2016):e31900. |
Ingest method: OAI harvesting
Source: Kunming Institute of Zoology