Chinese Academy of Sciences Institutional Repositories Grid
ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time

Document type: Journal article

Authors: Yunpeng Cai; Wei Zheng; Jin Yao; Yujie Yang; Volker Mai; Qi Mao; Yijun Sun
Journal: PLOS Computational Biology
Publication date: 2017
Document subtype: Journal article
Abstract: The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, hierarchical clustering of extremely large sequence datasets is currently computationally expensive because of its quadratic time and space complexity. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the Human Microbiome Project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence datasets. Our experiment demonstrated that with the power of parallel computing it is now computationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at http://www.acsu.buffalo.edu/~yijunsun/lab/ESPRIT-Forest.html.
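The abstract mentions merging multiple cluster pairs per round so that the merges can be handled by separate threads. The Python sketch below is a toy illustration of that general idea only, and it is not the authors' implementation. Several choices are assumptions made for the example: Hamming distance on equal-length strings, average linkage, a brute-force nearest-neighbor scan standing in for the partitioning tree, and a "mutually nearest" rule standing in for the paper's merging criterion, which the abstract does not specify. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(c1, c2, seqs):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(seqs[i], seqs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def nearest(idx, clusters, seqs):
    """Brute-force nearest cluster (assumed stand-in for a tree search)."""
    best, best_d = None, float("inf")
    for j, other in enumerate(clusters):
        if j == idx:
            continue
        d = average_linkage(clusters[idx], other, seqs)
        if d < best_d:
            best, best_d = j, d
    return best, best_d


def cluster(seqs, threshold):
    """Merge all mutually nearest cluster pairs in each round.

    The pairs found in one round are disjoint, so each merge could in
    principle be handed to a different worker thread.
    """
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > 1:
        nn = [nearest(i, clusters, seqs) for i in range(len(clusters))]
        # Keep pairs (i, j) where each cluster is the other's nearest
        # neighbor and the distance does not exceed the threshold.
        pairs = [
            (i, j)
            for i, (j, d) in enumerate(nn)
            if i < j and nn[j][0] == i and d <= threshold
        ]
        if not pairs:
            break
        merged = set()
        new_clusters = []
        for i, j in pairs:
            new_clusters.append(clusters[i] + clusters[j])
            merged.update((i, j))
        new_clusters.extend(
            c for k, c in enumerate(clusters) if k not in merged
        )
        clusters = new_clusters
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    reads = ["AAAA", "AAAT", "GGGG", "GGGC", "CCCC"]
    print(cluster(reads, threshold=1))
```

In this toy version every round still scans all cluster pairs, so it remains quadratic; the subquadratic behavior described in the abstract depends on the tree-based neighbor search, which is not reproduced here.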
Language: English
Source URL: http://ir.siat.ac.cn:8080/handle/172644/12603
Collection: Shenzhen Institutes of Advanced Technology, Digital Institute (数字所)
Author affiliation: PLOS Computational Biology
Recommended citation
GB/T 7714: Yunpeng Cai, Wei Zheng, Jin Yao, et al. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time[J]. PLOS Computational Biology, 2017.
APA: Cai, Y., Zheng, W., Yao, J., Yang, Y., Mai, V., ... & Sun, Y. (2017). ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time. PLOS Computational Biology.
MLA: Cai, Yunpeng, et al. "ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time." PLOS Computational Biology (2017).

Ingest method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology
