Improving Metagenome Sequence Clustering Application Performance Using Louvain Algorithm
Published in | Recent Featured Applications of Artificial Intelligence Methods. LSMS 2020 and ICSEE 2020 Workshops Vol. 1303; pp. 386 - 400 |
---|---|
Main Authors | , , , , |
Format | Book Chapter |
Language | English |
Published | Singapore: Springer Singapore, 2021 |
Series | Communications in Computer and Information Science |
Subjects | |
ISBN | 9813363770 9789813363779 |
ISSN | 1865-0929 1865-0937 |
DOI | 10.1007/978-981-33-6378-6_29 |
Summary: | Metagenomic assembly is very challenging because of the huge data volumes produced by next-generation sequencing (NGS). The ability of clustering strategies to handle large amounts of data makes them an ideal solution to memory limitations. SpaRC (Spark Reads Clustering), a scalable sequence clustering tool built on Apache Spark, a distributed big data analysis platform, can cluster hundreds of GBs of sequences from different genomes. However, the Label Propagation Algorithm (LPA) used in SpaRC is usually unstable, causing the clustering results to oscillate and to contain too many tiny clusters. In this paper, we propose a method for clustering metagenomic sequences based on the distributed Louvain algorithm to obtain more accurate clustering results. We performed experiments with both LPA and Louvain on two different datasets, each containing millions of genome sequences. The experimental results indicate that this approach can effectively improve clustering performance. We hope that the method applied in this paper can be widely used in other metagenomic clustering studies. |