Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing

Bibliographic Details
Published in: Society for Industrial and Applied Mathematics. Proceedings of the SIAM International Conference on Data Mining, p. 1023
Main Authors: Rasheed, Zeehasham; Rangwala, Huzefa; Barbará, Daniel
Format: Conference Proceeding
Language: English
Published: Philadelphia, Society for Industrial and Applied Mathematics, 01.01.2012

Summary: The new generation of genomic technologies has allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we develop an efficient and accurate metagenome clustering approach that uses the locality sensitive hashing (LSH) technique to approximate sequence comparisons, reducing the computational cost of comparing sequences. We introduce the use of fixed-length, gapless subsequences to improve the sensitivity of the LSH-based similarity function. We evaluate the performance of our algorithm on two metagenome datasets associated with microbes found at different human skin locations. Our empirical results show the strength of the developed approach in comparison to three state-of-the-art sequence clustering algorithms with regard to computational efficiency and clustering quality. We also demonstrate the practical significance of the developed clustering algorithm by using it to compare bacterial diversity and structure across different skin locations.
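The abstract describes representing reads by fixed-length, gapless subsequences and using LSH so that similar reads can be grouped without comparing every pair. Below is a minimal Python sketch of that general idea. It assumes MinHash signatures over k-mer sets, banded LSH, and single-linkage grouping of reads that collide in any band. MinHash is a standard LSH family swapped in here for illustration, and the paper's own hash functions, similarity function, and clustering step may differ. The function names, the toy reads, and the defaults `k=8`, `bands=16`, `rows=4` are hypothetical choices, not values from the paper.

```python
# Illustrative sketch only: MinHash + banded LSH is an assumed scheme,
# not necessarily the construction used in the paper.
import hashlib
import random
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of fixed-length, gapless subsequences (k-mers) of a read."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash of a k-mer; the seed selects one hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(kmer_set: set[str], seeds: list[int]) -> tuple[int, ...]:
    """MinHash: for each hash function keep the minimum hash over the set.

    Two sets agree at a given position with probability approximately equal
    to their Jaccard similarity, so signatures approximate k-mer overlap.
    """
    return tuple(min(_hash(km, s) for km in kmer_set) for s in seeds)


def lsh_clusters(reads: dict[str, str], k: int = 8, bands: int = 16,
                 rows: int = 4, seed: int = 0) -> list[list[str]]:
    """Group reads whose MinHash signatures collide in at least one LSH band.

    Reads shorter than k have no k-mers and are skipped (a hypothetical choice).
    """
    rng = random.Random(seed)
    seeds = [rng.getrandbits(64) for _ in range(bands * rows)]

    sigs = {}
    for rid, seq in reads.items():
        ks = kmers(seq, k)
        if ks:
            sigs[rid] = minhash_signature(ks, seeds)

    # Union-find over read ids: every band collision merges two groups.
    parent = {rid: rid for rid in sigs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for b in range(bands):
        buckets = defaultdict(list)  # fresh buckets for each band
        for rid, sig in sigs.items():
            buckets[sig[b * rows:(b + 1) * rows]].append(rid)
        for members in buckets.values():
            for other in members[1:]:
                union(members[0], other)

    clusters = defaultdict(list)
    for rid in sigs:
        clusters[find(rid)].append(rid)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy reads, not data from the paper.
    reads = {
        "r1": "ACGTACGTTAGCCGATAGGCTTACG",
        "r2": "ACGTACGTTAGCCGATAGGCTTACG",  # identical to r1
        "r3": "ACGTACGTTAGCCGATAGGCTTACC",  # one substitution at the end
        "r4": "TTTTTTTTTTTTTTTTTTTT",       # shares no 8-mers with the others
    }
    print(lsh_clusters(reads))
```

Identical reads produce identical signatures and therefore always land in the same cluster, while a read that shares no k-mers with the others (`r4` above) stays a singleton in practice. With 16 bands of 4 rows, the probability that two reads with Jaccard similarity s collide in at least one band is 1 - (1 - s^4)^16, which rises most steeply around s ≈ (1/16)^(1/4) = 0.5. Changing `bands` and `rows` moves that threshold. The sketch merges any colliding pair directly; a fuller pipeline would typically check candidate pairs against an actual similarity score before merging them.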