Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing
Published in | Society for Industrial and Applied Mathematics. Proceedings of the SIAM International Conference on Data Mining, p. 1023 |
---|---|
Main Authors | Rasheed, Zeehasham; Rangwala, Huzefa; Barbará, Daniel |
Format | Conference Proceeding |
Language | English |
Published | Philadelphia: Society for Industrial and Applied Mathematics, 01.01.2012 |
Subjects | Algorithms; Clustering; Microbial activity |
Abstract | The new generation of genomic technologies has allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we develop an efficient and accurate metagenome clustering approach that uses the locality sensitive hashing (LSH) technique to approximate the computational complexity associated with comparing sequences. We introduce the use of fixed-length, gapless subsequences to improve the sensitivity of the LSH-based similarity function. We evaluate the performance of our algorithm on two metagenome datasets associated with microbes existing across different human skin locations. Our empirical results show the strength of the developed approach in comparison to three state-of-the-art sequence clustering algorithms with regard to computational efficiency and clustering quality. We also demonstrate the practical significance of the developed clustering algorithm by using it to compare bacterial diversity and structure across different skin locations. |
---|---|
Author | Barbará, Daniel; Rasheed, Zeehasham; Rangwala, Huzefa |
Author_xml | – sequence: 1 givenname: Zeehasham surname: Rasheed fullname: Rasheed, Zeehasham – sequence: 2 givenname: Huzefa surname: Rangwala fullname: Rangwala, Huzefa – sequence: 3 givenname: Daniel surname: Barbará fullname: Barbará, Daniel |
ContentType | Conference Proceeding |
Copyright | Copyright Society for Industrial and Applied Mathematics 2012 |
Copyright_xml | – notice: Copyright Society for Industrial and Applied Mathematics 2012 |
DBID | 3V. 7WY 7WZ 7X2 7XB 87Z 88A 88F 88I 88K 8AL 8FE 8FG 8FH 8FK 8FL 8G5 AAFGM AAMXL ABJCF ABLUL ABOIG ABPUF ABQRF ABRGS ABSSA ABUWG ACIOU ADSMB ADZZV AEEYA AEQXP AFKRA AFLLJ AFOKG AFOLM AGAJT AGSBL AJNOY ANJRB AQTIP ARAPS ATCPS AZQEC BBNVY BENPR BEZIV BGLVJ BHPHI BOUDT CBHQV CCPQU D1I DWQXO FRNLG F~G GNUQQ GUQSH HCIFZ JQ2 K60 K6~ K7- KB. L.- L6V LK8 M0C M0K M0N M1Q M2O M2P M2T M7P M7S MBDVC P5Z P62 PATMY PDBOC PQBIZ PQBZA PQCXX PQEST PQQKQ PQUKI PRLXX PTHSS PYCSY Q9U |
DatabaseName | ProQuest Central (Corporate) ABI/INFORM Collection ABI/INFORM Global (PDF only) Agricultural Science Collection ProQuest Central (purchase pre-March 2016) ABI/INFORM Collection Biology Database (Alumni Edition) Military Database (Alumni Edition) Science Database (Alumni Edition) Telecommunications (Alumni Edition) Computing Database (Alumni Edition) ProQuest SciTech Collection ProQuest Technology Collection ProQuest Natural Science Collection ProQuest Central (Alumni) (purchase pre-March 2016) ABI/INFORM Collection (Alumni Edition) Research Library (Alumni Edition) ProQuest Central Korea - hybrid linking Natural Science Collection - hybrid linking Materials Science & Engineering Collection Business Premium Collection - hybrid linking Biological Science Collection - hybrid linking ABI/INFORM Collection (Alumni) - hybrid linking Technology Collection - hybrid linking Materials Science & Engineering Collection - hybrid linking ABI/INFORM Collection - hybrid linking ProQuest Central (Alumni) ABI/INFORM Global - hybrid linking Military Database - hybrid linking ProQuest Central (Alumni) - hybrid linking Environmental Science Collection - hybrid linking Military Database (Alumni) - hybrid linking ProQuest Central SciTech Premium Collection - hybrid linking Advanced Technologies & Aerospace Collection - hybrid linking ProQuest Central Student - hybrid linking ProQuest Central Essentials - hybrid linking ABI/INFORM Global (Alumni) - hybrid linking Business Premium Collection (Alumni) - hybrid linking Computer Science Database - hybrid linking ProQuest Women's & Gender Studies - hybrid linking Advanced Technologies & Aerospace Collection Agricultural & Environmental Science Collection ProQuest Central Essentials Biological Science Collection ProQuest Central Business Premium Collection Technology Collection Natural Science Collection ProQuest One Business - hybrid linking ProQuest One Business (Alumni) - hybrid linking ProQuest One Community College ProQuest 
Materials Science Collection ProQuest Central Korea Business Premium Collection (Alumni) ABI/INFORM Global (Corporate) ProQuest Central Student Research Library Prep SciTech Premium Collection ProQuest Computer Science Collection ProQuest Business Collection (Alumni Edition) ProQuest Business Collection Computer Science Database Materials Science Database ABI/INFORM Professional Advanced ProQuest Engineering Collection Biological Sciences ABI/INFORM Global Agriculture Science Database Computing Database Military Database ProQuest Research Library Science Database Telecommunications Database Biological Science Database Engineering Database Research Library (Corporate) Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection Environmental Science Database Materials Science Collection One Business ProQuest One Business (Alumni) ProQuest Central - hybrid linking ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic ProQuest One Academic UKI Edition Research Library - hybrid linking Engineering Collection Environmental Science Collection ProQuest Central Basic |
DatabaseTitle | Agricultural Science Database ProQuest Business Collection (Alumni Edition) Research Library Prep Computer Science Database ProQuest Central Student ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection SciTech Premium Collection ProQuest Military Collection ABI/INFORM Complete ProQuest Telecommunications Natural Science Collection Biological Science Collection Engineering Collection Advanced Technologies & Aerospace Collection Business Premium Collection ABI/INFORM Global Engineering Database ProQuest Science Journals (Alumni Edition) ProQuest Biological Science Collection ProQuest One Academic Eastern Edition Agricultural Science Collection ProQuest Technology Collection ProQuest Telecommunications (Alumni Edition) Biological Science Database ProQuest Business Collection Environmental Science Collection ProQuest One Academic UKI Edition Environmental Science Database ProQuest One Academic ABI/INFORM Global (Corporate) ProQuest One Business Technology Collection Materials Science Collection ProQuest Central (Alumni Edition) ProQuest One Community College Research Library (Alumni Edition) ProQuest Natural Science Collection ProQuest Biology Journals (Alumni Edition) ProQuest Central ABI/INFORM Professional Advanced ProQuest Engineering Collection ProQuest Central Korea Agricultural & Environmental Science Collection Materials Science Database ProQuest Research Library ABI/INFORM Complete (Alumni Edition) ProQuest Materials Science Collection ProQuest Computing ABI/INFORM Global (Alumni Edition) ProQuest Central Basic ProQuest Science Journals ProQuest Computing (Alumni Edition) ProQuest Military Collection (Alumni Edition) ProQuest SciTech Collection Advanced Technologies & Aerospace Database Materials Science & Engineering Collection ProQuest One Business (Alumni) ProQuest Central (Alumni) Business Premium Collection (Alumni) |
DatabaseTitleList | Agricultural Science Database |
Database_xml | – sequence: 1 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2711929751 |
Genre | Feature |
GroupedDBID | 3V. 7WY 7X2 7XB 88A 88I 88K 8AL 8FE 8FG 8FH 8FK 8FL 8G5 ABJCF ABUWG AFKRA ARAPS ATCPS AZQEC BBNVY BENPR BEZIV BGLVJ BHPHI CCPQU D1I DWQXO FRNLG GNUQQ GUQSH HCIFZ JQ2 K60 K6~ K7- KB. L.- L6V LK8 M0C M0K M0N M1Q M2O M2P M2T M7P M7S MBDVC P62 PATMY PDBOC PQBIZ PQBZA PQEST PQQKQ PQUKI PTHSS PYCSY Q9U |
ID | FETCH-proquest_journals_10257475153 |
IEDL.DBID | BENPR |
IngestDate | Thu Oct 10 21:02:16 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-proquest_journals_10257475153 |
PQID | 1025747515 |
PQPubID | 676301 |
ParticipantIDs | proquest_journals_1025747515 |
PublicationCentury | 2000 |
PublicationDate | 20120101 |
PublicationDateYYYYMMDD | 2012-01-01 |
PublicationDate_xml | – month: 01 year: 2012 text: 20120101 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Philadelphia |
PublicationPlace_xml | – name: Philadelphia |
PublicationTitle | Society for Industrial and Applied Mathematics. Proceedings of the SIAM International Conference on Data Mining |
PublicationYear | 2012 |
Publisher | Society for Industrial and Applied Mathematics |
Publisher_xml | – name: Society for Industrial and Applied Mathematics |
Score | 3.0141578 |
SourceID | proquest |
SourceType | Aggregation Database |
StartPage | 1023 |
SubjectTerms | Algorithms; Clustering; Microbial activity |
Title | Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing |
URI | https://www.proquest.com/docview/1025747515 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | ProQuest |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Society+for+Industrial+and+Applied+Mathematics.+Proceedings+of+the+SIAM+International+Conference+on+Data+Mining&rft.atitle=Efficient+Clustering+of+Metagenomic+Sequences+using+Locality+Sensitive+Hashing&rft.au=Rasheed%2C+Zeehasham&rft.au=Rangwala%2C+Huzefa&rft.au=Barbar%C3%A1%2C+Daniel&rft.date=2012-01-01&rft.pub=Society+for+Industrial+and+Applied+Mathematics&rft.spage=1023&rft.externalDocID=2711929751 |
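
Illustrative sketch (not part of the catalog record or the paper): the abstract describes clustering sequences by hashing fixed-length, gapless subsequences with LSH, but it does not say which LSH family or parameters the authors use. The Python below is a minimal sketch under the assumption of MinHash signatures with banding, a standard LSH scheme for set similarity that may differ from the published method. All function names and the parameters `k`, `num_hashes`, and `bands` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 5) -> Set[str]:
    """Return the set of fixed-length, gapless subsequences (k-mers) of seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens: Set[str], num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash value of the set under each hash function."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def lsh_buckets(
    signatures: List[List[int]], bands: int = 16
) -> Dict[Tuple[int, Tuple[int, ...]], List[int]]:
    """Bucket sequence indices by each band of their signature."""
    rows = len(signatures[0]) // bands
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[int]] = defaultdict(list)
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
    return buckets


def cluster(
    seqs: List[str], k: int = 5, num_hashes: int = 64, bands: int = 16
) -> List[List[int]]:
    """Merge sequences that share at least one LSH bucket (assumed linkage rule)."""
    sigs = [minhash_signature(kmers(s, k), num_hashes) for s in seqs]
    parent = list(range(len(seqs)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in lsh_buckets(sigs, bands).values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    groups: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(seqs)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

In this sketch, only sequences that collide in at least one band are ever grouped, so the all-pairs comparison is avoided. Fewer bands, and therefore more rows per band, make collisions stricter; more bands make them looser.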