Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing

Bibliographic Details
Published in Society for Industrial and Applied Mathematics. Proceedings of the SIAM International Conference on Data Mining, p. 1023
Main Authors Rasheed, Zeehasham; Rangwala, Huzefa; Barbará, Daniel
Format Conference Proceeding
Language English
Published Philadelphia: Society for Industrial and Applied Mathematics, 01.01.2012

Abstract The new generation of genomic technologies has allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we develop an efficient and accurate metagenome clustering approach that uses the locality sensitive hashing (LSH) technique to approximate the computational complexity associated with comparing sequences. We introduce the use of fixed-length, gapless subsequences to improve the sensitivity of the LSH-based similarity function. We evaluate the performance of our algorithm on two metagenome datasets associated with microbes existing across different human skin locations. Our empirical results show the strength of the developed approach in comparison to three state-of-the-art sequence clustering algorithms with regard to computational efficiency and clustering quality. We also demonstrate the practical significance of the developed clustering algorithm by comparing bacterial diversity and structure across different skin locations.
Copyright Copyright Society for Industrial and Applied Mathematics 2012
Subjects Algorithms; Clustering; Microbial activity
URI https://www.proquest.com/docview/1025747515