An accurate and fast alignment-free method for profiling microbial communities

Bibliographic Details
Published in Journal of bioinformatics and computational biology Vol. 15; no. 3; p. 1740001
Main Authors Pham, Diem-Trang; Gao, Shanshan; Phan, Vinhthuy
Format Journal Article
Language English
Published England 01.06.2017
Abstract Determining the abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive because they align tens of millions of reads from metagenomic samples to the reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimating the abundances of microbial genomes in metagenomic samples. The approach is based on solving linear and quadratic programs, which are represented by genome-specific markers (GSM). We compared our method against popular alignment-free and homology-based methods. Without contamination, our method was more accurate than other alignment-free methods while being much faster than a homology-based method. In more realistic settings where samples were contaminated with human DNA, our method was the most accurate method in predicting abundance at varying levels of contamination. We achieve higher accuracy than both alignment-free and homology-based methods.
Author_xml – sequence: 1
  givenname: Diem-Trang
  surname: Pham
  fullname: Pham, Diem-Trang
  organization: 1 Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA
– sequence: 2
  givenname: Shanshan
  surname: Gao
  fullname: Gao, Shanshan
  organization: 1 Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA
– sequence: 3
  givenname: Vinhthuy
  surname: Phan
  fullname: Phan, Vinhthuy
  organization: 1 Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA
CitedBy_id crossref_primary_10_1186_s13059_017_1319_7
crossref_primary_10_1142_S0219720019500082
crossref_primary_10_1186_s12859_018_2155_9
crossref_primary_10_1142_S0219720017020024
ContentType Journal Article
DOI 10.1142/S0219720017400017
EISSN 1757-6334
ExternalDocumentID 28345370
Genre Journal Article
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords Metagenomics
genome-specific marker
abundance profiling
Language English
PMID 28345370
PublicationCentury 2000
PublicationDate 2017-Jun
PublicationDateYYYYMMDD 2017-06-01
PublicationDate_xml – month: 06
  year: 2017
  text: 2017-Jun
PublicationDecade 2010
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Journal of bioinformatics and computational biology
PublicationTitleAlternate J Bioinform Comput Biol
PublicationYear 2017
StartPage 1740001
SubjectTerms Databases, Genetic
Genetic Markers
Genome
Metagenomics - methods
Microbial Consortia - genetics
Sequence Analysis, DNA - methods
Title An accurate and fast alignment-free method for profiling microbial communities
URI https://www.ncbi.nlm.nih.gov/pubmed/28345370
Volume 15