An accurate and fast alignment-free method for profiling microbial communities

Bibliographic Details
Published in	Journal of bioinformatics and computational biology Vol. 15; no. 3; p. 1740001
Main Authors	Pham, Diem-Trang, Gao, Shanshan, Phan, Vinhthuy
Format	Journal Article
Language	English
Published	England 01.06.2017
Subjects	Databases, Genetic; Genetic Markers; Genome; Metagenomics - methods; Microbial Consortia - genetics; Sequence Analysis, DNA - methods; Metagenomics; genome-specific marker; abundance profiling

Abstract	Determining the abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive because they align tens of millions of reads from metagenomic samples to the reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimate the abundances of microbial genomes in metagenomic samples. The approach is based on solving linear and quadratic programs formulated in terms of genome-specific markers (GSMs). We compared our method against popular alignment-free and homology-based methods. Without contamination, our method was more accurate than the other alignment-free methods while being much faster than a homology-based method. In more realistic settings where samples were contaminated with human DNA, our method was the most accurate at predicting abundances across varying levels of contamination. We achieve higher accuracy than both alignment-free and homology-based methods.
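The abstract does not state how the linear and quadratic programs are formulated. The sketch below is a minimal illustration only, not the authors' method. It assumes a hypothetical linear model in which the observed count of each marker is a weighted sum of genome abundances, and it fits that model as a non-negative least-squares problem (a simple quadratic program) using cyclic coordinate descent. The matrix `A`, the vector `b`, and the functions `nnls_coordinate_descent` and `relative_abundances` are illustrative names that do not come from the paper.

```python
from typing import List


def nnls_coordinate_descent(A: List[List[float]], b: List[float],
                            iters: int = 500) -> List[float]:
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    Hypothetical model (an assumption, not the paper's formulation):
      A[i][j] = expected count of marker i per unit abundance of genome j
      b[i]    = observed count of marker i in the sample
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    # Residual r = A x - b; with x = 0 it starts at -b.
    r = [-float(bi) for bi in b]
    # Squared norm of each column; zero means the genome has no markers.
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(iters):
        for j in range(n):
            if col_sq[j] == 0:
                continue
            # Gradient of 0.5 * ||A x - b||^2 with respect to x[j].
            grad = sum(A[i][j] * r[i] for i in range(m))
            # Exact minimiser along coordinate j, projected onto x[j] >= 0.
            new_xj = max(0.0, x[j] - grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated x.
                for i in range(m):
                    r[i] += A[i][j] * delta
                x[j] = new_xj
    return x


def relative_abundances(x: List[float]) -> List[float]:
    """Normalise non-negative abundance estimates so they sum to 1."""
    total = sum(x)
    return [xi / total if total > 0 else 0.0 for xi in x]


if __name__ == "__main__":
    # Markers 0 and 1 occur only in genome 0; marker 2 occurs only in genome 1.
    A = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    b = [10.0, 12.0, 5.0]
    x = nnls_coordinate_descent(A, b)
    print(x)                       # [11.0, 5.0]
    print(relative_abundances(x))  # [0.6875, 0.3125]
```

When every marker is genome-specific, each row of `A` has a single non-zero entry, so the fit reduces to averaging each genome's marker counts; the coordinate-descent form is shown because it also handles the general case in which a marker could be shared by several genomes.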
Affiliation	Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA (all three authors)
Cited by	10.1186/s13059-017-1319-7; 10.1142/S0219720019500082; 10.1186/s12859-018-2155-9; 10.1142/S0219720017020024
DOI	10.1142/S0219720017400017
EISSN	1757-6334
Peer reviewed	Yes
Scholarly	Yes
PMID	28345370
Journal abbreviation	J Bioinform Comput Biol
URI	https://www.ncbi.nlm.nih.gov/pubmed/28345370