An accurate and fast alignment-free method for profiling microbial communities
Published in | Journal of bioinformatics and computational biology Vol. 15; no. 3; p. 1740001 |
---|---|
Main Authors | Pham, Diem-Trang; Gao, Shanshan; Phan, Vinhthuy |
Format | Journal Article |
Language | English |
Published | England, 01.06.2017 |
Abstract | Determining abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive due to the alignment of tens of millions of reads from metagenomic samples to reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimate abundances of microbial genomes in metagenomic samples. The approach is based on solving linear and quadratic programs, which are represented by genome-specific markers (GSM). We compared our method against popular alignment-free and homology-based methods. Without contamination, our method was more accurate than other alignment-free methods while being much faster than a homology-based method. In more realistic settings where samples were contaminated with human DNA, our method was the most accurate method in predicting abundance at varying levels of contamination. We achieve higher accuracy than both alignment-free and homology-based methods. |
---|---|
Author | Gao, Shanshan; Phan, Vinhthuy; Pham, Diem-Trang |
Author_xml | 1. Pham, Diem-Trang (Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA); 2. Gao, Shanshan (Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA); 3. Phan, Vinhthuy (Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA) |
CitedBy_id | crossref_primary_10_1186_s13059_017_1319_7 crossref_primary_10_1142_S0219720019500082 crossref_primary_10_1186_s12859_018_2155_9 crossref_primary_10_1142_S0219720017020024 |
ContentType | Journal Article |
DOI | 10.1142/S0219720017400017 |
EISSN | 1757-6334 |
ExternalDocumentID | 28345370 |
Genre | Journal Article |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | Metagenomics; genome-specific marker; abundance profiling |
Language | English |
PMID | 28345370 |
PublicationCentury | 2000 |
PublicationDate | 2017-Jun |
PublicationDateYYYYMMDD | 2017-06-01 |
PublicationDate_xml | – month: 06 year: 2017 text: 2017-Jun |
PublicationDecade | 2010 |
PublicationPlace | England |
PublicationPlace_xml | – name: England |
PublicationTitle | Journal of bioinformatics and computational biology |
PublicationTitleAlternate | J Bioinform Comput Biol |
PublicationYear | 2017 |
StartPage | 1740001 |
SubjectTerms | Databases, Genetic; Genetic Markers; Genome; Metagenomics - methods; Microbial Consortia - genetics; Sequence Analysis, DNA - methods |
Title | An accurate and fast alignment-free method for profiling microbial communities |
URI | https://www.ncbi.nlm.nih.gov/pubmed/28345370 |
Volume | 15 |