An accurate and fast alignment-free method for profiling microbial communities

Bibliographic Details
Published in: Journal of Bioinformatics and Computational Biology, Vol. 15, no. 3, p. 1740001
Main Authors: Pham, Diem-Trang; Gao, Shanshan; Phan, Vinhthuy
Format: Journal Article
Language: English
Published: England, 01.06.2017
More Information
Summary: Determining the abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive because tens of millions of reads from metagenomic samples must be aligned to the reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimating the abundances of microbial genomes in metagenomic samples. The approach is based on solving linear and quadratic programs formulated in terms of genome-specific markers (GSM). We compared our method against popular alignment-free and homology-based methods. Without contamination, our method was more accurate than the other alignment-free methods while being much faster than a homology-based method. In more realistic settings, where samples were contaminated with human DNA, our method was the most accurate in predicting abundance across varying levels of contamination. Overall, we achieve higher accuracy than both alignment-free and homology-based methods.
ISSN: 1757-6334
DOI: 10.1142/S0219720017400017
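The summary says abundances are estimated by solving linear and quadratic programs over genome-specific markers, but it does not give the formulation. The following is a minimal sketch under stated assumptions and is not the authors' method. It assumes a hypothetical marker-by-genome occurrence matrix `A`, where `A[i][j] = 1` if marker i occurs in genome j, and a vector `b` of observed marker counts in the reads. It solves the nonnegative least-squares quadratic program min ||Ax - b||^2 subject to x >= 0 by projected gradient descent and then normalizes x to relative abundances. The function name `estimate_abundances` and its parameters are illustrative only.

```python
def estimate_abundances(A, b, iters=5000):
    """Hypothetical sketch: nonnegative least squares via projected gradient.

    A: list of rows, one per marker; A[i][j] = 1 if marker i occurs in genome j.
    b: observed count of each marker in the sample's reads.
    Returns relative abundances (summing to 1), or all zeros if nothing fits.
    """
    m, n = len(A), len(A[0])
    # Step size 1/||A||_F^2 is at most 1/lambda_max(A^T A), the inverse
    # Lipschitz constant of the gradient, so projected gradient converges.
    fro2 = sum(a * a for row in A for a in row)
    step = 1.0 / fro2
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of 0.5 * ||A x - b||^2 is A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, then project onto the constraint x >= 0
        x = [max(0.0, x[j] - step * g[j]) for j in range(n)]
    total = sum(x)
    # Normalize estimated coverages into relative abundances
    return [xj / total for xj in x] if total > 0 else x


if __name__ == "__main__":
    # Two genomes: two markers specific to each, plus one marker shared by both.
    A = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
    b = [10, 10, 30, 30, 40]
    print(estimate_abundances(A, b))
```

In this toy input the counts fit exactly with coverages 10 and 30, so the result is approximately [0.25, 0.75]. Plain projected gradient keeps the sketch free of dependencies. A real pipeline would more likely call a dedicated solver such as `scipy.optimize.nnls`, or an LP/QP solver if constraints beyond x >= 0 are needed.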