Error filtering, pair assembly and error correction for next-generation sequencing reads

Bibliographic Details
Published in	Bioinformatics Vol. 31; no. 21; pp. 3476 - 3482
Main Authors	Edgar, Robert C.; Flyvbjerg, Henrik
Format	Journal Article
Language	English
Published	England 01.11.2015
Subjects	Algorithms; Assembly; Bioinformatics; Error correction; Filtering; Filtration; High-Throughput Nucleotide Sequencing - methods; Mathematical analysis; Packages; Sequencing; Software
Summary:	Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low. Results: We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores. Availability and implementation: These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch. Contact: robert@drive5.com. Supplementary information: Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803 1367-4811 1460-2059
DOI:	10.1093/bioinformatics/btv401
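
The first method named in the summary, filtering reads by their expected number of errors, can be illustrated with a minimal sketch. It relies only on the standard Phred definition, in which a quality score Q corresponds to an error probability of 10^(-Q/10), so the expected number of errors in a read is the sum of those probabilities over its bases. The function names, the Phred+33 encoding and the threshold of 1.0 are assumptions made for illustration; this is not the USEARCH implementation.

```python
def expected_errors(quals):
    """Expected number of errors in a read, given its Phred quality scores."""
    # A Phred score Q corresponds to an error probability of 10^(-Q/10);
    # summing the per-base probabilities gives the expected error count.
    return sum(10 ** (-q / 10) for q in quals)


def phred33_to_scores(qual_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in qual_string]


def passes_filter(qual_string, max_ee=1.0):
    """Keep a read only if its expected error count is at most max_ee.

    max_ee=1.0 is an illustrative threshold, not a value taken from the record.
    """
    return expected_errors(phred33_to_scores(qual_string)) <= max_ee
```

For example, a read of twelve bases that all have Q = 10 has an expected error count of about 1.2 and would be discarded at this threshold, while a read of four bases at Q = 40 has an expected error count of 0.0004 and would be kept.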