Fulcrum: condensing redundant reads from high-throughput sequencing studies

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 28, no. 10, pp. 1324–1327
Main Authors: Burriesci, Matthew S.; Lehnert, Erik M.; Pringle, John R.
Format: Journal Article
Language: English
Publisher: Oxford University Press, Oxford; published 15 May 2012

Summary:

Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs.

Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease of use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory.

Availability and implementation: Source code and a tutorial are available at http://pringlelab.stanford.edu/protocols.html under a BSD-like license. Fulcrum was written and tested in Python 2.6, and the single-machine and local-network modes depend on a modified version of the Parallel Python library (provided).

Contact: erik.m.lehnert@gmail.com

Supplementary information: Supplementary information is available at Bioinformatics online.
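The Results paragraph says that Fulcrum collapses identical and near-identical reads into single error-corrected sequences, but it does not spell out the procedure. The Python sketch below shows one generic way such collapsing can work for single-end reads; it is not Fulcrum's implementation. The bucketing key (the first `prefix_len` bases plus the read length), the greedy Hamming-distance clustering, the `max_mismatches` threshold and the per-position majority-vote consensus are all assumptions chosen for illustration, and the function names `hamming`, `consensus` and `collapse_reads` are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatched positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def consensus(reads):
    """Per-position majority vote; ties go to the alphabetically first base."""
    result = []
    for column in zip(*reads):
        counts = Counter(column)
        result.append(min(counts, key=lambda base: (-counts[base], base)))
    return "".join(result)


def collapse_reads(reads, prefix_len=8, max_mismatches=2):
    """Hypothetical collapser for identical and near-identical single-end reads.

    Returns a list of (consensus_sequence, number_of_reads_collapsed).
    This is an illustrative sketch, not Fulcrum's actual algorithm.
    """
    # Assumption: only reads sharing a prefix and a length are compared,
    # which keeps the number of pairwise comparisons small.
    buckets = defaultdict(list)
    for read in reads:
        buckets[(read[:prefix_len], len(read))].append(read)

    collapsed = []
    for bucket in buckets.values():
        # Exact duplicates are counted first; the most abundant sequence
        # in each bucket seeds the first cluster.
        abundance = Counter(bucket)
        clusters = []  # each entry: [seed_sequence, list_of_member_reads]
        for seq, n in sorted(abundance.items(), key=lambda kv: (-kv[1], kv[0])):
            for cluster in clusters:
                if hamming(cluster[0], seq) <= max_mismatches:
                    cluster[1].extend([seq] * n)  # near-identical: join cluster
                    break
            else:
                clusters.append([seq, [seq] * n])  # too different: new cluster
        # Each cluster becomes one error-corrected consensus sequence.
        for _seed, members in clusters:
            collapsed.append((consensus(members), len(members)))
    return collapsed


if __name__ == "__main__":
    reads = ["ACGTACGTAAAA", "ACGTACGTAAAA", "ACGTACGTAAAT",
             "ACGTACGTAAAA", "TTTTGGGGCCCC"]
    result = collapse_reads(reads)
    print(result)  # [('ACGTACGTAAAA', 4), ('TTTTGGGGCCCC', 1)]
    print(1 - len(result) / len(reads))  # fraction of reads removed: 0.6
```

Here the single-mismatch read ACGTACGTAAAT is folded into the three exact duplicates, and the majority vote restores the final A. Because reads are only compared within a prefix bucket, an error in the first `prefix_len` bases keeps two otherwise identical reads apart; that trade-off keeps comparisons cheap on large datasets. Paired-end reads, quality scores and 454 homopolymer indels (which a Hamming distance cannot model) are all left out of this sketch.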
Associate Editor: Alex Bateman
ISSN: 1367-4803; 1367-4811
DOI: 10.1093/bioinformatics/bts123