Sequence data for Clostridium autoethanogenum using three generations of sequencing technologies

Bibliographic Details
Published in Scientific Data Vol. 2; no. 1; p. 150014
Main Authors Utturkar, Sagar M, Klingeman, Dawn M, Bruno-Barcena, José M, Chinn, Mari S, Grunden, Amy M, Köpke, Michael, Brown, Steven D
Format Journal Article
Language English
Published England Nature Publishing Group 14.04.2015
Summary: During the past decade, DNA sequencing output has been dominated by second-generation sequencing platforms such as Illumina, which are characterized by low cost, high throughput and shorter read lengths. The emergence and development of so-called third-generation sequencing platforms such as PacBio have permitted exceptionally long reads (over 20 kb) to be generated. Owing to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high-quality sequence datasets that span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, will enable comparison of existing and novel bioinformatics approaches, and will encourage interest in the development of innovative experimental and computational methods for NGS data.
Bibliography:
USDOE
AC05-00OR22725
D.M.K. generated the Illumina sequence data. S.M.U. and S.D.B. were responsible for analyzing the PacBio and Sanger data. M.K. was responsible for generating the 454 PE data. J.B., M.C. and A.G. were responsible for generating and analyzing the 454 shotgun and Ion Torrent sequencing datasets. S.M.U. deposited data to the SRA and analyzed data. S.D.B. coordinated the project. All authors contributed to the manuscript and approved the final version.
ISSN: 2052-4463
DOI: 10.1038/sdata.2015.14