Mining NCBI Sequence Read Archive Database: An Untapped Source of Organelle Genomes for Taxonomic and Comparative Genomics Research

Bibliographic Details
Published in	Diversity (Basel) Vol. 16; no. 2; p. 104
Main Authors	Eldem, Vahap; Balcı, Mehmet Ali
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.02.2024
Subjects	Archives & records; Automation; Bioinformatics; Biology; chloroplast genome; Chloroplasts; Data mining; Gene sequencing; Genetics; Genomes; Genomics; Identification and classification; Invertebrates; Metadata; Methods; mitochondria; Mitochondrial DNA; mitochondrial genome; Mollusks; Next-generation sequencing; organelle genome; Organelles; Organisms; Phenetics; Plant species; Population genetics; sequence read archive; Software utilities; Sperm; systematics; Taxonomy; Transcriptomics; Vertebrates; Workflow; Turkey

More Information
Summary:	The NCBI SRA database is constantly expanding due to the large amount of genomic and transcriptomic data from various organisms generated by next-generation sequencing (NGS), and researchers worldwide regularly deposit new data into the database. This high-coverage genomic and transcriptomic information can be re-evaluated regardless of the original research subject. The database-deposited NGS data can offer valuable insights into the genomes of organelles, particularly for non-model organisms. Here, we developed an automated bioinformatics workflow called “OrgaMiner”, designed to unveil high-quality mitochondrial and chloroplast genomes by data mining the NCBI SRA database. OrgaMiner, a Python-based pipeline, automatically orchestrates various tools to extract, assemble, and annotate organelle genomes for non-model organisms that have data in the NCBI SRA but no available organelle genome sequences. To test the usability and feasibility of the pipeline, “mollusca” was selected as a keyword, and 76 new mitochondrial genomes were de novo assembled and annotated automatically without writing a single line of code. The applicability of the pipeline can be expanded to identify organelles in diverse invertebrate, vertebrate, and plant species by simply specifying the taxonomic name. OrgaMiner provides an easy-to-use, end-to-end solution for biologists mainly working with taxonomy and population genetics.
ISSN:	1424-2818
DOI:	10.3390/d16020104
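
The summary describes starting from a taxonomic keyword such as “mollusca” and searching the NCBI SRA for matching records. As a rough illustration of that first step only, the sketch below builds a search URL for the public NCBI E-utilities `esearch` endpoint. It is not taken from OrgaMiner: the function name `build_sra_search_url`, the `[Organism]` field restriction, and the default record limit are assumptions made for this example, and the record does not say how the pipeline actually queries the database.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (general-purpose, not OrgaMiner-specific).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(taxon: str, max_records: int = 20) -> str:
    """Return an esearch URL that lists SRA record IDs for a taxonomic name.

    Hypothetical helper for illustration; restricting the term to the
    [Organism] field is an assumption, not the pipeline's documented query.
    """
    params = {
        "db": "sra",                  # search the Sequence Read Archive
        "term": f"{taxon}[Organism]",  # limit the keyword to the organism field
        "retmax": max_records,         # cap the number of returned IDs
        "retmode": "json",             # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; no network request is made here.
    print(build_sra_search_url("mollusca"))
```

Fetching the printed URL would return a list of SRA record identifiers; downloading reads, assembling, and annotating them are separate steps that this sketch does not attempt.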