A shotgun approach to discovering and reconstructing consensus retrotransposons ex novo from dense contigs of short sequences derived from Genbank Genome Survey Sequence database records

Bibliographic Details
Published in	Gene Vol. 448; no. 2; pp. 168 - 173
Main Authors	Laten, Howard M.; Mogil, Lauren S.; Wright, LaBianca N.
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 15.12.2009
Subjects	Base Sequence; Cloning, Molecular - methods; Computational Biology - methods; Consensus Sequence - genetics; Contig Mapping; Databases, Nucleic Acid; Models, Biological; Molecular Sequence Data; Open Reading Frames - genetics; Retroelements - genetics; PBS; Prot; ORF; RT; PPT; GSS; RH; HTGS; LTR; Int; NT/NR

More Information
Summary:	Retrotransposons constitute the majority of pseudogenic protein-coding regions of most eukaryotic genomes. Most genomes carry tens to thousands of retrotransposon copies derived from dozens of distinct families, but most if not all of these copies are non-functional and contain disabling mutations, including large numbers of indels. Until recently, most regions rich in these elements were virtually ignored in all but the most complete genome sequencing projects, and the full extent of their impact on the structure and function of the genomes of higher eukaryotes was under-appreciated. Even when new retrotransposons are encountered and annotated by automated gene-finding programs and similarity searches, coding regions are treated as exons and are invariably, and not surprisingly, mistranslated because of numerous frameshift mutations and large indels. Contrary to what such in silico annotations imply, very few functional retrotransposons contain introns. While many repetitive DNA consensus sequences have been assembled from collections of largely full-length copies using full-length templates, we have shown that repetitive DNA consensus sequence contigs representing long, moderately high copy-number elements can also be generated ex novo, in the absence of templates, from very short overlapping sequences. We have devised an in silico strategy to recover and reconstruct consensus sequences of elements up to 20,000 bp by building dense contigs of hundreds of overlapping 400- to 900-bp records found in the GenBank Genome Survey Sequence database. The results are hypothetical ancestral sequences that encode elements that appear to be fully functional, with intact open reading frames and other conserved features.
ISSN:	0378-1119, 1879-0038
DOI:	10.1016/j.gene.2009.06.011