Dinoflagellate tandem array gene transcripts are highly conserved and not polycistronic

Bibliographic Details
Published in Proceedings of the National Academy of Sciences - PNAS Vol. 109; no. 39; pp. 15793 - 15798
Main Authors Beauchemin, Mathieu; Roy, Sougata; Daoust, Philippe; Dagenais-Bellefeuille, Steve; Bertomeu, Thierry; Letourneau, Louis; Lang, B. Franz; Morse, David
Format Journal Article
Language English
Published United States National Academy of Sciences 25.09.2012
Summary: Dinoflagellates are an important component of the marine biota, but a large genome with high–copy number (up to 5,000) tandem gene arrays has made genomic sequencing problematic. More importantly, little is known about the expression and conservation of these unusual gene arrays. We assembled de novo a gene catalog of 74,655 contigs for the dinoflagellate Lingulodinium polyedrum from RNA-Seq (Illumina) reads. The catalog contains 93% of a Lingulodinium EST dataset deposited in GenBank and 94% of the enzymes in 16 primary metabolic KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, indicating it is a good representation of the transcriptome. Analysis of the catalog shows a marked underrepresentation of DNA-binding proteins and DNA-binding domains compared with other algae. Despite this, we found no evidence to support the proposal of polycistronic transcription, including a marked underrepresentation of sequences corresponding to the intergenic spacers of two tandem array genes. We also have used RNA-Seq to assess the degree of sequence conservation in tandem array genes and found their transcripts to be highly conserved. Interestingly, some of the sequences in the catalog have only bacterial homologs and are potential candidates for horizontal gene transfer. These presumably were transferred as single-copy genes, and because they are now all GC-rich, any derived from AT-rich contexts must have experienced extensive mutation. Our study not only has provided the most complete dinoflagellate gene catalog known to date, it has also exploited RNA-Seq to address fundamental issues in basic transcription mechanisms and sequence conservation in these algae.
Bibliography: http://dx.doi.org/10.1073/pnas.1206683109
2 Present address: Centre d'Innovation Génome Québec, McGill University, Montreal, QC, Canada H3A 1A4.
1 M.B. and S.R. contributed equally to this work.
Edited* by J. Woodland Hastings, Harvard University, Cambridge, MA, and approved August 17, 2012 (received for review April 23, 2012)
Author contributions: M.B., S.R., T.B., and D.M. designed research; M.B., S.R., P.D., and S.D.-B. performed research; P.D. contributed new reagents/analytic tools; M.B., S.R., S.D.-B., L.L., B.F.L., and D.M. analyzed data; and M.B., S.R., B.F.L., and D.M. wrote the paper.
3 Present address: Pathology Department, Beth Israel Deaconess Medical Center, Harvard Medical School, Cambridge, MA 02215.
ISSN: 0027-8424, 1091-6490
DOI: 10.1073/pnas.1206683109