A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis

Bibliographic Details
Published in Developmental Biology, Vol. 404, no. 2, pp. 149-163
Main Authors Gilchrist, Michael J.; Sobral, Daniel; Khoueiry, Pierre; Daian, Fabrice; Laporte, Batiste; Patrushev, Ilya; Matsumoto, Jun; Dewar, Ken; Hastings, Kenneth E.M.; Satou, Yutaka; Lemaire, Patrick; Rothbächer, Ute
Format Journal Article
Language English
Published United States Elsevier Inc 15.08.2015
Elsevier
Summary: Genome-wide resources, such as collections of cDNA clones encoding complete proteins (full-ORF clones), are crucial tools for studying the evolution of gene function and genetic interactions. Non-model organisms, in particular marine organisms, provide a rich source of functional diversity. Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. The construction of full-ORF clone collections from non-model organisms is hindered by the difficulty of accurately predicting the N-terminal ends of proteins and of distinguishing recent paralogs from highly polymorphic alleles. We report a computational strategy that overcomes these difficulties and allows for accurate gene-level clustering of transcript data, followed by the automated identification of full ORFs with correct 5′- and 3′-ends. It is robust to polymorphism, includes paralog calling and does not require evolutionary proximity to well-annotated model organisms. We developed this pipeline for the ascidian Ciona intestinalis, a highly polymorphic member of the divergent sister group of the vertebrates, which is emerging as a powerful model organism to study chordate gene function, gene regulatory networks and the molecular mechanisms underlying human pathologies. Using this pipeline we have generated the first full-ORF collection for a highly polymorphic marine invertebrate. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, and full-ORF orthologs for approximately half of curated human disease-associated genes.
• Ascidians foster functional genomics by compact genomes and fixed cellular lineages.
• A resource of 19,000 GATEWAY full-ORF clones was generated for Ciona intestinalis.
• Novel methods support automated finding of coding 5′ ends and paralog distinction.
• The strategy is robust to polymorphism and poorly annotated genomes.
• Half of human disease-associated genes are covered by full-ORF Ciona orthologs.
Bibliography:
AC02-05CH11231
USDOE Office of Science (SC)
Present address: Department of Evolution and Developmental Biology, Zoological Institute, University Innsbruck, Technikerstr. 25, A-6020 Innsbruck, Austria.
Present address: The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, Mill Hill, London NW7 1AA, UK.
Present address: EMBL, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
These authors contributed equally.
Present address: IGC, Instituto Gulbenkian de Ciência, Morada Rua da Quinta Grande, 6, 2780-156 Oeiras, Portugal.
Present address: CRBM, UMR5237 CNRS/Université Montpellier, 1919 route de Mende, F-34293 Montpellier Cedex 5, France.
ISSN:0012-1606
1095-564X
DOI:10.1016/j.ydbio.2015.05.014