Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog

Bibliographic Details
Published in	BMC genomics Vol. 23; no. 1; p. 216
Main Authors	Guillaudeux, Nicolas; Belleannée, Catherine; Blanquart, Samuel
Format	Journal Article
Language	English
Published	England, BioMed Central Ltd, 18.03.2022
Subjects	Alternative splicing; Alternative transcription; Animals; Biochemistry, Molecular Biology; Bioinformatics; Comparative genomics; Computer Science; Dogs; Exons; Gene structure; Genes; Genetic aspects; Genome; Genomics; Human beings; Humans; Identification and classification; Life Sciences; Man; Mice; Orthology; Protein Isoforms - metabolism; RNA Splicing; Spliced CDS; Transcript orthology; Transcriptome prediction; France

More Information
Summary:	In eukaryote transcriptomes, a significant share of transcript diversity comes from the capacity of genes to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on conservation of the splicing structure, nor is there a public benchmark dataset providing groups of orthologous transcripts that share a conserved splicing structure. We introduced a formal definition of splicing structure orthology and predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript coding sequences (CDSs), mainly for dog, an emergent model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data. We generated a dataset of 253 gene triplets that are structurally conserved and share all their CDSs in human, mouse and dog, corresponding to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consist of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs together with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files. Our data may help to compare highly conserved genes across the three species, support comparative transcriptomics at the isoform level, and benchmark splice aligners and methods that identify splicing orthologs. The data are available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ.
ISSN:	1471-2164
DOI:	10.1186/s12864-022-08429-4