GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins

Bibliographic Details
Published in NAR Genomics and Bioinformatics Vol. 2; no. 2; p. lqaa026
Main Authors Brůna, Tomáš; Lomsadze, Alexandre; Borodovsky, Mark
Format Journal Article
Language English
Published England Oxford University Press 01.06.2020
Summary: We have taken several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced GeneMark-ES, an automated method for efficient ab initio gene finding whose parameters are trained in an iterative unsupervised mode. Next, in GeneMark-ET, we proposed a method that integrates unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that uses another source of external information: a protein database, readily available before the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive mapping of proteins to the genome and extracts hints to splice sites and to translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve the estimation of model parameters and, in the -EP+ mode, to adjust the coordinates of predicted genes that disagree with the most reliable hints. In tests, both GeneMark-EP and -EP+ predicted genes more accurately than GeneMark-ES, and GeneMark-EP+ was more accurate than GeneMark-ET. The most pronounced improvements in gene prediction accuracy were observed in large eukaryotic genomes.
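The idea of checking predicted gene coordinates against the most reliable hints can be illustrated with a minimal sketch. It is not the GeneMark-EP+ or ProtHint implementation: the `Intron` and `Hint` types, the `score` field, and the `min_score` threshold are all hypothetical names introduced here, and the sketch only flags disagreements rather than correcting them.

```python
from typing import NamedTuple


class Intron(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str


class Hint(NamedTuple):
    intron: Intron
    score: float  # hypothetical reliability score in [0, 1]


def reliable_introns(hints, min_score=0.9):
    """Keep only hint introns whose (assumed) score passes the threshold."""
    return {h.intron for h in hints if h.score >= min_score}


def conflicting_predictions(predicted, hints, min_score=0.9):
    """Return predicted introns that overlap a reliable hint intron
    on the same sequence and strand without matching it exactly."""
    trusted = reliable_introns(hints, min_score)
    conflicts = []
    for p in predicted:
        if p in trusted:
            continue  # exact agreement with a reliable hint
        for t in trusted:
            same_locus = p.seqid == t.seqid and p.strand == t.strand
            if same_locus and p.start <= t.end and t.start <= p.end:
                conflicts.append(p)
                break
    return conflicts


predicted = [
    Intron("chr1", 100, 200, "+"),
    Intron("chr1", 300, 410, "+"),
    Intron("chr1", 900, 950, "-"),
]
hints = [
    Hint(Intron("chr1", 100, 200, "+"), 0.95),
    Hint(Intron("chr1", 300, 400, "+"), 0.99),
    Hint(Intron("chr1", 890, 960, "-"), 0.50),
]
conflicts = conflicting_predictions(predicted, hints)
```

In this toy input, only the second predicted intron is flagged: the first matches a reliable hint exactly, and the third overlaps a hint whose assumed score falls below the threshold, so that hint is ignored.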
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
ISSN: 2631-9268
DOI: 10.1093/nargab/lqaa026