GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins

Bibliographic Details
Published in NAR Genomics and Bioinformatics, Vol. 2, no. 2, p. lqaa026
Main Authors Brůna, Tomáš; Lomsadze, Alexandre; Borodovsky, Mark
Format Journal Article
Language English
Published England: Oxford University Press, 01.06.2020

Abstract We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET we proposed a method of integration of unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to genome and extracts hints to splice sites and translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters as well as to adjust coordinates of predicted genes if they disagree with the most reliable hints (the -EP+ mode). Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while the GeneMark-EP+ showed higher accuracy than GeneMark-ET. We have observed that the most pronounced improvements in gene prediction accuracy happened in large eukaryotic genomes.
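The hint-based correction step mentioned in the abstract (the -EP+ mode) can be illustrated with a minimal sketch. This is not the GeneMark-EP+ or ProtHint implementation: the `Intron` record, the score threshold, and both function names are hypothetical, and the abstract does not say how hint reliability is scored. The sketch only shows the general idea of separating the most reliable hints from the rest and flagging predicted introns that disagree with them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    """Hypothetical intron record; coordinates are 1-based and inclusive."""
    seqid: str
    start: int
    end: int
    strand: str


def split_hints(scored_hints, threshold):
    """Partition (Intron, score) pairs into reliable hints and all other hints.

    The threshold is an assumption made for illustration only.
    """
    reliable = {hint for hint, score in scored_hints if score >= threshold}
    other = {hint for hint, score in scored_hints if score < threshold}
    return reliable, other


def conflicting_predictions(predicted, reliable):
    """Return (prediction, hint) pairs where a predicted intron overlaps a
    reliable hint on the same sequence and strand but has different borders."""
    conflicts = []
    for pred in predicted:
        for hint in reliable:
            same_locus = pred.seqid == hint.seqid and pred.strand == hint.strand
            overlaps = pred.start <= hint.end and hint.start <= pred.end
            if same_locus and overlaps and pred != hint:
                conflicts.append((pred, hint))
    return conflicts


if __name__ == "__main__":
    hints = [
        (Intron("chr1", 100, 200, "+"), 0.9),
        (Intron("chr1", 500, 650, "+"), 0.2),
    ]
    reliable, other = split_hints(hints, threshold=0.5)
    predicted = [Intron("chr1", 110, 200, "+"), Intron("chr1", 500, 650, "+")]
    for pred, hint in conflicting_predictions(predicted, reliable):
        print(f"prediction {pred} disagrees with reliable hint {hint}")
```

In this toy setup the lower-scoring hints would only inform parameter estimation, while a conflict with a reliable hint would mark the prediction as a candidate for coordinate adjustment.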
Author Brůna, Tomáš
Borodovsky, Mark
Lomsadze, Alexandre
AuthorAffiliation 1 School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
2 Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
3 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Author_xml – sequence: 1
  givenname: Tomáš
  surname: Brůna
  fullname: Brůna, Tomáš
  organization: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
– sequence: 2
  givenname: Alexandre
  surname: Lomsadze
  fullname: Lomsadze, Alexandre
  organization: Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
– sequence: 3
  givenname: Mark
  surname: Borodovsky
  fullname: Borodovsky, Mark
  email: borodovsky@gatech.edu
  organization: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32440658 (record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 2020
The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1093/nargab/lqaa026
EISSN 2631-9268
ExternalDocumentID PMC7222226
32440658
10_1093_nargab_lqaa026
10.1093/nargab/lqaa026
Genre Journal Article
GrantInformation_xml – fundername: ;
  grantid: GM128145
ISSN 2631-9268
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
https://creativecommons.org/licenses/by-nc/4.0
The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
Notes The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
OpenAccessLink https://www.proquest.com/docview/3170919601?pq-origsite=%requestingapplication%
PMID 32440658
PublicationDate 2020-06-01
PublicationDateYYYYMMDD 2020-06-01
PublicationDate_xml – month: 06
  year: 2020
  text: 2020-06-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle NAR genomics and bioinformatics
PublicationTitleAlternate NAR Genom Bioinform
PublicationYear 2020
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
References_xml – volume: 19
  start-page: ii215
  issue: Suppl. 2
  year: 2003
  ident: 2024030510173379400_B15
  article-title: Gene prediction with a hidden Markov model and a new intron submodel
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btg1080
– volume: 46
  start-page: D213
  year: 2018
  ident: 2024030510173379400_B24
  article-title: APPRIS 2017: principal isoforms for multiple gene sets
  publication-title: Nucleic Acids Res.
StartPage lqaa026
SubjectTerms Accuracy
Gene mapping
Genes
Genomes
Methart
Peptide mapping
Predictions
Title GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins
URI https://www.ncbi.nlm.nih.gov/pubmed/32440658
https://www.proquest.com/docview/3170919601
https://www.proquest.com/docview/2406307049
https://pubmed.ncbi.nlm.nih.gov/PMC7222226
Volume 2