GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins

We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET we proposed a m...

Bibliographic Details
Published in	NAR genomics and bioinformatics Vol. 2; no. 2; p. lqaa026
Main Authors	Brůna, Tomáš; Lomsadze, Alexandre; Borodovsky, Mark
Format	Journal Article
Language	English
Published	England: Oxford University Press, 01.06.2020
Subjects	Accuracy; Gene mapping; Genes; Genomes; Methart; Peptide mapping; Predictions
Online Access	Get full text

Abstract	We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET we proposed a method for integrating unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to the genome and extracts hints to splice sites and translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters as well as to adjust coordinates of predicted genes if they disagree with the most reliable hints (the -EP+ mode). Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while GeneMark-EP+ showed higher accuracy than GeneMark-ET. We observed that the most pronounced improvements in gene prediction accuracy occurred in large eukaryotic genomes.
Author	Brůna, Tomáš; Lomsadze, Alexandre; Borodovsky, Mark
AuthorAffiliation	1 School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA; 2 Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; 3 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Author_xml	– sequence: 1 givenname: Tomáš surname: Brůna fullname: Brůna, Tomáš organization: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA – sequence: 2 givenname: Alexandre surname: Lomsadze fullname: Lomsadze, Alexandre organization: Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA – sequence: 3 givenname: Mark surname: Borodovsky fullname: Borodovsky, Mark email: borodovsky@gatech.edu organization: School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/32440658$$D View this record in MEDLINE/PubMed
Cites_doi	10.1093/bioinformatics/btg1080 10.1093/nar/gkx997 10.1093/bioinformatics/btn460 10.1186/1471-2105-10-421 10.1007/978-1-4939-9173-0_6 10.1093/nar/gky1053 10.1186/s12859-019-3182-x 10.1093/nar/gkw092 10.1186/gb-2008-9-1-r7 10.1093/bioinformatics/btr010 10.1101/gr.1865504 10.1093/database/baw093 10.1093/bioinformatics/btm071 10.1186/1471-2105-15-189 10.1006/jmbi.1997.0951 10.1002/cpbi.57 10.1101/gr.081612.108 10.1093/nar/gkq1019 10.1093/nar/26.4.1107 10.2174/157489308784340702 10.1186/1471-2105-11-S10-O8 10.1038/nmeth.3176 10.1093/nar/gki937 10.1093/nar/gkw1129 10.1093/nar/gku557 10.1093/bioinformatics/btv661 10.1016/0097-8485(93)85004-V 10.1016/j.infsof.2005.09.005 10.1101/gr.10.4.511
ContentType	Journal Article
Copyright	The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DBID	TOX AAYXX CITATION NPM 8FE 8FH ABUWG AFKRA AZQEC BBNVY BENPR BHPHI CCPQU DWQXO GNUQQ HCIFZ LK8 M7P PHGZM PHGZT PIMPY PKEHL PQEST PQGLB PQQKQ PQUKI PRINS 7X8 5PM
DOI	10.1093/nargab/lqaa026
DatabaseName	Oxford Journals Open Access Collection CrossRef PubMed ProQuest SciTech Collection ProQuest Natural Science Collection ProQuest Central (Alumni) ProQuest Central UK/Ireland ProQuest Central Essentials Biological Science Collection ProQuest Central Natural Science Collection ProQuest One Community College ProQuest Central Korea ProQuest Central Student SciTech Premium Collection ProQuest Biological Science Collection Biological Science Database ProQuest Central Premium ProQuest One Academic Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China MEDLINE - Academic PubMed Central (Full Participant titles)
DatabaseTitle	CrossRef PubMed Publicly Available Content Database ProQuest Central Student ProQuest One Academic Middle East (New) ProQuest Biological Science Collection ProQuest Central Essentials ProQuest One Academic Eastern Edition ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Natural Science Collection Biological Science Database ProQuest SciTech Collection ProQuest Central China ProQuest Central ProQuest One Applied & Life Sciences ProQuest One Academic UKI Edition Natural Science Collection ProQuest Central Korea Biological Science Collection ProQuest Central (New) ProQuest One Academic ProQuest One Academic (New) MEDLINE - Academic
DatabaseTitleList	CrossRef Publicly Available Content Database PubMed MEDLINE - Academic
Database_xml	– sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: TOX name: Oxford Journals Open Access Collection url: https://academic.oup.com/journals/ sourceTypes: Publisher – sequence: 3 dbid: BENPR name: ProQuest Central url: https://www.proquest.com/central sourceTypes: Aggregation Database
DeliveryMethod	fulltext_linktorsrc
EISSN	2631-9268
ExternalDocumentID	PMC7222226 32440658 10_1093_nargab_lqaa026 10.1093/nargab/lqaa026
Genre	Journal Article
GrantInformation_xml	– fundername: ; grantid: GM128145
GroupedDBID	0R~ 53G AAFWJ AAPXW AAVAP ABEJV ABGNP ABPTD ABXVV AFPKN AFULF ALMA_UNASSIGNED_HOLDINGS AMNDL EBS EMOBN GROUPED_DOAJ IAO IGS IHR INH ITC KSI M~E ROX RPM TOX AAYXX AFKRA BBNVY BENPR BHPHI CCPQU CITATION HCIFZ M7P PHGZM PHGZT PIMPY NPM 8FE 8FH ABUWG AZQEC DWQXO GNUQQ LK8 PKEHL PQEST PQGLB PQQKQ PQUKI PRINS 7X8 5PM
ID	FETCH-LOGICAL-c3676-818ba61ee2a4bf3175cfae50c2491e614cbd87c873ecb80cbffeb2a2d968464b3
IEDL.DBID	BENPR
ISSN	2631-9268
IngestDate	Thu Aug 21 14:13:35 EDT 2025 Fri Jul 11 08:39:38 EDT 2025 Fri Jul 25 11:54:44 EDT 2025 Wed Feb 19 02:29:50 EST 2025 Tue Jul 01 02:50:14 EDT 2025 Thu Apr 24 23:03:37 EDT 2025 Thu Jan 30 13:18:23 EST 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	2
Language	English
License	This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com https://creativecommons.org/licenses/by-nc/4.0 The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c3676-818ba61ee2a4bf3175cfae50c2491e614cbd87c873ecb80cbffeb2a2d968464b3
Notes	The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
OpenAccessLink	https://www.proquest.com/docview/3170919601?pq-origsite=%requestingapplication%
PMID	32440658
PQID	3170919601
PQPubID	7097362
ParticipantIDs	pubmedcentral_primary_oai_pubmedcentral_nih_gov_7222226 proquest_miscellaneous_2406307049 proquest_journals_3170919601 pubmed_primary_32440658 crossref_primary_10_1093_nargab_lqaa026 crossref_citationtrail_10_1093_nargab_lqaa026 oup_primary_10_1093_nargab_lqaa026
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2020-06-01
PublicationDateYYYYMMDD	2020-06-01
PublicationDate_xml	– month: 06 year: 2020 text: 2020-06-01 day: 01
PublicationDecade	2020
PublicationPlace	England
PublicationPlace_xml	– name: England – name: Oxford
PublicationTitle	NAR genomics and bioinformatics
PublicationTitleAlternate	NAR Genom Bioinform
PublicationYear	2020
Publisher	Oxford University Press
Publisher_xml	– name: Oxford University Press
StartPage	lqaa026
URI	https://www.ncbi.nlm.nih.gov/pubmed/32440658 https://www.proquest.com/docview/3170919601 https://www.proquest.com/docview/2406307049 https://pubmed.ncbi.nlm.nih.gov/PMC7222226
Volume	2