DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 37; no. 15; pp. 2112-2120
Main Authors Ji, Yanrong, Zhou, Zhihan, Liu, Han, Davuluri, Ramana V
Format Journal Article
Language English
Published England Oxford University Press 09.08.2021
Abstract
Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios.
Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks.
Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT).
Supplementary information: Supplementary data are available at Bioinformatics online.
Author Ji, Yanrong
Liu, Han
Davuluri, Ramana V
Zhou, Zhihan
Author_xml – sequence: 1
  givenname: Yanrong
  surname: Ji
  fullname: Ji, Yanrong
– sequence: 2
  givenname: Zhihan
  surname: Zhou
  fullname: Zhou, Zhihan
– sequence: 3
  givenname: Han
  surname: Liu
  fullname: Liu, Han
  email: hanliu@northwestern.edu
– sequence: 4
  givenname: Ramana V
  orcidid: 0000-0002-7053-1064
  surname: Davuluri
  fullname: Davuluri, Ramana V
  email: ramana.davuluri@stonybrookmedicine.edu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33538820$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1093_nar_gkac824
crossref_primary_10_1111_tpj_70047
crossref_primary_10_1093_bib_bbae641
crossref_primary_10_1126_sciadv_adk4670
crossref_primary_10_1038_s41587_024_02353_6
crossref_primary_10_1016_j_jbi_2022_104231
crossref_primary_10_1093_bioinformatics_btad468
crossref_primary_10_51300_jidm_2022_58
crossref_primary_10_2174_0115665232268074231026111634
crossref_primary_10_1093_bib_bbad307
crossref_primary_10_3390_biology13100755
crossref_primary_10_1016_j_neunet_2023_12_002
crossref_primary_10_1371_journal_pcbi_1010779
crossref_primary_10_1016_j_csbj_2023_11_025
crossref_primary_10_1186_s12911_024_02600_5
crossref_primary_10_3390_genes13122323
crossref_primary_10_1038_s41587_024_02414_w
crossref_primary_10_1186_s12859_023_05577_6
crossref_primary_10_1186_s13040_024_00410_z
crossref_primary_10_1038_s41598_024_72512_x
crossref_primary_10_1126_science_ado9336
crossref_primary_10_3390_genes16030284
crossref_primary_10_1101_gr_279142_124
crossref_primary_10_1186_s13059_023_02955_4
crossref_primary_10_1093_bioinformatics_btae326
crossref_primary_10_1016_j_tig_2024_11_013
crossref_primary_10_1186_s12859_024_05869_5
crossref_primary_10_1016_j_jgg_2024_12_016
crossref_primary_10_1016_j_heliyon_2024_e31626
crossref_primary_10_1038_s42256_024_00872_0
crossref_primary_10_1016_j_crmeth_2024_100707
crossref_primary_10_1038_s41592_024_02523_z
crossref_primary_10_1093_bib_bbac204
crossref_primary_10_1021_acsomega_3c05913
crossref_primary_10_1038_s41467_025_55920_z
crossref_primary_10_3389_fgene_2022_1067562
crossref_primary_10_1073_pnas_2311219120
crossref_primary_10_2196_49724
crossref_primary_10_1093_jamia_ocaf029
crossref_primary_10_3390_v16111673
crossref_primary_10_1016_j_crmeth_2022_100384
crossref_primary_10_3390_ijms242115858
crossref_primary_10_1088_2632_2153_acb488
crossref_primary_10_1093_nar_gkae429
crossref_primary_10_1021_acs_jcim_4c01097
crossref_primary_10_1038_s42256_024_00836_4
crossref_primary_10_3389_fgene_2022_885627
crossref_primary_10_1016_j_neunet_2024_107040
crossref_primary_10_1093_nar_gkad578
crossref_primary_10_2196_59505
crossref_primary_10_3389_fbrio_2024_1326958
crossref_primary_10_1016_j_csbr_2024_100003
crossref_primary_10_1093_bib_bbad210
crossref_primary_10_1093_bioinformatics_btad248
crossref_primary_10_1109_JBHI_2024_3349584
crossref_primary_10_1016_j_cell_2023_02_018
crossref_primary_10_1016_j_trd_2025_104644
crossref_primary_10_1093_bib_bbad208
crossref_primary_10_3389_fgene_2022_1081842
crossref_primary_10_3389_fmicb_2024_1516667
crossref_primary_10_3390_genes15081090
crossref_primary_10_3389_fimmu_2024_1357217
crossref_primary_10_1007_s10142_024_01417_9
crossref_primary_10_1093_bioinformatics_btae461
crossref_primary_10_1016_j_omtn_2024_102192
crossref_primary_10_1109_TCBB_2024_3459870
crossref_primary_10_1002_1878_0261_13745
crossref_primary_10_1109_TCBB_2023_3323295
crossref_primary_10_1186_s12864_024_10885_z
crossref_primary_10_1016_j_csbj_2025_03_024
crossref_primary_10_1093_bib_bbae651
crossref_primary_10_3390_a15080274
crossref_primary_10_1063_5_0249920
crossref_primary_10_3390_foods11223742
crossref_primary_10_1002_mef2_96
crossref_primary_10_1093_bib_bbac598
crossref_primary_10_2298_CSIS240314049L
crossref_primary_10_3390_biomedinformatics4020085
crossref_primary_10_1093_bib_bbad442
crossref_primary_10_1016_j_vaccine_2023_07_024
crossref_primary_10_1016_j_heliyon_2024_e28443
crossref_primary_10_1093_bib_bbad438
crossref_primary_10_3389_fnins_2022_846638
crossref_primary_10_1093_bioadv_vbac023
crossref_primary_10_1016_j_compbiomed_2023_107077
crossref_primary_10_3389_frai_2024_1424012
crossref_primary_10_1016_j_omtn_2024_102255
crossref_primary_10_1016_j_ymeth_2024_12_006
crossref_primary_10_1038_s41467_024_46947_9
crossref_primary_10_1093_bib_bbad193
crossref_primary_10_1016_j_cels_2023_05_007
crossref_primary_10_1016_j_compbiomed_2024_108466
crossref_primary_10_1080_15476286_2024_2315384
crossref_primary_10_3390_genes15010034
crossref_primary_10_48084_etasr_6295
crossref_primary_10_1038_s41467_025_56330_x
crossref_primary_10_1093_bib_bbae163
crossref_primary_10_3390_app13126996
crossref_primary_10_1186_s12864_023_09802_7
crossref_primary_10_1186_s12863_023_01123_8
crossref_primary_10_1186_s13059_022_02780_1
crossref_primary_10_1186_s40246_023_00513_4
crossref_primary_10_1016_j_omtn_2024_102370
crossref_primary_10_1038_s42256_025_01007_9
crossref_primary_10_1021_acs_jcim_3c02070
crossref_primary_10_1186_s12859_023_05303_2
crossref_primary_10_2197_ipsjtbio_16_20
crossref_primary_10_3390_biom14070767
crossref_primary_10_1016_j_isci_2024_109334
crossref_primary_10_1016_j_scitotenv_2024_172466
crossref_primary_10_1017_eds_2023_37
crossref_primary_10_1016_j_compbiomed_2024_108230
crossref_primary_10_1016_j_compbiomed_2024_109440
crossref_primary_10_1007_s12539_024_00661_8
crossref_primary_10_1093_bib_bbae157
crossref_primary_10_1093_bioinformatics_btac509
crossref_primary_10_1109_JBHI_2024_3354121
crossref_primary_10_1186_s12915_024_01923_z
crossref_primary_10_1016_j_heliyon_2024_e39140
crossref_primary_10_1016_j_drudis_2024_103990
crossref_primary_10_1016_j_immuno_2024_100040
crossref_primary_10_1186_s13059_023_02934_9
crossref_primary_10_1186_s13059_024_03379_4
crossref_primary_10_1016_j_jisa_2024_103953
crossref_primary_10_1109_RBME_2024_3496744
crossref_primary_10_1016_j_csbj_2025_03_007
crossref_primary_10_1016_j_compeleceng_2024_109786
crossref_primary_10_1038_s41592_021_01252_x
crossref_primary_10_1016_j_eswa_2023_120439
crossref_primary_10_1186_s12859_023_05469_9
crossref_primary_10_15302_J_QB_022_0315
crossref_primary_10_3390_cells12081191
crossref_primary_10_1016_j_rineng_2024_103476
crossref_primary_10_1038_s43588_023_00544_w
crossref_primary_10_1016_j_jai_2025_03_004
crossref_primary_10_1093_bib_bbae702
crossref_primary_10_1016_j_isci_2024_111658
crossref_primary_10_1093_nar_gkad055
crossref_primary_10_26508_lsa_202301962
crossref_primary_10_3390_biomedinformatics4030101
crossref_primary_10_3390_genes15040404
crossref_primary_10_1016_j_xcrp_2023_101600
crossref_primary_10_3390_ijms25052869
crossref_primary_10_1093_bib_bbad093
crossref_primary_10_1038_s44222_025_00280_y
crossref_primary_10_1016_j_future_2024_107601
crossref_primary_10_1093_bioinformatics_btae013
crossref_primary_10_3390_genes15121593
crossref_primary_10_14778_3611479_3611537
crossref_primary_10_1093_nargab_lqad082
crossref_primary_10_1093_database_baac036
crossref_primary_10_1109_TCBB_2022_3204661
crossref_primary_10_1093_nar_gkae1310
crossref_primary_10_1186_s12864_021_08246_1
crossref_primary_10_1186_s12859_023_05573_w
crossref_primary_10_1016_j_cell_2024_11_015
crossref_primary_10_1038_s41598_024_77172_5
crossref_primary_10_3389_frai_2023_1128153
crossref_primary_10_1109_JBHI_2023_3288768
crossref_primary_10_1016_j_compbiomed_2024_108376
crossref_primary_10_1093_bioinformatics_btad617
crossref_primary_10_1093_nar_gkae912
crossref_primary_10_1007_s13721_024_00463_4
crossref_primary_10_1093_bfgp_elae009
crossref_primary_10_3389_fsysb_2024_1402664
crossref_primary_10_1109_TCBB_2022_3165592
crossref_primary_10_1371_journal_pcbi_1011162
crossref_primary_10_1016_j_copbio_2023_102941
crossref_primary_10_1093_bib_bbae170
crossref_primary_10_1038_s44222_024_00245_7
crossref_primary_10_1038_s41576_022_00532_2
crossref_primary_10_1093_bioinformatics_btae031
crossref_primary_10_1016_j_compbiomed_2022_105993
crossref_primary_10_3389_frnar_2024_1473293
crossref_primary_10_1038_s41598_024_84105_9
crossref_primary_10_1016_j_compbiolchem_2024_108129
crossref_primary_10_1038_s41592_024_02359_7
crossref_primary_10_1016_j_ymeth_2024_01_011
crossref_primary_10_1186_s12859_023_05352_7
crossref_primary_10_7717_peerj_16600
crossref_primary_10_1002_2211_5463_70003
crossref_primary_10_1186_s13059_024_03320_9
crossref_primary_10_3390_ijms26041723
crossref_primary_10_14348_molcells_2023_2157
crossref_primary_10_1093_bioinformatics_btaf004
crossref_primary_10_1049_enb2_12025
crossref_primary_10_1177_14727978251321951
crossref_primary_10_1016_j_cbpa_2021_04_008
crossref_primary_10_1016_j_csbj_2021_05_039
crossref_primary_10_3390_genes13111952
crossref_primary_10_1186_s13073_023_01238_8
crossref_primary_10_1186_s12859_022_04985_4
crossref_primary_10_3390_s24113553
crossref_primary_10_1016_j_gene_2024_148330
crossref_primary_10_3389_fbinf_2022_910531
crossref_primary_10_1016_j_molp_2024_12_006
crossref_primary_10_1093_bioadv_vbad043
crossref_primary_10_3390_ijms252312942
crossref_primary_10_1186_s13059_024_03449_7
crossref_primary_10_1016_j_compbiolchem_2023_107905
crossref_primary_10_1016_j_isci_2024_109257
crossref_primary_10_1186_s12859_022_05000_6
crossref_primary_10_3389_fnagi_2022_1027224
crossref_primary_10_1093_bib_bbae599
crossref_primary_10_1089_apb_2023_0020
crossref_primary_10_1093_bioadv_vbae016
crossref_primary_10_1093_bioinformatics_btae046
crossref_primary_10_1093_bib_bbad147
crossref_primary_10_1038_s41422_024_01034_y
crossref_primary_10_1016_j_jpha_2025_101255
crossref_primary_10_52601_bpr_2024_240006
crossref_primary_10_1093_bioinformatics_btaf018
crossref_primary_10_1371_journal_pone_0301791
crossref_primary_10_1038_s41467_024_53759_4
crossref_primary_10_1093_nsr_nwae355
crossref_primary_10_1016_j_xcrm_2024_101608
crossref_primary_10_1093_gigascience_giad054
crossref_primary_10_1007_s00439_024_02722_w
crossref_primary_10_1016_j_isci_2023_108592
crossref_primary_10_3389_fgene_2024_1444459
crossref_primary_10_1093_nargab_lqae129
crossref_primary_10_48130_gcomm_0025_0003
crossref_primary_10_1101_gad_351800_124
crossref_primary_10_3390_ijms26052281
crossref_primary_10_1002_advs_202407013
crossref_primary_10_1109_TCBB_2023_3339597
crossref_primary_10_1093_nar_gkad1031
crossref_primary_10_1038_s41467_022_34152_5
crossref_primary_10_1038_s12276_024_01243_w
crossref_primary_10_1038_s41588_025_02121_5
crossref_primary_10_1038_s41576_024_00774_2
crossref_primary_10_1111_eci_14183
crossref_primary_10_1186_s12967_024_05567_z
crossref_primary_10_2174_0115748936283134240109054157
crossref_primary_10_1016_j_heliyon_2024_e41488
crossref_primary_10_1093_bib_bbad040
crossref_primary_10_7717_peerj_13666
crossref_primary_10_3389_fmicb_2023_1331233
crossref_primary_10_3389_fgene_2023_1164593
crossref_primary_10_1093_bib_bbae138
crossref_primary_10_1186_s12859_022_04647_5
crossref_primary_10_1093_bioinformatics_btae188
crossref_primary_10_1093_nargab_lqad021
crossref_primary_10_1146_annurev_genom_021623_024727
crossref_primary_10_1016_j_ab_2024_115492
crossref_primary_10_1016_j_compbiolchem_2024_108040
crossref_primary_10_3390_ijms252010928
crossref_primary_10_1016_j_biosystems_2023_105095
crossref_primary_10_1093_bioinformatics_btaf041
crossref_primary_10_1371_journal_pcbi_1010028
crossref_primary_10_1016_j_compbiomed_2023_107238
crossref_primary_10_1093_bioinformatics_btae196
crossref_primary_10_1016_j_biotechadv_2024_108399
crossref_primary_10_3934_mbe_2024264
crossref_primary_10_3389_fgene_2024_1494474
crossref_primary_10_3390_sym15030731
crossref_primary_10_1101_gr_278870_123
crossref_primary_10_1016_j_websem_2024_100845
crossref_primary_10_1093_bib_bbae560
crossref_primary_10_1186_s12863_024_01293_z
crossref_primary_10_1016_j_compbiomed_2024_108189
crossref_primary_10_2339_politeknik_1509329
crossref_primary_10_1093_bib_bbae324
crossref_primary_10_1111_cobi_14411
crossref_primary_10_1126_science_adt3007
crossref_primary_10_1093_nar_gkad436
crossref_primary_10_1021_acs_jcim_4c01118
crossref_primary_10_1038_s42003_023_05310_2
crossref_primary_10_1093_bib_bbaf092
crossref_primary_10_1093_nar_gkac1247
crossref_primary_10_1093_bioinformatics_btad541
crossref_primary_10_1021_acsomega_3c05571
crossref_primary_10_1016_j_isci_2025_112081
crossref_primary_10_1093_bioinformatics_btaf051
crossref_primary_10_1371_journal_pcbi_1012755
crossref_primary_10_1016_j_neunet_2024_106978
crossref_primary_10_1109_ACCESS_2024_3367801
crossref_primary_10_1128_msystems_01258_24
crossref_primary_10_3390_life11111135
crossref_primary_10_3390_electronics13214322
crossref_primary_10_1016_j_tplants_2024_04_013
crossref_primary_10_1093_nar_gkae099
crossref_primary_10_1016_j_ijbiomac_2024_130659
crossref_primary_10_3390_ijms252111744
crossref_primary_10_1016_j_yamp_2024_08_001
crossref_primary_10_1093_bioinformatics_btae529
crossref_primary_10_1007_s12539_024_00665_4
crossref_primary_10_1093_bioinformatics_btae640
crossref_primary_10_1093_bib_bbad223
crossref_primary_10_1371_journal_pcbi_1012744
crossref_primary_10_1109_TCBB_2023_3237769
crossref_primary_10_1016_j_bios_2025_117399
crossref_primary_10_1016_j_ijbiomac_2025_140630
crossref_primary_10_1093_bib_bbae548
crossref_primary_10_3389_frai_2022_1040295
crossref_primary_10_7717_peerj_cs_1340
crossref_primary_10_1093_nar_gkae783
crossref_primary_10_1016_j_xgen_2025_100762
crossref_primary_10_1002_qub2_69
crossref_primary_10_1093_bib_bbad250
crossref_primary_10_3390_computers14030085
crossref_primary_10_46989_001c_124131
crossref_primary_10_1007_s11432_024_4171_9
crossref_primary_10_1016_j_ymeth_2025_01_014
crossref_primary_10_1038_s42003_023_04866_3
crossref_primary_10_1093_nar_gkac326
crossref_primary_10_3389_fgene_2023_1254827
crossref_primary_10_1002_mef2_43
crossref_primary_10_1093_comjnl_bxae018
crossref_primary_10_1186_s12859_024_05891_7
crossref_primary_10_1016_j_jmb_2024_168856
crossref_primary_10_1038_s41467_023_42547_1
crossref_primary_10_3390_fermentation10020093
crossref_primary_10_1109_TCBB_2023_3283985
crossref_primary_10_1088_2632_2153_acb2b2
crossref_primary_10_1016_j_csbj_2024_09_031
crossref_primary_10_3390_genes15050629
crossref_primary_10_1128_spectrum_01350_23
crossref_primary_10_1111_tpj_17190
crossref_primary_10_2174_0113894501330963240905083020
crossref_primary_10_1093_genetics_iyae136
crossref_primary_10_1186_s13040_024_00414_9
crossref_primary_10_1016_j_compbiomed_2023_107260
crossref_primary_10_1016_j_patter_2024_101150
crossref_primary_10_1371_journal_pcbi_1010238
crossref_primary_10_3390_ijms232012385
crossref_primary_10_1038_s42003_024_06161_1
crossref_primary_10_1093_bib_bbae210
crossref_primary_10_3390_ijms25094990
crossref_primary_10_1007_s12539_022_00537_9
crossref_primary_10_1093_bioadv_vbae117
crossref_primary_10_1111_1755_0998_14006
crossref_primary_10_1186_s12915_024_02055_0
crossref_primary_10_1371_journal_pcbi_1012525
crossref_primary_10_1093_bioinformatics_btaf085
crossref_primary_10_1093_nsr_nwaf028
crossref_primary_10_1016_j_omtn_2023_06_019
crossref_primary_10_1145_3715318
crossref_primary_10_1016_j_jechem_2024_11_011
crossref_primary_10_1016_j_jneumeth_2025_110363
crossref_primary_10_1093_bib_bbac027
crossref_primary_10_1186_s12859_024_05849_9
crossref_primary_10_1186_s13059_024_03221_x
crossref_primary_10_71423_aimed_20250102
Cites_doi 10.1038/s41598-019-56894-x
10.1073/pnas.242597299
10.1038/nature11247
10.1093/nar/gky1120
10.2196/14830
10.1093/nar/gkt1113
10.1186/gb-2012-13-8-418
10.1038/s41576-019-0173-8
10.1109/TPAMI.2013.50
10.1073/pnas.53.5.1161
10.1038/nrg3813
10.1093/bioinformatics/bty1068
10.1093/nar/gkr028
10.1093/bioinformatics/btu273
10.1038/nature01262
10.1093/nar/gks1233
10.1186/gb-2007-8-2-r24
10.1038/s41588-018-0295-5
10.1111/j.1749-6632.1999.tb08916.x
10.1038/nprot.2017.055
10.1158/1541-7786.MCR-16-0459
10.1016/j.tig.2008.01.008
10.1103/PhysRevLett.73.3169
10.1101/gr.200535.115
10.1016/S0092-8674(03)00348-9
10.1038/nmeth.2688
10.1038/nature14539
10.1038/nature01255
10.1007/s13042-019-00990-x
10.1093/bioinformatics/btw288
10.1038/nmeth.3547
10.1093/bioinformatics/btz682
10.1038/nbt.3300
10.1186/gb-2006-7-s1-s10
10.1261/rna.876308
10.1093/nar/gkw226
10.1093/nar/29.1.308
10.1093/nar/gky237
10.1136/jmg.2006.045377
10.3410/B4-8
10.1162/neco.1997.9.8.1735
10.1186/s12859-019-3306-3
10.1093/nar/12.5.2561
10.1016/S0092-8240(87)90018-8
10.3389/fgene.2019.00286
ContentType Journal Article
Copyright The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DBID AAYXX
CITATION
NPM
7X8
5PM
DOI 10.1093/bioinformatics/btab083
DatabaseName CrossRef
PubMed
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
PubMed
MEDLINE - Academic
DatabaseTitleList
MEDLINE - Academic
PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1367-4811
EndPage 2120
ExternalDocumentID PMC11025658
33538820
10_1093_bioinformatics_btab083
10.1093/bioinformatics/btab083
Genre Journal Article
GrantInformation_xml – fundername: NLM NIH HHS
  grantid: R01 LM011297
– fundername: NLM NIH HHS
  grantid: R01 LM013722
IEDL.DBID TOX
ISSN 1367-4803
1367-4811
IngestDate Thu Aug 21 18:34:09 EDT 2025
Fri Jul 11 02:12:13 EDT 2025
Mon Jul 21 05:56:30 EDT 2025
Thu Apr 24 23:05:30 EDT 2025
Tue Jul 01 02:33:54 EDT 2025
Wed Apr 02 07:06:57 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 15
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
LinkModel DirectLink
Notes The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
ORCID 0000-0002-7053-1064
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/11025658
PMID 33538820
PQID 2486463907
PQPubID 23479
PageCount 9
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_11025658
proquest_miscellaneous_2486463907
pubmed_primary_33538820
crossref_primary_10_1093_bioinformatics_btab083
crossref_citationtrail_10_1093_bioinformatics_btab083
oup_primary_10_1093_bioinformatics_btab083
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2021-Aug-09
PublicationDateYYYYMMDD 2021-08-09
PublicationDate_xml – month: 08
  year: 2021
  text: 2021-Aug-09
  day: 09
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2021
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
StartPage 2112
SubjectTerms Original Papers
Title DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
URI https://www.ncbi.nlm.nih.gov/pubmed/33538820
https://www.proquest.com/docview/2486463907
https://pubmed.ncbi.nlm.nih.gov/PMC11025658
Volume 37