DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bibliographic Details
Published in	Bioinformatics (Oxford, England) Vol. 37; no. 15; pp. 2112 - 2120
Main Authors	Ji, Yanrong, Zhou, Zhihan, Liu, Han, Davuluri, Ramana V
Format	Journal Article
Language	English
Published	England Oxford University Press 09.08.2021
Subjects	Original Papers

Abstract	Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex because of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT with the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance in predicting promoters, splice sites and transcription factor binding sites after easy fine-tuning on small task-specific labeled datasets. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available on GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.
Author	Ji, Yanrong Liu, Han Davuluri, Ramana V Zhou, Zhihan
Author_xml	– sequence: 1 givenname: Yanrong surname: Ji fullname: Ji, Yanrong – sequence: 2 givenname: Zhihan surname: Zhou fullname: Zhou, Zhihan – sequence: 3 givenname: Han surname: Liu fullname: Liu, Han email: hanliu@northwestern.edu – sequence: 4 givenname: Ramana V orcidid: 0000-0002-7053-1064 surname: Davuluri fullname: Davuluri, Ramana V email: ramana.davuluri@stonybrookmedicine.edu
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/33538820$$D View this record in MEDLINE/PubMed
Cites_doi	10.1038/s41598-019-56894-x 10.1073/pnas.242597299 10.1038/nature11247 10.1093/nar/gky1120 10.2196/14830 10.1093/nar/gkt1113 10.1186/gb-2012-13-8-418 10.1038/s41576-019-0173-8 10.1109/TPAMI.2013.50 10.1073/pnas.53.5.1161 10.1038/nrg3813 10.1093/bioinformatics/bty1068 10.1093/nar/gkr028 10.1093/bioinformatics/btu273 10.1038/nature01262 10.1093/nar/gks1233 10.1186/gb-2007-8-2-r24 10.1038/s41588-018-0295-5 10.1111/j.1749-6632.1999.tb08916.x 10.1038/nprot.2017.055 10.1158/1541-7786.MCR-16-0459 10.1016/j.tig.2008.01.008 10.1103/PhysRevLett.73.3169 10.1101/gr.200535.115 10.1016/S0092-8674(03)00348-9 10.1038/nmeth.2688 10.1038/nature14539 10.1038/nature01255 10.1007/s13042-019-00990-x 10.1093/bioinformatics/btw288 10.1038/nmeth.3547 10.1093/bioinformatics/btz682 10.1038/nbt.3300 10.1186/gb-2006-7-s1-s10 10.1261/rna.876308 10.1093/nar/gkw226 10.1093/nar/29.1.308 10.1093/nar/gky237 10.1136/jmg.2006.045377 10.3410/B4-8 10.1162/neco.1997.9.8.1735 10.1186/s12859-019-3306-3 10.1093/nar/12.5.2561 10.1016/S0092-8240(87)90018-8 10.3389/fgene.2019.00286
ContentType	Journal Article
Copyright	The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DBID	AAYXX CITATION NPM 7X8 5PM
DOI	10.1093/bioinformatics/btab083
DatabaseName	CrossRef PubMed MEDLINE - Academic PubMed Central (Full Participant titles)
DatabaseTitle	CrossRef PubMed MEDLINE - Academic
DatabaseTitleList	MEDLINE - Academic PubMed
Database_xml	– sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Biology
EISSN	1367-4811
EndPage	2120
ExternalDocumentID	PMC11025658 33538820 10_1093_bioinformatics_btab083 10.1093/bioinformatics/btab083
Genre	Journal Article
GrantInformation_xml	– fundername: NLM NIH HHS grantid: R01 LM011297 – fundername: NLM NIH HHS grantid: R01 LM013722
IEDL.DBID	TOX
ISSN	1367-4803 1367-4811
IngestDate	Thu Aug 21 18:34:09 EDT 2025 Fri Jul 11 02:12:13 EDT 2025 Mon Jul 21 05:56:30 EDT 2025 Thu Apr 24 23:05:30 EDT 2025 Tue Jul 01 02:33:54 EDT 2025 Wed Apr 02 07:06:57 EDT 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	15
Language	English
License	This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
LinkModel	DirectLink
Notes	The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
ORCID	0000-0002-7053-1064
OpenAccessLink	https://www.ncbi.nlm.nih.gov/pmc/articles/11025658
PMID	33538820
PQID	2486463907
PQPubID	23479
PageCount	9
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2021-Aug-09
PublicationDateYYYYMMDD	2021-08-09
PublicationDate_xml	– month: 08 year: 2021 text: 2021-Aug-09 day: 09
PublicationDecade	2020
PublicationPlace	England
PublicationPlace_xml	– name: England
PublicationTitle	Bioinformatics (Oxford, England)
PublicationTitleAlternate	Bioinformatics
PublicationYear	2021
Publisher	Oxford University Press
Publisher_xml	– name: Oxford University Press
References_xml	– volume: 10 start-page: 134 year: 2020 ident: 2024041009302593200_btab083-B24 article-title: In silico analysis of alternative splicing on drug–target gene interactions publication-title: Sci. Rep doi: 10.1038/s41598-019-56894-x – year: 2019 ident: 2024041009302593200_btab083-B37 – volume: 99 start-page: 15632 year: 2002 ident: 2024041009302593200_btab083-B54 article-title: Gene expression profiling of isogenic cells with different TP53 gene dosage reveals numerous genes that are affected by TP53 dosage and identifies CSPG2 as a direct target of p53 publication-title: Proc. Natl. Acad. Sci. USA doi: 10.1073/pnas.242597299 – volume: 489 start-page: 57 year: 2012 ident: 2024041009302593200_btab083-B15 article-title: An integrated encyclopedia of DNA elements in the human genome publication-title: Nature doi: 10.1038/nature11247 – volume: 47 start-page: D1005 year: 2019 ident: 2024041009302593200_btab083-B7 article-title: The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019 publication-title: Nucleic Acids Res doi: 10.1093/nar/gky1120 – volume: 7 start-page: e14830 year: 2019 ident: 2024041009302593200_btab083-B32 article-title: Fine-tuning bidirectional encoder representations from transformers (BERT)-based models on large-scale electronic health record notes: an empirical study publication-title: JMIR Med. 
Inform doi: 10.2196/14830 – volume: 42 start-page: D980 year: 2014 ident: 2024041009302593200_btab083-B28 article-title: ClinVar: public archive of relationships among sequence variation and human phenotype publication-title: Nucleic Acids Res doi: 10.1093/nar/gkt1113 – year: 2014 ident: 2024041009302593200_btab083-B8 – start-page: 178 year: 2016 ident: 2024041009302593200_btab083-B19 – volume: 13 start-page: 418 year: 2012 ident: 2024041009302593200_btab083-B38 article-title: An encyclopedia of mouse DNA elements (Mouse ENCODE) publication-title: Genome Biol doi: 10.1186/gb-2012-13-8-418 – start-page: 6000 year: 2017 ident: 2024041009302593200_btab083-B48 – volume: 21 start-page: 71 year: 2020 ident: 2024041009302593200_btab083-B2 article-title: Determinants of enhancer and promoter activities of regulatory elements publication-title: Nat. Rev. Genet doi: 10.1038/s41576-019-0173-8 – volume: 35 start-page: 1798 year: 2013 ident: 2024041009302593200_btab083-B4 article-title: Representation learning: a review and new perspectives publication-title: IEEE Trans. Pattern Anal doi: 10.1109/TPAMI.2013.50 – volume: 53 start-page: 1161 year: 1965 ident: 2024041009302593200_btab083-B39 article-title: RNA codewords and protein synthesis, VII. On the general nature of the RNA code publication-title: Proc. Natl. Acad. Sci. USA doi: 10.1073/pnas.53.5.1161 – volume: 15 start-page: 829 year: 2014 ident: 2024041009302593200_btab083-B16 article-title: A census of human RNA-binding proteins publication-title: Nat. Rev. 
Genet doi: 10.1038/nrg3813 – volume: 35 start-page: 2730 year: 2019 ident: 2024041009302593200_btab083-B47 article-title: Promoter analysis and prediction in the human genome using sequence-based deep learning models publication-title: Bioinformatics doi: 10.1093/bioinformatics/bty1068 – volume: 39 start-page: 6069 year: 2011 ident: 2024041009302593200_btab083-B27 article-title: Crosstalk between c-Jun and TAp73alpha/beta contributes to the apoptosis-survival balance publication-title: Nucleic Acids Res doi: 10.1093/nar/gkr028 – volume: 30 start-page: i185 year: 2014 ident: 2024041009302593200_btab083-B31 article-title: GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database publication-title: Bioinformatics doi: 10.1093/bioinformatics/btu273 – volume: 420 start-page: 520 year: 2002 ident: 2024041009302593200_btab083-B52 article-title: Initial sequencing and comparative analysis of the mouse genome publication-title: Nature doi: 10.1038/nature01262 – volume: 41 start-page: D157 year: 2013 ident: 2024041009302593200_btab083-B14 article-title: EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era publication-title: Nucleic Acids Res doi: 10.1093/nar/gks1233 – volume: 80 start-page: 579 year: 1992 ident: 2024041009302593200_btab083-B42 article-title: The linguistics of DNA publication-title: Am. Sci – volume: 8 start-page: R24 year: 2007 ident: 2024041009302593200_btab083-B18 article-title: Quantifying similarity between motifs publication-title: Genome Biol doi: 10.1186/gb-2007-8-2-r24 – volume: 51 start-page: 12 year: 2019 ident: 2024041009302593200_btab083-B57 article-title: A primer on deep learning in genomics publication-title: Nat. Genet doi: 10.1038/s41588-018-0295-5 – volume: 870 start-page: 411 year: 1999 ident: 2024041009302593200_btab083-B23 article-title: The linguistics of DNA: words, sentences, grammar, phonetics, and semantics publication-title: Ann. 
N. Y. Acad. Sci. Paper Ed doi: 10.1111/j.1749-6632.1999.tb08916.x – volume: 12 start-page: 1659 year: 2017 ident: 2024041009302593200_btab083-B3 article-title: Mapping genome-wide transcription-factor binding sites using DAP-seq publication-title: Nat. Protoc doi: 10.1038/nprot.2017.055 – volume: 15 start-page: 1206 year: 2017 ident: 2024041009302593200_btab083-B49 article-title: The landscape of isoform switches in human cancers publication-title: Mol. Cancer Res doi: 10.1158/1541-7786.MCR-16-0459 – year: 2020 ident: 2024041009302593200_btab083-B9 – year: 2018 ident: 2024041009302593200_btab083-B13 – volume: 24 start-page: 167 year: 2008 ident: 2024041009302593200_btab083-B12 article-title: The functional consequences of alternative promoter use in mammalian genomes publication-title: Trends Genet doi: 10.1016/j.tig.2008.01.008 – volume: 8 start-page: 1 year: 2018 ident: 2024041009302593200_btab083-B44 article-title: Recurrent neural network for predicting transcription factor binding sites publication-title: Sci. Rep. UK – volume: 73 start-page: 3169 year: 1994 ident: 2024041009302593200_btab083-B36 article-title: Linguistic features of noncoding DNA sequences publication-title: Phys. Rev. 
Lett doi: 10.1103/PhysRevLett.73.3169 – volume: 26 start-page: 990 year: 2016 ident: 2024041009302593200_btab083-B25 article-title: Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks publication-title: Genome Res doi: 10.1101/gr.200535.115 – volume: 113 start-page: 445 year: 2003 ident: 2024041009302593200_btab083-B10 article-title: The multiple sulfatase deficiency gene encodes an essential and limiting factor for the activity of sulfatases publication-title: Cell doi: 10.1016/S0092-8674(03)00348-9 – volume: 10 start-page: 1213 year: 2013 ident: 2024041009302593200_btab083-B6 article-title: Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position publication-title: Nat. Methods doi: 10.1038/nmeth.2688 – volume: 521 start-page: 436 year: 2015 ident: 2024041009302593200_btab083-B29 article-title: Deep learning publication-title: Nature doi: 10.1038/nature14539 – year: 2019 ident: 2024041009302593200_btab083-B35 – volume: 420 start-page: 211 year: 2002 ident: 2024041009302593200_btab083-B43 article-title: The language of genes publication-title: Nature doi: 10.1038/nature01255 – start-page: pp. 5754 year: 2019 ident: 2024041009302593200_btab083-B53 – volume: 11 start-page: 841 year: 2020 ident: 2024041009302593200_btab083-B55 article-title: DeepSite: bidirectional LSTM and CNN models for predicting DNA-protein binding publication-title: Int. J. Mach. Learn. Cyb doi: 10.1007/s13042-019-00990-x – volume: 32 start-page: 2729 year: 2016 ident: 2024041009302593200_btab083-B33 article-title: Predicting regulatory variants with composite statistic publication-title: Bioinformatics doi: 10.1093/bioinformatics/btw288 – volume: 12 start-page: 931 year: 2015 ident: 2024041009302593200_btab083-B56 article-title: Predicting effects of noncoding variants with deep learning-based sequence model publication-title: Nat. 
Methods doi: 10.1038/nmeth.3547 – volume: 36 start-page: 1234 year: 2020 ident: 2024041009302593200_btab083-B30 article-title: BioBERT: a pre-trained biomedical language representation model for biomedical text mining publication-title: Bioinformatics doi: 10.1093/bioinformatics/btz682 – volume: 33 start-page: 831 year: 2015 ident: 2024041009302593200_btab083-B1 article-title: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning publication-title: Nat. Biotechnol doi: 10.1038/nbt.3300 – volume: 7 start-page: S10 year: 2006 ident: 2024041009302593200_btab083-B46 article-title: Automatic annotation of eukaryotic genes, pseudogenes and promoters publication-title: Genome Biol doi: 10.1186/gb-2006-7-s1-s10 – volume: 14 start-page: 802 year: 2008 ident: 2024041009302593200_btab083-B51 article-title: Splicing regulation: from a parts list of regulatory elements to an integrated splicing code publication-title: RNA doi: 10.1261/rna.876308 – volume: 44 start-page: e107 year: 2016 ident: 2024041009302593200_btab083-B41 article-title: DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences publication-title: Nucleic Acids Res doi: 10.1093/nar/gkw226 – volume: 29 start-page: 308 year: 2001 ident: 2024041009302593200_btab083-B45 article-title: dbSNP: the NCBI database of genetic variation publication-title: Nucleic Acids Res doi: 10.1093/nar/29.1.308 – volume: 29 start-page: 412 year: 2003 ident: 2024041009302593200_btab083-B11 article-title: Application of FirstEF to find promoters and first exons in the human genome publication-title: Curr.Protoc.Bioinf – volume: 46 start-page: e72 year: 2018 ident: 2024041009302593200_btab083-B26 article-title: A novel method for improved accuracy of transcription factor binding site prediction publication-title: Nucleic Acids Res doi: 10.1093/nar/gky237 – volume: 44 start-page: e71 year: 2006 ident: 2024041009302593200_btab083-B22 article-title: MYO7A 
mutation screening in Usher syndrome type I patients from diverse origins publication-title: J. Med. Genet doi: 10.1136/jmg.2006.045377 – volume: 4 start-page: 8 year: 2012 ident: 2024041009302593200_btab083-B17 article-title: The context of gene expression regulation publication-title: F1000 Biol. Rep doi: 10.3410/B4-8 – volume: 9 start-page: 1735 year: 1997 ident: 2024041009302593200_btab083-B21 article-title: Long short-term memory publication-title: Neural Comput doi: 10.1162/neco.1997.9.8.1735 – volume: 20 start-page: 652 year: 2019 ident: 2024041009302593200_btab083-B50 article-title: SpliceFinder: ab initio prediction of splice sites using convolutional neural network publication-title: BMC Bioinformatics doi: 10.1186/s12859-019-3306-3 – volume: 12 start-page: 2561 year: 1984 ident: 2024041009302593200_btab083-B5 article-title: Genome structure described by formal languages publication-title: Nucleic Acids Res doi: 10.1093/nar/12.5.2561 – volume: 49 start-page: 737 year: 1987 ident: 2024041009302593200_btab083-B20 article-title: Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors publication-title: Bull. Math. Biol doi: 10.1016/S0092-8240(87)90018-8 – volume: 10 start-page: 286 year: 2019 ident: 2024041009302593200_btab083-B40 article-title: DeePromoter: robust promoter predictor using deep learning publication-title: Front. Genet doi: 10.3389/fgene.2019.00286 – volume: 16 start-page: 5631 year: 2018 ident: 2024041009302593200_btab083-B34 article-title: Interaction of polymorphisms in xerodermapigmentosum group C with cigarette smoking and pancreatic cancer risk publication-title: OncolLett
StartPage	2112
SubjectTerms	Original Papers
Title	DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
URI	https://www.ncbi.nlm.nih.gov/pubmed/33538820 https://www.proquest.com/docview/2486463907 https://pubmed.ncbi.nlm.nih.gov/PMC11025658
Volume	37