Nucleotide dependency analysis of DNA language models reveals genomic functional elements
Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal in biology. DNA language models (LMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its...
Published in | bioRxiv |
---|---|
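Main Authors | Tomaz da Silva, Pedro; Karollus, Alexander; Hingerl, Johannes; Galindez, Gihanna; Wagner, Nils; Hernandez-Alias, Xavier; Incarnato, Danny; Gagneur, Julien |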
Format | Paper |
Language | English |
Published | Cold Spring Harbor Laboratory, 27.07.2024 |
Edition | 1.1 |
Subjects | Genomics |
Online Access | https://www.biorxiv.org/content/10.1101/2024.07.27.605418 |
Abstract | Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal in biology. DNA language models (LMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its sequence context. However, using DNA LMs for discovering functional genomic elements has been challenging due to the lack of interpretable methods. Here, we introduce nucleotide dependencies which quantify how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions. We generated genome-wide maps of pairwise nucleotide dependencies within kilobase ranges for animal, fungal, and bacterial species. We show that nucleotide dependencies indicate deleteriousness of human genetic variants more effectively than sequence alignment and DNA LM reconstruction. Regulatory elements appear as dense blocks in dependency maps, enabling the systematic identification of transcription factor binding sites as accurately as models trained on experimental binding data. Nucleotide dependencies also highlight bases in contact within RNA structures, including pseudoknots and tertiary structure contacts, with remarkable accuracy. This led to the discovery of four novel, experimentally validated RNA structures in Escherichia coli. Finally, using dependency maps, we reveal critical limitations of several DNA LM architectures and training sequence selection strategies by benchmarking and visual diagnosis. Altogether, nucleotide dependency analysis opens a new avenue for discovering and studying functional elements and their interactions in genomes. |
---|---|
Author | Wagner, Nils; Hingerl, Johannes; Tomaz da Silva, Pedro; Karollus, Alexander; Incarnato, Danny; Hernandez-Alias, Xavier; Gagneur, Julien; Galindez, Gihanna |
Author details | 1. Tomaz da Silva, Pedro (ORCID 0000-0001-6320-4885), Munich Center for Machine Learning; 2. Karollus, Alexander (ORCID 0000-0001-7570-7877), Munich Center for Machine Learning; 3. Hingerl, Johannes, Munich Center for Machine Learning; 4. Galindez, Gihanna (ORCID 0000-0002-3980-938X), Munich Data Science Institute, Technical University of Munich; 5. Wagner, Nils (ORCID 0009-0006-5661-1646), School of Computation, Information and Technology, Technical University of Munich; 6. Hernandez-Alias, Xavier (ORCID 0000-0001-8633-499X), Mechanisms of Protein Biogenesis, Max Planck Institute of Biochemistry; 7. Incarnato, Danny (ORCID 0000-0003-3944-2327), Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen; 8. Gagneur, Julien (ORCID 0000-0002-8924-8365, email gagneur@in.tum.de), Computational Health Center, Helmholtz Center |
ContentType | Paper |
Copyright | 2024, Posted by Cold Spring Harbor Laboratory |
DOI | 10.1101/2024.07.27.605418 |
DatabaseName | bioRxiv |
Discipline | Biology |
EISSN | 2692-8205 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
License | This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at http://creativecommons.org/licenses/by-nc-nd/4.0 |
Notes | Competing Interest Statement: The authors have declared no competing interest. |
OpenAccessLink | https://www.biorxiv.org/content/10.1101/2024.07.27.605418 |
PageCount | 33 |
SubjectTerms | Genomics |
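
The abstract describes nucleotide dependencies as a measure of how a substitution at one position changes the model's probabilities at other positions. The sketch below illustrates that idea on a toy example. It rests on assumptions not stated in this record: the score used here (the largest absolute change in log2 odds over all substitutions and all target nucleotides) is only one plausible way to quantify the effect, and `toy_model` is a hypothetical stand-in for a DNA language model, not the authors' model or code.

```python
import math

NUCLEOTIDES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def toy_model(seq):
    """Hypothetical stand-in for a DNA LM (an assumption for illustration).

    Returns one dict of nucleotide probabilities per position. Position j
    favors the complement of its mirror position, loosely mimicking a
    base-paired stem.
    """
    n = len(seq)
    probs = []
    for j in range(n):
        favored = COMPLEMENT[seq[n - 1 - j]]
        probs.append({b: (0.7 if b == favored else 0.1) for b in NUCLEOTIDES})
    return probs


def log_odds(p):
    """Log2 odds of a probability p with 0 < p < 1."""
    return math.log2(p / (1.0 - p))


def dependency_map(seq, model):
    """Pairwise map dep[i][j]: effect of substituting position i on position j.

    Assumed score: maximum absolute log2-odds change over all alternative
    nucleotides at i and all nucleotides at j. The diagonal is left at 0.
    """
    n = len(seq)
    ref = model(seq)
    dep = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for alt in NUCLEOTIDES:
            if alt == seq[i]:
                continue  # only true substitutions
            mut = model(seq[:i] + alt + seq[i + 1:])
            for j in range(n):
                if j == i:
                    continue
                for b in NUCLEOTIDES:
                    change = abs(log_odds(mut[j][b]) - log_odds(ref[j][b]))
                    dep[i][j] = max(dep[i][j], change)
    return dep
```

Calling `dependency_map("ACGTAC", toy_model)` yields non-zero entries only between mirrored positions, because the toy model couples only those pairs. With a real DNA LM in place of `toy_model`, each substitution requires a separate model evaluation, so the cost grows with sequence length times three alternative nucleotides.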