Nucleotide dependency analysis of DNA language models reveals genomic functional elements

Bibliographic Details
Published in bioRxiv
Main Authors Tomaz da Silva, Pedro; Karollus, Alexander; Hingerl, Johannes; Galindez, Gihanna; Wagner, Nils; Hernandez-Alias, Xavier; Incarnato, Danny; Gagneur, Julien
Format Paper
Language English
Published Cold Spring Harbor Laboratory 27.07.2024
Edition 1.1
Abstract Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal in biology. DNA language models (LMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its sequence context. However, using DNA LMs for discovering functional genomic elements has been challenging due to the lack of interpretable methods. Here, we introduce nucleotide dependencies which quantify how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions. We generated genome-wide maps of pairwise nucleotide dependencies within kilobase ranges for animal, fungal, and bacterial species. We show that nucleotide dependencies indicate deleteriousness of human genetic variants more effectively than sequence alignment and DNA LM reconstruction. Regulatory elements appear as dense blocks in dependency maps, enabling the systematic identification of transcription factor binding sites as accurately as models trained on experimental binding data. Nucleotide dependencies also highlight bases in contact within RNA structures, including pseudoknots and tertiary structure contacts, with remarkable accuracy. This led to the discovery of four novel, experimentally validated RNA structures in Escherichia coli. Finally, using dependency maps, we reveal critical limitations of several DNA LM architectures and training sequence selection strategies by benchmarking and visual diagnosis. Altogether, nucleotide dependency analysis opens a new avenue for discovering and studying functional elements and their interactions in genomes.
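The abstract defines a nucleotide dependency only in words: substitute the nucleotide at one position and measure how the model's predicted probabilities change at other positions. The sketch below illustrates that idea under stated assumptions. `toy_model` is a hypothetical stand-in for a DNA LM (it simply favors the complement of the mirror-image position), and scoring a dependency as the largest absolute change in log2 odds over all substitutions and target nucleotides is an assumption made for illustration, not a definition given in this record.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def toy_model(seq):
    """Hypothetical stand-in for a DNA LM: returns an (L, 4) array of
    nucleotide probabilities. Position j favors the complement of the
    nucleotide at the mirror position L-1-j (a crude hairpin)."""
    L = len(seq)
    probs = np.full((L, 4), 0.1)
    for j in range(L):
        partner = seq[L - 1 - j]
        probs[j, NUCLEOTIDES.index(COMPLEMENT[partner])] = 0.7
    return probs

def log_odds(p, eps=1e-9):
    """Log2 odds, clipped to avoid division by zero."""
    p = np.clip(p, eps, 1 - eps)
    return np.log2(p / (1 - p))

def dependency_map(seq, model):
    """dep[i, j]: largest absolute log2-odds change at target position j
    caused by any single-nucleotide substitution at query position i
    (assumed scoring rule)."""
    L = len(seq)
    ref = log_odds(model(seq))
    dep = np.zeros((L, L))
    for i in range(L):
        for alt in NUCLEOTIDES:
            if alt == seq[i]:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            shift = np.abs(log_odds(model(mutated)) - ref)
            # Keep the strongest effect seen so far for each target position.
            dep[i] = np.maximum(dep[i], shift.max(axis=1))
        dep[i, i] = 0.0  # ignore the substituted position itself
    return dep

dep = dependency_map("ACGTAC", toy_model)
```

With this toy model the only nonzero entries of `dep` lie on the anti-diagonal, because each position depends solely on its mirror partner. A real model would require one forward pass per substitution, so a map over a sequence of length L costs roughly 3L model evaluations.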
Author Wagner, Nils
Hingerl, Johannes
Tomaz da Silva, Pedro
Karollus, Alexander
Incarnato, Danny
Hernandez-Alias, Xavier
Gagneur, Julien
Galindez, Gihanna
Author_xml – sequence: 1
  givenname: Pedro
  orcidid: 0000-0001-6320-4885
  surname: Tomaz da Silva
  fullname: Tomaz da Silva, Pedro
  organization: Munich Center for Machine Learning
– sequence: 2
  givenname: Alexander
  orcidid: 0000-0001-7570-7877
  surname: Karollus
  fullname: Karollus, Alexander
  organization: Munich Center for Machine Learning
– sequence: 3
  givenname: Johannes
  surname: Hingerl
  fullname: Hingerl, Johannes
  organization: Munich Center for Machine Learning
– sequence: 4
  givenname: Gihanna
  orcidid: 0000-0002-3980-938X
  surname: Galindez
  fullname: Galindez, Gihanna
  organization: Munich Data Science Institute, Technical University of Munich
– sequence: 5
  givenname: Nils
  orcidid: 0009-0006-5661-1646
  surname: Wagner
  fullname: Wagner, Nils
  organization: School of Computation, Information and Technology, Technical University of Munich
– sequence: 6
  givenname: Xavier
  orcidid: 0000-0001-8633-499X
  surname: Hernandez-Alias
  fullname: Hernandez-Alias, Xavier
  organization: Mechanisms of Protein Biogenesis, Max Planck Institute of Biochemistry
– sequence: 7
  givenname: Danny
  orcidid: 0000-0003-3944-2327
  surname: Incarnato
  fullname: Incarnato, Danny
  organization: Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen
– sequence: 8
  givenname: Julien
  orcidid: 0000-0002-8924-8365
  surname: Gagneur
  fullname: Gagneur, Julien
  email: gagneur@in.tum.de
  organization: Computational Health Center, Helmholtz Center
Cites_doi 10.1101/2024.02.27.582234
10.48550/arXiv.2403.00043
10.1101/2024.02.09.579631
10.48550/arXiv.2403.03234
10.1101/2023.01.11.523679
10.48550/arXiv.2311.12570
10.48550/arXiv.2307.08691
10.1101/2022.08.06.503062
10.48550/arXiv.2306.15794
10.1101/2023.10.10.561776
ContentType Paper
Copyright 2024, Posted by Cold Spring Harbor Laboratory
Copyright_xml – notice: 2024, Posted by Cold Spring Harbor Laboratory
DBID FX.
DOI 10.1101/2024.07.27.605418
DatabaseName bioRxiv
DatabaseTitleList
Database_xml – sequence: 1
  dbid: FX.
  name: bioRxiv
  url: https://www.biorxiv.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 2692-8205
Edition 1.1
ExternalDocumentID 2024.07.27.605418v1
GroupedDBID 8FE
8FH
AFKRA
ALMA_UNASSIGNED_HOLDINGS
BBNVY
BENPR
BHPHI
FX.
HCIFZ
LK8
M7P
NQS
PIMPY
PROAC
RHI
ID FETCH-LOGICAL-b768-bd65fcba22f15baf443789cef4b550754afd0e77bcf0a9bb1aec208679233e443
IEDL.DBID FX.
IngestDate Tue Jan 07 18:57:39 EST 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
License This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at http://creativecommons.org/licenses/by-nc-nd/4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-b768-bd65fcba22f15baf443789cef4b550754afd0e77bcf0a9bb1aec208679233e443
Notes Competing Interest Statement: The authors have declared no competing interest.
ORCID 0000-0001-8633-499X
0000-0003-3944-2327
0000-0002-8924-8365
0000-0002-3980-938X
0000-0001-6320-4885
0000-0001-7570-7877
0009-0006-5661-1646
OpenAccessLink https://www.biorxiv.org/content/10.1101/2024.07.27.605418
PageCount 33
ParticipantIDs biorxiv_primary_2024_07_27_605418
PublicationCentury 2000
PublicationDate 20240727
PublicationDateYYYYMMDD 2024-07-27
PublicationDate_xml – month: 7
  year: 2024
  text: 20240727
  day: 27
PublicationDecade 2020
PublicationTitle bioRxiv
PublicationYear 2024
Publisher Cold Spring Harbor Laboratory
Publisher_xml – name: Cold Spring Harbor Laboratory
SSID ssj0002961374
Score 1.732024
SecondaryResourceType preprint
SourceID biorxiv
SourceType Open Access Repository
SubjectTerms Genomics
Title Nucleotide dependency analysis of DNA language models reveals genomic functional elements
URI https://www.biorxiv.org/content/10.1101/2024.07.27.605418
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider Cold Spring Harbor Laboratory Press