Effective gene expression prediction from sequence by integrating long-range interactions

Bibliographic Details
Published in Nature Methods Vol. 18; no. 10; pp. 1196-1203
Main Authors Avsec, Žiga; Agarwal, Vikram; Visentin, Daniel; Ledsam, Joseph R.; Grabska-Barwinska, Agnieszka; Taylor, Kyle R.; Assael, Yannis; Jumper, John; Kohli, Pushmeet; Kelley, David R.
Format Journal Article
Language English
Published New York: Nature Publishing Group US, 01.10.2021
Nature Publishing Group
Summary: How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution. By using a new deep learning architecture, Enformer leverages long-range information to improve prediction of gene expression on the basis of DNA sequence.
ISSN: 1548-7091; 1548-7105
DOI: 10.1038/s41592-021-01252-x