Systematic Analysis of Transcription Factors Binding to Noncoding Variants

Bibliographic Details
Published in Nature (London) Vol. 591; no. 7848; pp. 147-151
Main Authors Yan, Jian, Qiu, Yunjiang, Ribeiro dos Santos, André M, Yin, Yimeng, Li, Yang E., Vinckier, Nick, Nariai, Naoki, Benaglio, Paola, Raman, Anugraha, Li, Xiaoyu, Fan, Shicai, Chiou, Joshua, Chen, Fulin, Frazer, Kelly A., Gaulton, Kyle J., Sander, Maike, Taipale, Jussi, Ren, Bing
Format Journal Article
Language English
Published 27.01.2021
Summary: A large number of sequence variants have been linked to complex human traits and diseases¹, but deciphering their biological functions is still challenging since most of them reside in noncoding DNA. To fill this gap, we have systematically assessed the binding of 270 human transcription factors (TFs) to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed SNP evaluation by Systematic Evolution of Ligands by EXponential enrichment (SNP-SELEX). The resulting 828 million measurements of TF-DNA interactions enable estimation of the relative affinity of these TFs to each variant in vitro and allow for evaluation of current methods for predicting the impact of noncoding variants on TF binding. We show that the Position Weight Matrices (PWMs) of most TFs lack sufficient predictive power, while the Support Vector Machine (SVM) combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human TFs and demonstrate their utility in genome-wide association studies (GWAS) and in understanding the molecular pathways involved in diverse human traits and diseases.
Bibliography: Author information
B. Ren is a co-founder and consultant for Arima Genomics, Inc., and a co-founder of Epigenome Technologies, Inc.
Author contribution
B.R., M.S., K.J.G., K.A.F., J.T., and J.Y. conceived the project. J.Y., Y.Y., X.L., N.N., and N.V. carried out experiments. Y.Q., A.M.R.S., Y.E.L., A.R., S.F., P.B., F.C., and J.C. performed data analysis. J.Y., Y.Q., A.M.R.S., J.T., and B.R. wrote the manuscript with input from all co-authors.
These authors contributed equally
ISSN: 0028-0836, 1476-4687
DOI: 10.1038/s41586-021-03211-0