A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers

Bibliographic Details
Published in Nature communications Vol. 9; no. 1; pp. 3522 - 11
Main Authors Mosley, Jonathan D., Feng, QiPing, Wells, Quinn S., Van Driest, Sara L., Shaffer, Christian M., Edwards, Todd L., Bastarache, Lisa, Wei, Wei-Qi, Davis, Lea K., McCarty, Catherine A., Thompson, Will, Chute, Christopher G., Jarvik, Gail P., Gordon, Adam S., Palmer, Melody R., Crosslin, David R., Larson, Eric B., Carrell, David S., Kullo, Iftikhar J., Pacheco, Jennifer A., Peissig, Peggy L., Brilliant, Murray H., Linneman, James G., Namjou, Bahram, Williams, Marc S., Ritchie, Marylyn D., Borthwick, Kenneth M., Verma, Shefali S., Karnes, Jason H., Weiss, Scott T., Wang, Thomas J., Stein, C. Michael, Denny, Josh C., Roden, Dan M.
Format Journal Article
Language English
Published London Nature Publishing Group UK 30.08.2018
Nature Publishing Group
Nature Portfolio
Summary: Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations. Biomarker identification requires prohibitively large cohorts with gene expression and phenotype data. The approach introduced here learns polygenic predictors of expression from genetic and expression data, which are then used to infer biomarker levels in patients with genetic and disease information.
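
The two computational steps named in the summary (scoring individuals with SNP weightings, then applying Bonferroni and FDR thresholds across many biomarker-diagnosis tests) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the SNP weights are assumed to be already estimated (the record says this was done with Bayesian sparse linear mixed modeling, which is not reimplemented here), the association test itself is omitted, the Benjamini-Hochberg procedure is assumed as the FDR method, and all function names and SNP identifiers are hypothetical.

```python
from typing import Dict, List


def predicted_biomarker(weights: Dict[str, float],
                        dosages: Dict[str, float]) -> float:
    """Genetically predicted biomarker value for one individual.

    Weighted sum of allele dosages (0 to 2) over the SNPs that carry a
    weight. SNPs missing from `dosages` contribute 0 (an assumption).
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff controlling the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: List[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one flag per input p-value (same order), True where the
    test is declared significant at FDR level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

For example, `predicted_biomarker({"rs_a": 0.5, "rs_b": -0.2}, {"rs_a": 2, "rs_b": 1})` scores one individual from two hypothetical SNPs. If every one of the 53 biomarkers were tested against every one of the 1139 diagnoses (an assumption about how the tests are counted), `bonferroni_threshold(0.05, 53 * 1139)` would give a cutoff of roughly 8.3e-7. The step-up FDR rule is less stringent than Bonferroni at the same nominal level, which is consistent with the summary's statement that the FDR-based threshold reveals more associations.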
ISSN: 2041-1723
DOI: 10.1038/s41467-018-05624-4