Polygenic risk prediction: why and when out-of-sample prediction $R^2$ can exceed SNP-based heritability

Bibliographic Details
Published in American journal of human genetics Vol. 110; no. 7; pp. 1207-1215
Main Authors Wang, Xiaotong, Revez, Joana A., Ni, Guiyan, Adams, Mark J., McIntosh, Andrew M., Ripke, Stephan, Trzaskowski, Maciej, Byrne, Enda M., Air, Tracy M., Andlauer, Till F.M., Bacanu, Silviu-Alin, Bryois, Julien, Bybjerg-Grauholm, Jonas, Castelao, Enrique, Clarke, Toni-Kim, Colodro-Conde, Lucía, Couvy-Duchesne, Baptiste, Craddock, Nick, Davies, Gail, Derks, Eske M., Direk, Nese, Dolan, Conor V., Kiadeh, Farnush Farhadi Hassan, Gaspar, Héléna A., Gill, Michael, Goes, Fernando S., Gordon, Scott D., Grove, Jakob, Hall, Lynsey S., Homuth, Georg, Horn, Carsten, Jones, Ian, Jones, Lisa A., Jorgenson, Eric, Kraft, Julia, Kretzschmar, Warren W., Li, Yihan, MacIntyre, Donald J., MacKinnon, Dean F., Maier, Wolfgang, Marchini, Jonathan, Mbarek, Hamdi, McGuffin, Peter, Medland, Sarah E., Mehta, Divya, Middeldorp, Christel M., Mihailov, Evelin, Mondimore, Francis M., Montgomery, Grant W., Mullins, Niamh, Ng, Bernard, Nivard, Michel G., Nyholt, Dale R., Oskarsson, Hogni, Owen, Michael J., Bøcker Pedersen, Carsten, Giørtz Pedersen, Marianne, Peterson, Roseann E., Pistis, Giorgio, Saeed Mirza, Saira, Schoevers, Robert, Schulte, Eva C., Shen, Ling, Shi, Jianxin, Shyn, Stanley I., Sinnamon, Grant C.B., Smit, Johannes H., Stefansson, Hreinn, Strohmaier, Jana, Trubetskoy, Vassily, Uitterlinden, André G., Van der Auwera, Sandra, van Hemert, Albert M., Visscher, Peter M., Wang, Yunpeng, Weinsheimer, Shantel Marie, Wellmann, Jürgen, Willemsen, Gonneke, Xi, Hualin S., Berger, Klaus, Kendler, Kenneth S., Lewis, Glyn, Li, Qingqin S., Madden, Pamela A.F., Metspalu, Andres, Mors, Ole, Nöthen, Markus M., O'Donovan, Michael C., Paciga, Sara A., Pedersen, Nancy L., Porteous, David J., Potash, James B., Rietschel, Marcella, Smoller, Jordan W., Stefansson, Kari, Tiemeier, Henning, Völzke, Henry, Weissman, Myrna M.
Format Journal Article
Language English
Published Elsevier Inc 06.07.2023
Summary: In polygenic score (PGS) analysis, the coefficient of determination ($R^2$) is a key statistic to evaluate efficacy. $R^2$ is the proportion of phenotypic variance explained by the PGS, calculated in a cohort that is independent of the genome-wide association study (GWAS) that provided estimates of allelic effect sizes. The SNP-based heritability ($h^2_{\mathrm{SNP}}$, the proportion of total phenotypic variance attributable to all common SNPs) is the theoretical upper limit of the out-of-sample prediction $R^2$. However, in real data analyses $R^2$ has been reported to exceed $h^2_{\mathrm{SNP}}$, which occurs in parallel with the observation that $h^2_{\mathrm{SNP}}$ estimates tend to decline as the number of cohorts being meta-analyzed increases. Here, we quantify why and when these observations are expected. Using theory and simulation, we show that if heterogeneities in cohort-specific $h^2_{\mathrm{SNP}}$ exist, or if genetic correlations between cohorts are less than one, $h^2_{\mathrm{SNP}}$ estimates can decrease as the number of cohorts being meta-analyzed increases. We derive conditions when the out-of-sample prediction $R^2$ will be greater than $h^2_{\mathrm{SNP}}$ and show the validity of our derivations with real data from a binary trait (major depression) and a continuous trait (educational attainment). Our research calls for a better approach to integrating information from multiple cohorts to address issues of between-cohort heterogeneity. SNP-based heritability estimates tend to decline and then plateau as the number of cohorts being meta-analyzed in a GWAS increases, and the out-of-sample prediction $R^2$ in "target" cohorts can sometimes exceed its theoretical upper limit. Here, we provide theory to explain these observations that reflect heterogeneity between cohorts in meta-analyses.
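The "decline and then plateau" pattern described in the summary can be illustrated with a toy calculation. The sketch below is not the authors' derivation. It assumes independent standardized SNPs, no sampling error, equal cohort weights, the same heritability `h2` in every cohort, and a single genetic correlation `r_g` between every pair of cohorts. Under those assumptions the variance explained by the averaged effect vector is the variance of a mean of equicorrelated variables. The function names `meta_h2` and `simulate_meta_h2` are hypothetical.

```python
import random


def meta_h2(h2, r_g, k):
    """Variance explained by the equal-weight average of k cohort effect vectors.

    Toy assumptions (not from the source): every cohort has heritability h2
    and every pair of cohorts has genetic correlation r_g.
    """
    return h2 * (1 + (k - 1) * r_g) / k


def simulate_meta_h2(h2, r_g, k, n_snps=20000, seed=0):
    """Monte Carlo check: draw equicorrelated per-SNP effects and average them."""
    rng = random.Random(seed)
    scale = (h2 / n_snps) ** 0.5   # per-SNP effect SD so that effects sum to h2
    shared_w = r_g ** 0.5          # weight on the component shared by all cohorts
    own_w = (1 - r_g) ** 0.5       # weight on the cohort-specific component
    total = 0.0
    for _ in range(n_snps):
        shared = rng.gauss(0, 1)
        mean_effect = sum(
            scale * (shared_w * shared + own_w * rng.gauss(0, 1))
            for _ in range(k)
        ) / k
        # With independent standardized SNPs, genetic variance is the
        # sum of squared effects.
        total += mean_effect ** 2
    return total


if __name__ == "__main__":
    for k in (1, 2, 5, 20, 100):
        print(k, round(meta_h2(0.5, 0.8, k), 3), round(simulate_meta_h2(0.5, 0.8, k), 3))
```

In this toy model the value starts at `h2` for a single cohort and approaches `h2 * r_g` as the number of cohorts grows, so any genetic correlation below one produces a decline followed by a plateau. Real estimates also depend on sample size, linkage disequilibrium, and cohort-specific heritability, none of which are modelled here.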
ISSN: 0002-9297, 1537-6605
DOI: 10.1016/j.ajhg.2023.06.006