A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-lev...
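The method-of-moments idea named in this record's abstract can be illustrated with a small sketch. It is not the GEMMA implementation and not the paper's exact MQS estimator: it assumes a mean-zero phenotype, standardized genotype columns, no covariates, and a single variance component. The function names `mom_individual`, `mom_summary`, and `simulate` are hypothetical. The sketch matches the two quadratic forms y'Ky and y'y to their expectations under Var(y) = sigma_g^2 K + sigma_e^2 I, and then shows that the same 2x2 system can be assembled from scaled marginal association statistics and a SNP correlation matrix alone.

```python
import numpy as np


def mom_individual(y, X):
    """Method-of-moments estimates (sigma_g^2, sigma_e^2) from individual-level data.

    Assumes y is mean-zero and the columns of X are standardized.
    """
    n, p = X.shape
    K = X @ X.T / p  # genetic relatedness matrix
    # E[y'Ky] = sg2 * tr(K^2) + se2 * tr(K);  E[y'y] = sg2 * tr(K) + se2 * n
    S = np.array([[np.sum(K * K), np.trace(K)],
                  [np.trace(K), float(n)]])
    q = np.array([y @ K @ y, y @ y])
    return np.linalg.solve(S, q)


def mom_summary(z, R, n, yty):
    """Same estimates from summary-level inputs.

    z   : X'y / sqrt(n), scaled marginal association statistics (length p)
    R   : X'X / n, SNP correlation matrix (p x p)
    n   : sample size
    yty : y'y, total sum of squares of the phenotype
    """
    p = z.shape[0]
    tr_K = n * np.trace(R) / p            # tr(XX'/p) = n * tr(R) / p
    tr_K2 = (n / p) ** 2 * np.sum(R * R)  # tr(K^2) = (n/p)^2 * sum_jk R_jk^2
    S = np.array([[tr_K2, tr_K],
                  [tr_K, float(n)]])
    q = np.array([n * np.sum(z ** 2) / p, yty])  # y'Ky = (n/p) * sum_j z_j^2
    return np.linalg.solve(S, q)


def simulate(n=300, p=500, h2=0.5, seed=0):
    """Toy data: independent standardized SNPs, polygenic effects (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    beta = rng.standard_normal(p) * np.sqrt(h2 / p)
    y = X @ beta + rng.standard_normal(n) * np.sqrt(1.0 - h2)
    return y - y.mean(), X


if __name__ == "__main__":
    y, X = simulate()
    n = X.shape[0]
    print("individual-level:", mom_individual(y, X))
    print("summary-level:   ", mom_summary(X.T @ y / np.sqrt(n), X.T @ X / n, n, y @ y))
```

In this sketch the two functions agree to floating-point precision when `R` is computed from the full sample. Replacing `R` with a correlation matrix from a subset of individuals or a reference panel, as the abstract describes, would make the summary-level version an approximation.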
Published in | The annals of applied statistics Vol. 11; no. 4; p. 2027 |
---|---|
Main Author | Zhou, Xiang |
Format | Journal Article |
Language | English |
Published | United States, 01.12.2017 |
Subjects | |
Abstract | Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We also provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while remaining computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html. |
---|---|
Author | Zhou, Xiang |
Author_xml | – sequence: 1 givenname: Xiang surname: Zhou fullname: Zhou, Xiang organization: University of Michigan |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/29515717$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_1093_bioinformatics_bty644 crossref_primary_10_1093_bioinformatics_btac659 crossref_primary_10_3389_fgene_2020_587887 crossref_primary_10_3389_fpls_2024_1466857 crossref_primary_10_1093_g3journal_jkad118 crossref_primary_10_1093_genetics_iyab087 crossref_primary_10_1093_bib_bbac067 crossref_primary_10_1038_s41467_020_17576_9 crossref_primary_10_1016_j_aquaculture_2022_738320 crossref_primary_10_1186_s12859_022_05034_w crossref_primary_10_1371_journal_pgen_1008124 crossref_primary_10_1186_s13059_021_02478_w crossref_primary_10_1038_s41467_023_43565_9 crossref_primary_10_1016_j_ajhg_2021_03_002 crossref_primary_10_1016_j_molp_2020_03_003 crossref_primary_10_1111_nph_16459 crossref_primary_10_1126_science_aba4674 crossref_primary_10_1016_j_tig_2021_06_004 crossref_primary_10_1111_jbg_12813 crossref_primary_10_1159_000496867 crossref_primary_10_1093_cercor_bhy216 crossref_primary_10_1016_j_ajhg_2018_06_002 crossref_primary_10_1371_journal_pgen_1009293 crossref_primary_10_1371_journal_pone_0220827 crossref_primary_10_1093_nargab_lqaa010 crossref_primary_10_1038_s41598_024_56060_y crossref_primary_10_1371_journal_pcbi_1009659 crossref_primary_10_1016_j_ajhg_2021_03_018 crossref_primary_10_1038_s41588_022_01189_7 crossref_primary_10_1371_journal_pcbi_1012469 crossref_primary_10_1002_bies_202100170 crossref_primary_10_1534_genetics_120_303161 crossref_primary_10_1016_j_ajhg_2022_09_001 crossref_primary_10_3389_fgene_2021_612045 crossref_primary_10_7554_eLife_90636 crossref_primary_10_1002_gepi_22432 crossref_primary_10_7554_eLife_90636_3 crossref_primary_10_1093_bioinformatics_btae298 crossref_primary_10_1002_gepi_22516 crossref_primary_10_1126_science_abo2059 crossref_primary_10_1371_journal_pgen_1007186 crossref_primary_10_1093_g3journal_jkae263 crossref_primary_10_1371_journal_pgen_1011037 crossref_primary_10_1371_journal_pgen_1007978 crossref_primary_10_3390_genes13081430 crossref_primary_10_1111_pbi_14340 
crossref_primary_10_1093_bioadv_vbad027 crossref_primary_10_1101_gr_279207_124 crossref_primary_10_3389_fgene_2020_581594 crossref_primary_10_1214_18_AOAS1222 crossref_primary_10_1016_j_jspi_2023_03_002 crossref_primary_10_1016_j_ajhg_2020_03_013 crossref_primary_10_1073_pnas_2408715121 crossref_primary_10_1371_journal_pgen_1008734 crossref_primary_10_1371_journal_pgen_1008855 crossref_primary_10_1016_j_gpb_2020_10_007 crossref_primary_10_1038_s41467_023_43209_y |
ContentType | Journal Article |
DBID | NPM |
DOI | 10.1214/17-AOAS1052 |
DatabaseName | PubMed |
DatabaseTitle | PubMed |
DatabaseTitleList | PubMed |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Mathematics |
ExternalDocumentID | 29515717 |
Genre | Journal Article |
GrantInformation_xml | – fundername: NHGRI NIH HHS grantid: R01 HG009124 – fundername: Wellcome Trust – fundername: NIGMS NIH HHS grantid: R01 GM126553 – fundername: NHLBI NIH HHS grantid: N01 HC025195 |
GroupedDBID | 123 23M 2AX 6J9 AAWIL ABAWQ ABBHK ABFAN ABQDR ABXSQ ABYWD ABZEH ACDIW ACGFO ACHJO ACMTB ACTMH ADODI ADULT AELLO AENEX AETVE AEUPB AFFOW AFVYC AGLNM AIHAF AKBRZ ALMA_UNASSIGNED_HOLDINGS ALRMG AS~ CS3 DQDLB DSRWC EBS ECEWR EJD F5P FEDTE GIFXF GR0 HDK HQ6 HVGLF IPSME J9A JAA JAAYA JBMMH JBZCM JENOY JHFFW JKQEH JLEZI JLXEF JMS JPL JST NPM OK1 P2P PUASD RBU RNS RPE SA0 SJN TN5 WHG WS9 |
ISSN | 1932-6157 |
IngestDate | Sat May 31 02:09:02 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | MINQUE; variance component; genome-wide association studies; summary statistics; method of moments; linear mixed model |
Language | English |
LinkModel | OpenURL |
PMID | 29515717 |
ParticipantIDs | pubmed_primary_29515717 |
PublicationCentury | 2000 |
PublicationDate | 2017-Dec |
PublicationDateYYYYMMDD | 2017-12-01 |
PublicationDate_xml | – month: 12 year: 2017 text: 2017-Dec |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | The annals of applied statistics |
PublicationTitleAlternate | Ann Appl Stat |
PublicationYear | 2017 |
SSID | ssj0054841 |
Score | 2.4441202 |
Snippet | Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance... |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 2027 |
Title | A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES |
URI | https://www.ncbi.nlm.nih.gov/pubmed/29515717 |
Volume | 11 |
hasFullText | |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | National Library of Medicine |