A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES

Bibliographic Details
Published in The Annals of Applied Statistics Vol. 11; no. 4; p. 2027
Main Author Zhou, Xiang
Format Journal Article
Language English
Published United States 01.12.2017

Abstract Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, restricted maximum likelihood (REML) estimation, suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and it brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. Within this framework, we provide an alternative but mathematically equivalent form of HE that allows the use of summary statistics. We also provide an exact estimation form of LDSC that yields unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, makes use of summary statistics, and is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies, as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
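The abstract's claim that a moment-based (HE-type) estimator can be rewritten in terms of summary statistics can be illustrated with a small NumPy sketch. This is not the MQS estimator or GEMMA's interface. It is a simplified method-of-moments estimator (regressing y y^T on a kinship matrix K and the identity) under loudly stated assumptions: a single genetic variance component, genotypes X (n x p) and phenotype y standardized to mean 0 and variance 1, K = X X^T / p, marginal z-scores approximated as z_j = x_j^T y / sqrt(n), and SNP correlations R = X^T X / n computed on the same samples. Under these assumptions, the moment equations reduce to s2g = p (sum_j z_j^2 - p) / (n sum_{j,k} r_jk^2 - p^2). The function names and simulation settings are hypothetical.

```python
import numpy as np


def standardize(a):
    """Center each column to mean 0 and scale to variance 1 (ddof=0)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def mom_individual(X, y):
    """Method-of-moments estimate from individual-level data (illustrative).

    Assumes standardized X (n x p) and y (n,), one genetic component with
    kinship K = X X^T / p, plus an i.i.d. residual component. Solves
        [tr(KK) tr(K)] [s2g]   [y^T K y]
        [tr(K)  n    ] [s2e] = [y^T y  ]
    """
    n, p = X.shape
    K = X @ X.T / p
    # For symmetric K, tr(KK) equals the sum of squared entries.
    A = np.array([[np.sum(K * K), np.trace(K)],
                  [np.trace(K), float(n)]])
    b = np.array([y @ K @ y, y @ y])
    s2g, s2e = np.linalg.solve(A, b)
    return s2g, s2e


def mom_summary(z, R, n):
    """Same genetic-variance estimate written with summary statistics only.

    z: marginal z-scores, assumed to be z_j = x_j^T y / sqrt(n);
    R: p x p SNP correlation matrix X^T X / n; n: sample size.
    Under standardization y^T K y = n * sum(z^2) / p,
    tr(KK) = n^2 * sum(R^2) / p^2 and tr(K) = y^T y = n.
    """
    p = len(z)
    return p * (np.sum(z ** 2) - p) / (n * np.sum(R ** 2) - p ** 2)


# Hypothetical simulation: 500 individuals, 200 SNPs, SNP heritability ~0.5.
rng = np.random.default_rng(0)
n, p = 500, 200
X = standardize(rng.binomial(2, 0.3, size=(n, p)).astype(float))
beta = rng.normal(0.0, np.sqrt(0.5 / p), size=p)
y = standardize(X @ beta + rng.normal(0.0, np.sqrt(0.5), size=n))

s2g_ind, s2e_ind = mom_individual(X, y)
z = X.T @ y / np.sqrt(n)   # marginal z-scores from all samples
R = X.T @ X / n            # SNP correlations from the same samples
s2g_sum = mom_summary(z, R, n)
print(s2g_ind, s2g_sum)    # the two estimates agree up to rounding
```

Both functions return the same genetic variance estimate because the summary-statistic form is an algebraic rewrite of the same moment equations. The sketch makes no correction for the extra sampling noise in an R estimated from a small subset or a reference panel. That is the setting the paper addresses, so the sketch should not be read as that estimator.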
Author Zhou, Xiang (University of Michigan)
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29515717 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1093_bioinformatics_bty644
crossref_primary_10_1093_bioinformatics_btac659
crossref_primary_10_3389_fgene_2020_587887
crossref_primary_10_3389_fpls_2024_1466857
crossref_primary_10_1093_g3journal_jkad118
crossref_primary_10_1093_genetics_iyab087
crossref_primary_10_1093_bib_bbac067
crossref_primary_10_1038_s41467_020_17576_9
crossref_primary_10_1016_j_aquaculture_2022_738320
crossref_primary_10_1186_s12859_022_05034_w
crossref_primary_10_1371_journal_pgen_1008124
crossref_primary_10_1186_s13059_021_02478_w
crossref_primary_10_1038_s41467_023_43565_9
crossref_primary_10_1016_j_ajhg_2021_03_002
crossref_primary_10_1016_j_molp_2020_03_003
crossref_primary_10_1111_nph_16459
crossref_primary_10_1126_science_aba4674
crossref_primary_10_1016_j_tig_2021_06_004
crossref_primary_10_1111_jbg_12813
crossref_primary_10_1159_000496867
crossref_primary_10_1093_cercor_bhy216
crossref_primary_10_1016_j_ajhg_2018_06_002
crossref_primary_10_1371_journal_pgen_1009293
crossref_primary_10_1371_journal_pone_0220827
crossref_primary_10_1093_nargab_lqaa010
crossref_primary_10_1038_s41598_024_56060_y
crossref_primary_10_1371_journal_pcbi_1009659
crossref_primary_10_1016_j_ajhg_2021_03_018
crossref_primary_10_1038_s41588_022_01189_7
crossref_primary_10_1371_journal_pcbi_1012469
crossref_primary_10_1002_bies_202100170
crossref_primary_10_1534_genetics_120_303161
crossref_primary_10_1016_j_ajhg_2022_09_001
crossref_primary_10_3389_fgene_2021_612045
crossref_primary_10_7554_eLife_90636
crossref_primary_10_1002_gepi_22432
crossref_primary_10_7554_eLife_90636_3
crossref_primary_10_1093_bioinformatics_btae298
crossref_primary_10_1002_gepi_22516
crossref_primary_10_1126_science_abo2059
crossref_primary_10_1371_journal_pgen_1007186
crossref_primary_10_1093_g3journal_jkae263
crossref_primary_10_1371_journal_pgen_1011037
crossref_primary_10_1371_journal_pgen_1007978
crossref_primary_10_3390_genes13081430
crossref_primary_10_1111_pbi_14340
crossref_primary_10_1093_bioadv_vbad027
crossref_primary_10_1101_gr_279207_124
crossref_primary_10_3389_fgene_2020_581594
crossref_primary_10_1214_18_AOAS1222
crossref_primary_10_1016_j_jspi_2023_03_002
crossref_primary_10_1016_j_ajhg_2020_03_013
crossref_primary_10_1073_pnas_2408715121
crossref_primary_10_1371_journal_pgen_1008734
crossref_primary_10_1371_journal_pgen_1008855
crossref_primary_10_1016_j_gpb_2020_10_007
crossref_primary_10_1038_s41467_023_43209_y
DOI 10.1214/17-AOAS1052
Discipline Mathematics
GrantInformation NHGRI NIH HHS, grant R01 HG009124; Wellcome Trust; NIGMS NIH HHS, grant R01 GM126553; NHLBI NIH HHS, grant N01 HC025195
ISSN 1932-6157
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords MINQUE
variance component
Genome-wide association studies
summary statistics
method of moments
linear mixed model
PMID 29515717
PublicationTitleAlternate Ann Appl Stat