A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES


Bibliographic Details
Published in	The annals of applied statistics Vol. 11; no. 4; p. 2027
Main Author	Zhou, Xiang
Format	Journal Article
Language	English
Published	United States 01.12.2017
Subjects	MINQUE; variance component; genome-wide association studies; summary statistics; method of moments; linear mixed model
Abstract	Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, restricted maximum likelihood estimation (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into a single statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We also provide an exact estimation form of LDSC that yields unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, makes use of summary statistics, and is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
Author	Zhou, Xiang (University of Michigan)
DOI	10.1214/17-AOAS1052
DatabaseName	PubMed
Discipline	Mathematics
GrantInformation	NHGRI NIH HHS (grant R01 HG009124); Wellcome Trust; NIGMS NIH HHS (grant R01 GM126553); NHLBI NIH HHS (grant N01 HC025195)
ISSN	1932-6157
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	4
PMID	29515717
PublicationDate	2017-Dec
PublicationPlace	United States
PublicationTitle	The annals of applied statistics
PublicationTitleAlternate	Ann Appl Stat
PublicationYear	2017
StartPage	2027
URI	https://www.ncbi.nlm.nih.gov/pubmed/29515717
Volume	11