A unified powerful set-based test for sequencing data analysis of GxE interactions

Bibliographic Details
Published in: Biostatistics (Oxford, England), Vol. 18, no. 1, pp. 119-131
Main Authors: Su, Yu-Ru; Di, Chong-Zhi; Hsu, Li
Format: Journal Article
Language: English
Published: England: Oxford University Press, 01.01.2017
ISSN: 1465-4644
EISSN: 1468-4357
DOI: 10.1093/biostatistics/kxw034

Abstract: The development of next-generation sequencing technologies has allowed researchers to study comprehensively the contribution of genetic variation, particularly rare variants, to complex diseases. To date, many sequencing analyses of rare variants have focused on marginal genetic effects and have not explored the potential role that environmental factors play in modifying genetic risk. Analysis of gene-environment interaction (GxE) for rare variants poses considerable challenges because of variant rarity and the paucity of subjects who carry the variants while being exposed. To tackle this challenge, we propose a hierarchical model to jointly assess the GxE effects of a set of rare variants, for example in a gene or regulatory region, leveraging information across the variants. Under this model, GxE is modeled by two components. The first component incorporates variant functional information as weights to calculate the weighted burden of variant alleles across variants, and then assesses its interaction with the environmental factor. Since this information is known a priori, this component enters the model as fixed effects. The second component involves residual GxE effects that have not been accounted for by the fixed effects. In this component, the residual GxE effects are postulated to follow an unspecified distribution with mean 0 and variance $\tau^2$. We develop a novel testing procedure by deriving two independent score statistics for the fixed effects and the variance component separately. We propose two data-adaptive approaches for combining these two score statistics and establish their asymptotic distributions. An extensive simulation study shows that the proposed approaches maintain the correct type I error and that their power is comparable to or better than that of existing methods under a wide range of scenarios. Finally, we illustrate the proposed methods with an exome-wide GxE analysis with NSAIDs use in colorectal cancer.
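The fixed-effects component described in the abstract (a weighted burden of variant alleles interacting with the environmental factor) can be illustrated with a minimal sketch. All function names, the toy data, and the weights below are hypothetical and are not taken from the article. The abstract does not state which two combination approaches the authors use, so the last function shows Fisher's method, a standard way to combine two independent p-values, purely as an illustration and not as the authors' procedure.

```python
import math


def weighted_burden(genotypes, weights):
    """Per-subject weighted burden: sum over variants of weight * allele count."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def gxe_burden_term(genotypes, weights, exposure):
    """Fixed-effect GxE covariate: exposure multiplied by the weighted burden."""
    burden = weighted_burden(genotypes, weights)
    return [e * b for e, b in zip(exposure, burden)]


def fisher_combine_two(p_fixed, p_variance):
    """Fisher's method for two independent p-values (both must be > 0).

    Under the null, -2 * (ln p1 + ln p2) is chi-square with 4 degrees of
    freedom, whose survival function has the closed form q * (1 - ln q)
    with q = p1 * p2. This is an assumed, illustrative combination rule.
    """
    q = p_fixed * p_variance
    return q * (1.0 - math.log(q))


# Hypothetical toy data: 3 subjects, 3 rare variants (minor-allele counts).
genotypes = [[0, 1, 0], [1, 0, 2], [0, 0, 0]]
weights = [0.5, 1.0, 2.0]   # assumed functional weights, one per variant
exposure = [1, 0, 1]        # assumed binary environmental factor

interaction = gxe_burden_term(genotypes, weights, exposure)
combined_p = fisher_combine_two(0.04, 0.20)
```

In this sketch only subjects who are both exposed and carry at least one variant allele contribute a nonzero interaction covariate, which reflects the sparsity problem the abstract describes for rare-variant GxE analysis.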
Copyright: The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Corporate Author: Genetics and Epidemiology of Colorectal Cancer Consortium
Discipline: Biology
PMID: 27474101
PMCID: PMC5255050
Grant Information: NCI NIH HHS, grants R01 CA189532, P01 CA053996, R01 CA195789
Keywords: Score test; Rare genetic variants; Burden and variance component tests; Colorectal cancer; Kernel machine
Subject Terms: Gene-Environment Interaction; Humans; Models, Genetic; Models, Statistical; Sequence Analysis, DNA - statistics & numerical data