Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting

Bibliographic Details
Published in Journal of agricultural, biological, and environmental statistics Vol. 20; no. 1; pp. 100 - 120
Main Authors Carrico, Caroline; Gennings, Chris; Wheeler, David C.; Factor-Litvak, Pam
Format Journal Article
Language English
Published Boston Springer Science+Business Media, LLC 01.03.2015
Springer US
Springer
Summary: In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, the traditional methods (e.g. ordinary or logistic regression) suffer from collinearity and variance inflation, and shrinkage methods have limitations in selecting among correlated components. We propose a weighted quantile sum (WQS) approach to estimating a body burden index, which identifies "bad actors" in a set of highly correlated environmental chemicals. We evaluate and characterize the accuracy of WQS regression in variable selection through extensive simulation studies through sensitivity and specificity (i.e., ability of the WQS method to select the bad actors correctly and not incorrect ones). We demonstrate the improvement in accuracy this method provides over traditional ordinary regression and shrinkage methods (lasso, adaptive lasso, and elastic net). Results from simulations demonstrate that WQS regression is accurate under some environmentally relevant conditions, but its accuracy decreases for a fixed correlation pattern as the association with a response variable diminishes. Nonzero weights (i.e., weights exceeding a selection threshold parameter) may be used to identify bad actors; however, components within a cluster of highly correlated active components tend to have lower weights, with the sum of their weights representative of the set. Supplementary materials accompanying this paper appear on-line.
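The index and the selection metrics named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the common WQS formulation in which each chemical is scored into quantiles and the index is a weighted sum with nonnegative weights that sum to one. The weights are supplied by hand rather than estimated, and the function names, the example weights, and the threshold of 0.2 are all hypothetical.

```python
import numpy as np


def quantile_scores(X, n_quantiles=4):
    """Score each column into 0..n_quantiles-1 using its empirical quantiles."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(X.shape, dtype=int)
    # Interior cut probabilities, e.g. 0.25, 0.5, 0.75 for quartiles.
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1]
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        # Score = number of cut points less than or equal to the value.
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores


def wqs_index(scores, weights):
    """Weighted quantile sum per subject (assumes weights >= 0 summing to 1)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return np.asarray(scores) @ weights


def select_bad_actors(weights, threshold):
    """Flag components whose weight exceeds the selection threshold."""
    return np.asarray(weights, dtype=float) > threshold


def sensitivity_specificity(selected, truth):
    """Selection accuracy; assumes truth holds both active and inactive components."""
    selected = np.asarray(selected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(selected & truth)
    fn = np.sum(~selected & truth)
    tn = np.sum(~selected & ~truth)
    fp = np.sum(selected & ~truth)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: five chemicals, the first three truly active.
example_weights = np.array([0.40, 0.35, 0.15, 0.05, 0.05])
example_truth = np.array([True, True, True, False, False])
example_selected = select_bad_actors(example_weights, threshold=0.2)
example_sens, example_spec = sensitivity_specificity(example_selected, example_truth)
```

In this made-up example the third active component falls below the threshold, so sensitivity is 2/3 while specificity is 1. In a full WQS analysis the weights would be estimated from data rather than fixed by hand, a step this sketch leaves out.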
ISSN: 1085-7117, 1537-2693
DOI: 10.1007/s13253-014-0180-3