Collision between biological process and statistical analysis revealed by mean centring

Bibliographic Details
Published in: The Journal of Animal Ecology Vol. 89; no. 12; pp. 2813-2824
Main Authors: Westneat, David F., Araya‐Ajoy, Yimen G., Allegue, Hassen, Class, Barbara, Dingemanse, Niels, Dochtermann, Ned A., Garamszegi, László Zsolt, Martin, Julien G. A., Nakagawa, Shinichi, Réale, Denis, Schielzeth, Holger, Phillimore, Albert
Format: Journal Article
Language: English
Published: England, Blackwell Publishing Ltd, 01.12.2020
Summary: Animal ecologists often collect hierarchically structured data and analyse these with linear mixed‐effects models. Specific complications arise when the effect sizes of covariates vary on multiple levels (e.g. within vs. among subjects). Mean centring of covariates within subjects offers a useful approach in such situations, but is not without problems. A statistical model represents a hypothesis about the underlying biological process. Mean centring within clusters assumes that the lower‐level responses (e.g. within subjects) depend on the deviation from the subject mean (relative) rather than on the absolute scale of the covariate. This may or may not be biologically realistic. We show that a mismatch between the nature of the generating (i.e. biological) process and the form of the statistical analysis produces major conceptual and operational challenges for empiricists. We explored the consequences of mismatches by simulating data with three response‐generating processes differing in the source of correlation between a covariate and the response. These data were then analysed with three different analysis equations. We asked how robustly different analysis equations estimate key parameters of interest and under which circumstances biases arise. Mismatches between generating and analytical equations created several intractable problems for estimating key parameters. The most widely misestimated parameter was the among‐subject variance in response. We found that no single analysis equation was robust in estimating all parameters generated by all equations. Importantly, even when response‐generating and analysis equations matched mathematically, bias in some parameters arose when sampling across the range of the covariate was limited. Our results have general implications for how we collect and analyse data. They also remind us more generally that conclusions from statistical analysis of data are conditional on a hypothesis, sometimes implicit, for the process(es) that generated the attributes we measure. We discuss strategies for real data analysis in the face of uncertainty about the underlying biological process.

Many biological processes produce hierarchical ecological data, such as different ways temperature may affect activity within and among individual green iguanas. Simulations were used to investigate how well statistical models using mean centring perform depending on whether they matched the underlying process. A variety of problems were found, including some arising from sampling even when models match. Potential solutions involve better integration of statistics and biology. Photo by D. F. Westneat.
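As a rough illustration of what within-subject mean centring does, the sketch below splits a covariate into each subject's mean and the deviation from that mean, then compares slope estimates. It is not the authors' simulation or any of their analysis equations: the generating process, the parameter values, and the names `simulate`, `within_subject_centre` and `ols_slope` are all hypothetical, and simple least-squares slopes stand in for a linear mixed-effects model. The generating process here is the "relative" one that mean centring assumes, so the analysis matches the process by construction.

```python
import random
from collections import defaultdict


def within_subject_centre(subject_ids, x):
    """Split covariate x into each subject's mean and the deviation from it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, xi in zip(subject_ids, x):
        totals[s] += xi
        counts[s] += 1
    means = {s: totals[s] / counts[s] for s in totals}
    x_mean = [means[s] for s in subject_ids]
    x_dev = [xi - means[s] for s, xi in zip(subject_ids, x)]
    return x_mean, x_dev


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_subjects=200, n_obs=10, beta_within=1.0, beta_among=-0.5, seed=1):
    """Hypothetical process: the response depends on the subject's mean
    covariate (slope beta_among) and on the deviation from that mean
    (slope beta_within). All values are illustrative assumptions."""
    rng = random.Random(seed)
    ids, x, y = [], [], []
    for s in range(n_subjects):
        centre = rng.gauss(0.0, 1.0)  # subject's typical covariate value
        xs = [centre + rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        xbar = sum(xs) / n_obs
        for xi in xs:
            noise = rng.gauss(0.0, 0.5)
            ids.append(s)
            x.append(xi)
            y.append(beta_among * xbar + beta_within * (xi - xbar) + noise)
    return ids, x, y


if __name__ == "__main__":
    ids, x, y = simulate()
    x_mean, x_dev = within_subject_centre(ids, x)
    print("within-subject slope:", ols_slope(x_dev, y))
    print("among-subject slope: ", ols_slope(x_mean, y))
    print("uncentred slope:     ", ols_slope(x, y))
```

With these assumed settings the centred covariates should recover slopes near the two generating values, while the uncentred slope blends them. The summary's caution applies when the real process instead responds to the absolute covariate value, a case this sketch does not simulate.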
ISSN: 0021-8790, 1365-2656
DOI: 10.1111/1365-2656.13360