Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression
Published in | Methods in ecology and evolution Vol. 10; no. 9; pp. 1412 - 1430 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | London John Wiley & Sons, Inc 01.09.2019 |
Summary: | Proportional data, in which response variables are expressed as percentages or fractions of a whole, are analysed in many subfields of ecology and evolution. The scale‐independence of proportions makes them appropriate to analyse many biological phenomena, but statistical analyses are not straightforward, since proportions can only take values from zero to one and their variance is usually not constant across the range of the predictor. Transformations to overcome these problems are often applied, but can lead to biased estimates and difficulties in interpretation.
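The point about non-constant variance can be illustrated with a minimal Python sketch. It assumes the mean-precision parameterisation commonly used for beta regression (mean `mu`, precision `phi`), which the summary itself does not spell out; the function names are hypothetical and chosen only for this illustration.

```python
def beta_shape_params(mu, phi):
    """Convert a mean (0 < mu < 1) and precision (phi > 0) to Beta shape parameters.

    Assumption: mean-precision parameterisation, a = mu * phi, b = (1 - mu) * phi.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    return mu * phi, (1.0 - mu) * phi


def beta_variance(mu, phi):
    """Variance of a Beta distribution with the given mean and precision."""
    a, b = beta_shape_params(mu, phi)
    # Standard Beta variance: a*b / ((a+b)^2 * (a+b+1)),
    # which simplifies to mu * (1 - mu) / (1 + phi).
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


if __name__ == "__main__":
    # With the precision held fixed, the variance shrinks as the mean nears 0 or 1.
    for mu in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"mean={mu:.1f}  variance={beta_variance(mu, phi=9.0):.4f}")
```

Because the variance depends on the mean, a predictor that shifts the mean also changes the spread, which is why constant-variance methods fit such data poorly.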
In this paper, we provide an overview of the different types of proportional data and discuss the different analysis strategies available. In particular, we review and discuss the use of promising, but little used, techniques for analysing continuous (also called non‐count‐based or non‐binomial) proportions (e.g. percent cover, fraction time spent on an activity): beta and Dirichlet regression, and some of their most important extensions.
A major distinction can be made between proportions arising from counts and those arising from continuous measurements. For proportions consisting of two categories, count‐based data are best analysed using well‐developed techniques such as logistic regression, while continuous proportions can be analysed with beta regression models. In the case of >2 categories, multinomial logistic regression or Dirichlet regression can be applied. Both beta and Dirichlet regression techniques model proportions at their original scale, which makes statistical inference more straightforward and produces less biased estimates relative to transformation‐based solutions. Extensions to beta regression, such as models for variable dispersion, zero‐one augmented data and mixed effects designs, have been developed and are reviewed and applied to case studies. Finally, we briefly discuss some issues regarding model fitting, inference, and reporting that are particularly relevant to beta and Dirichlet regression.
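The choice of model described above can be sketched in Python as a small lookup, followed by a beta log-likelihood evaluated on the original (untransformed) scale. This is an illustrative sketch, not the authors' scripts, which are written in R. The function names, the logit link and the mean-precision parameterisation are assumptions made for the example.

```python
import math


def suggest_model(data_type, n_categories):
    """Return the regression family suggested for a proportional response.

    data_type: "count" or "continuous" (hypothetical labels for this sketch).
    n_categories: number of categories making up the whole (at least 2).
    """
    if data_type not in ("count", "continuous"):
        raise ValueError("data_type must be 'count' or 'continuous'")
    if n_categories < 2:
        raise ValueError("a proportion needs at least two categories")
    if n_categories == 2:
        return "logistic regression" if data_type == "count" else "beta regression"
    if data_type == "count":
        return "multinomial logistic regression"
    return "Dirichlet regression"


def beta_loglik(y, x, intercept, slope, phi):
    """Log-likelihood of a simple beta regression with one predictor.

    Assumptions: logit link for the mean, constant precision phi,
    and every response strictly inside (0, 1).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        if not 0.0 < yi < 1.0:
            raise ValueError("responses must lie strictly between 0 and 1")
        mu = 1.0 / (1.0 + math.exp(-(intercept + slope * xi)))  # inverse logit
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi) + (b - 1.0) * math.log(1.0 - yi))
    return total


if __name__ == "__main__":
    print(suggest_model("continuous", 2))
    print(suggest_model("continuous", 4))
    print(beta_loglik([0.2, 0.5, 0.8], [-1.0, 0.0, 1.0], 0.0, 1.4, 10.0))
```

Maximising `beta_loglik` over `intercept`, `slope` and `phi` gives estimates on the scale of the observed proportions. Observations equal to exactly 0 or 1 are rejected here, which is the situation the zero‐one augmented extensions mentioned above are meant to handle.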
Beta regression and Dirichlet regression overcome some problems inherent in applying classic statistical approaches to proportional data. To facilitate the adoption of these techniques by practitioners in ecology and evolution, we present detailed, annotated demonstration scripts covering all variations of beta and Dirichlet regression discussed in the article, implemented in the freely available language for statistical computing, R.
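Foreign Language Abstract (translated from Chinese)
Proportional data, in which the response variable is expressed as a percentage or fraction of a whole, are analysed in many subfields of ecology and evolution. The independence of proportions from the scale of the data makes them suitable for analysing many biological phenomena. However, because proportions can only take values from 0 to 1 and their variance is usually not constant across the range of the predictor, statistical analysis is not straightforward. To overcome these problems, researchers often apply mathematical transformations, but these can lead to biased estimates and difficulties in interpretation.
In this paper, we give an overview of the different types of proportional data and discuss the analysis methods currently available, in particular methods for analysing continuous (also called non‐count or non‐binomial) proportions (e.g. percent cover, the proportion of time an animal spends on a particular behaviour): beta regression and Dirichlet regression, and some of their most important extensions. Although these methods are not yet widely used, we consider them to have broad prospects for application.
A distinction can be made between proportions arising from counts and those arising from continuous measurements. For proportions consisting of two categories, count data can be analysed with well‐established methods such as logistic regression, and continuous data with beta regression models. For proportions with more than two categories, multinomial logistic regression or Dirichlet regression can be used. Both beta and Dirichlet regression model proportions on the original scale of the data, which makes statistical inference simpler and yields less biased estimates than transformation‐based methods. We review extensions to beta regression, such as models for variable dispersion, zero‐one augmented data and mixed‐effects designs, and apply them to case studies. Finally, we briefly discuss issues of model fitting and statistical inference that are particularly relevant to beta and Dirichlet regression.
Beta regression and Dirichlet regression overcome some problems inherent in applying classic statistical methods to proportional data. To help researchers in ecology and evolution adopt these techniques, we provide detailed, annotated demonstration scripts covering all variants of beta and Dirichlet regression discussed in the article, implemented in R, a freely available language for statistical computing. |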
---|---|
ISSN: | 2041-210X |
DOI: | 10.1111/2041-210X.13234 |