A computationally efficient procedure for combining ecological datasets by means of sequential consensus inference

Bibliographic Details
Main Authors	Figueira, Mario; Conesa, David; López-Quílez, Antonio; Paradinas, Iosu
Format	Journal Article
Language	English
Published	12.06.2024
Subjects	Statistics - Computation; Statistics - Methodology
Summary:	Combining data has become an indispensable tool for managing the current diversity and abundance of data. However, as data complexity and volume grow, the computational demands of previously proposed models for combining data escalate proportionally, posing a significant challenge to practical implementation. This study presents a sequential consensus Bayesian inference procedure that allows for a flexible definition of models, aiming to emulate the versatility of integrated models while significantly reducing their computational cost. The method updates the distribution of the fixed effects and hyperparameters from their marginal posterior distribution throughout a sequential inference procedure, and performs a consensus on the random effects after the sequential inference is completed. The applicability of the procedure, together with its strengths and limitations, is outlined in its methodological description. The sequential consensus method is presented as two distinct algorithms. The first performs sequential updating and consensus from the stored values of the marginal or joint posterior distribution of the random effects. The second performs an extra step that addresses the deficiencies that may arise when the model partitions do not share the whole latent field. The performance of the procedure is illustrated with three examples, one simulated and two with real data, chosen to expose its strengths and limitations.
DOI:	10.48550/arxiv.2406.08174
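
The two ingredients named in the summary, sequential updating of fixed effects and a consensus step for random effects, can be illustrated with a toy Gaussian example. This is only a minimal sketch under strong assumptions, not the authors' algorithms: it assumes a linear model with known noise precision (so the sequential update is conjugate and exact), and it stands in for the consensus step with a precision-weighted average of independent Gaussian summaries. The function names `gaussian_update`, `sequential_update`, and `consensus`, and all numeric values, are hypothetical.

```python
import numpy as np


def gaussian_update(prior_mean, prior_prec, X, y, noise_prec):
    """Conjugate update of a Gaussian prior on regression coefficients
    (assumes a Gaussian likelihood with known noise precision)."""
    post_prec = prior_prec + noise_prec * X.T @ X
    rhs = prior_prec @ prior_mean + noise_prec * X.T @ y
    post_mean = np.linalg.solve(post_prec, rhs)
    return post_mean, post_prec


def sequential_update(partitions, prior_mean, prior_prec, noise_prec):
    """Feed the posterior from one data partition in as the prior for the next."""
    mean, prec = prior_mean, prior_prec
    for X, y in partitions:
        mean, prec = gaussian_update(mean, prec, X, y, noise_prec)
    return mean, prec


def consensus(means, precs):
    """Precision-weighted combination of per-partition Gaussian summaries.
    Assumption: the summaries are treated as independent; the paper's
    consensus rule may differ."""
    means = np.asarray(means, dtype=float)
    precs = np.asarray(precs, dtype=float)
    total_prec = precs.sum(axis=0)
    return (precs * means).sum(axis=0) / total_prec, total_prec


# Simulated data (hypothetical values), split into three partitions.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=60)
noise_prec = 1.0 / 0.3**2
prior_mean = np.zeros(2)
prior_prec = 0.01 * np.eye(2)

partitions = [(X[:20], y[:20]), (X[20:40], y[20:40]), (X[40:], y[40:])]
seq_mean, seq_prec = sequential_update(partitions, prior_mean, prior_prec, noise_prec)
full_mean, full_prec = gaussian_update(prior_mean, prior_prec, X, y, noise_prec)

# Consensus of three per-partition estimates of one shared random effect.
cons_mean, cons_prec = consensus([0.9, 1.1, 1.3], [4.0, 1.0, 1.0])
```

In this conjugate setting the sequential posterior coincides with the posterior from fitting all the data at once, which is the property that makes processing partitions one at a time attractive. For non-Gaussian models the sequential update would generally be approximate.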