Abstract
We consider the challenges that arise when fitting ecological individual heterogeneity models to “large” data sets. In particular, we focus on (continuous-valued) random effect models commonly used to describe individual heterogeneity in ecological populations within the context of capture–recapture data, although the approach applies more widely to general latent variable models. Within such models the associated likelihood is expressible only as an analytically intractable integral. Common techniques for fitting such models to data include numerical approximation of the integral and Bayesian data augmentation. However, as the size of the data set increases (i.e., the number of individuals increases), these tools may become computationally infeasible. We present an efficient Bayesian model-fitting approach, whereby we initially sample from the posterior distribution of a smaller subsample of the data, before correcting this sample to obtain estimates of the posterior distribution of the full data set using an importance sampling approach. We consider several practical issues, including the subsampling mechanism, computational efficiencies (including the ability to parallelise the algorithm) and combining subsampling estimates using multiple subsampled data sets. We initially demonstrate the feasibility (and accuracy) of the approach via simulated data before considering a challenging real data set of approximately 30,000 guillemots and, using the proposed algorithm, obtain posterior estimates of the model parameters in substantially reduced computational time compared to the standard Bayesian model-fitting approach.
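The subsample-then-correct idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy Gaussian-mean model with known variance and a flat prior, so the likelihood is tractable and the subsample posterior can be drawn exactly as a stand-in for MCMC. The function names `log_lik` and `subsample_then_reweight` are hypothetical. Under these assumptions, the importance weight of each draw is proportional to the likelihood of the individuals left out of the subsample.

```python
import math
import random


def log_lik(theta, data, sigma=1.0):
    """Gaussian log-likelihood of mean theta, up to an additive constant."""
    return -0.5 * sum((y - theta) ** 2 for y in data) / sigma ** 2


def subsample_then_reweight(data, n_sub, n_draws, rng, sigma=1.0):
    """Toy illustration: sample the subsample posterior, then reweight."""
    # Step 1: simple random subsample of individuals (an assumed mechanism).
    idx = set(rng.sample(range(len(data)), n_sub))
    sub = [data[i] for i in idx]
    rest = [y for i, y in enumerate(data) if i not in idx]

    # Step 2: draw from the subsample posterior. With a flat prior this is
    # N(mean(sub), sigma^2 / n_sub); in practice this step would be MCMC.
    sub_mean = sum(sub) / n_sub
    sub_sd = sigma / math.sqrt(n_sub)
    draws = [rng.gauss(sub_mean, sub_sd) for _ in range(n_draws)]

    # Step 3: importance weights. Full posterior / subsample posterior is
    # proportional to the likelihood of the held-out individuals.
    log_w = [log_lik(t, rest, sigma) for t in draws]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    w = [x / total for x in w]

    # Step 4: weighted posterior mean and effective sample size.
    post_mean = sum(wi * t for wi, t in zip(w, draws))
    ess = 1.0 / sum(wi * wi for wi in w)
    return post_mean, ess


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # simulated "individuals"
post_mean, ess = subsample_then_reweight(data, n_sub=200, n_draws=2000, rng=rng)
print(post_mean, sum(data) / len(data), ess)
```

In this toy setting the weighted mean should sit close to the full-data posterior mean (the sample mean), and the effective sample size indicates how much of the subsample-based draw survives the correction. Because each weight is computed independently of the others, Step 3 is the part that could be split across workers.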
Funding Statement
RK was supported by the Leverhulme research fellowship RF-2019-299.
BS was supported by a Margarita Salas fellowship from the Ministry of Universities-University of Valencia (MS21-013).
VE was supported by the Agence Nationale de la Recherche of France (ANR-17-CE40-0031-01), the Leverhulme research fellowship (RF-2021-593) and by ARL/ARO (grants W911NF-20-1-0126 and W911NF-22-1-0235).
Acknowledgments
We thank the Baltic Seabird Project for making the data available and the large number of field workers and volunteers at Stora Karlsö. Field work on Stora Karlsö has been made possible through a long-term engagement in the Baltic Seabird Project by WWF Sweden. We would also like to thank the two reviewers and Associate Editor for their helpful and insightful feedback in relation to the initial submission of the paper, leading to an improved manuscript. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising from this submission.
Citation
Ruth King, Blanca Sarzo, Víctor Elvira. "When ecological individual heterogeneity models and large data collide: An importance sampling approach." Ann. Appl. Stat. 17 (4), 3112-3132, December 2023. https://doi.org/10.1214/23-AOAS1753
Information