Bayesian Analysis

Objective Bayesian estimation for the number of species

Kathryn Barger and John Bunge


Objective priors have been used in Bayesian models for estimating the number of species in a population, but they have not been examined in depth. Here we derive the form of two objective priors, using Bernardo's reference method and Jeffreys' rule, based on the mixed-Poisson likelihood used in the single-abundance-sample species problem. These derivations are based on asymptotic results for estimates of integer-valued parameters. The factored form of these priors justifies the use of independent prior distributions for the parameter of interest (the number of species) and the nuisance parameters (of the stochastic abundance distribution). We find that the reference prior is preferable overall to the prior resulting from Jeffreys' rule. Although a comprehensive objective Bayesian approach can become analytically intractable for more complicated models, the essence of the approach can be upheld in practice. We analyze several datasets to show that the method can be implemented in practice and that it yields good results, comparable with current competing methods.
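As a rough illustration of the mixed-Poisson likelihood the abstract refers to, the sketch below evaluates the log-likelihood for one particular choice of abundance distribution. It assumes the standard formulation of the single-abundance-sample problem: C species in total, n of them observed, f_j species observed exactly j times, and the C - n unobserved species entering through the zero-count probability. The exponential mixing distribution (which gives geometric counts), the parameter name theta, and the function names are assumptions made for this example and are not taken from the paper; the objective priors derived there are not reproduced here.

```python
import math


def log_pmf_geometric(j, theta):
    """Log-probability of count j under a Poisson rate mixed over an
    exponential distribution with mean theta (an assumed example model):
    p(j) = theta**j / (1 + theta)**(j + 1), for j = 0, 1, 2, ...
    """
    return j * math.log(theta) - (j + 1) * math.log1p(theta)


def log_likelihood(C, theta, freq):
    """Mixed-Poisson log-likelihood for the number of species C.

    freq maps each observed count j >= 1 to f_j, the number of species
    seen exactly j times. The C - n unseen species contribute p(0).
    """
    n = sum(freq.values())  # number of observed species
    if C < n:
        return -math.inf  # C cannot be smaller than the observed richness
    # log of the multinomial coefficient C! / ((C - n)! * prod_j f_j!)
    ll = math.lgamma(C + 1) - math.lgamma(C - n + 1)
    ll -= sum(math.lgamma(f + 1) for f in freq.values())
    # unseen species, then observed species
    ll += (C - n) * log_pmf_geometric(0, theta)
    ll += sum(f * log_pmf_geometric(j, theta) for j, f in freq.items())
    return ll


if __name__ == "__main__":
    # Hypothetical frequency counts, for illustration only.
    example_freq = {1: 10, 2: 4, 3: 2}
    for C in (16, 20, 30):
        print(C, log_likelihood(C, 1.5, example_freq))
```

A Bayesian analysis along the lines described in the abstract would combine a likelihood of this kind with a prior on C and on the abundance parameters; the choice of those priors is the subject of the paper and is left out of this sketch.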

Article information

Bayesian Anal., Volume 5, Number 4 (2010), 765-785.

First available in Project Euclid: 19 June 2012

Keywords: Jeffreys' prior, mixed-Poisson, noninformative prior, reference prior, species richness estimation


Barger, Kathryn; Bunge, John. Objective Bayesian estimation for the number of species. Bayesian Anal. 5 (2010), no. 4, 765–785. doi:10.1214/10-BA527.

