The Annals of Applied Statistics

Density estimation for grouped data with application to line transect sampling

Woncheol Jang and Ji Meng Loh



Line transect sampling is a method used to estimate wildlife populations, and the resulting data are often grouped into intervals. Estimating the density from grouped data can be challenging. In this paper we propose a kernel density estimator of wildlife population density for such grouped data. Our method uses a combined cross-validation and smoothed bootstrap approach to select the optimal bandwidth for grouped data. Our simulation study shows that, with the smoothing parameter selected by this method, the density estimated from grouped data matches the true density more closely than with other approaches. Using the smoothed bootstrap, we also construct bias-adjusted confidence intervals for the value of the density at the boundary. We apply the proposed method to two grouped data sets, one from a wooden stake study where the true density is known, and the other from a survey of kangaroos in Australia.
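The abstract does not spell out the estimator itself, so the following is only a rough sketch of the general idea of kernel smoothing from grouped counts, not the authors' method. It assumes a Gaussian kernel, places each interval's count at the interval midpoint, and reflects the data about zero so that the estimate integrates to one on the nonnegative half-line. The function name `grouped_kde`, the helper `_gauss`, and the fixed bandwidth `h` are all hypothetical; the paper instead selects the bandwidth by a combined cross-validation and smoothed bootstrap procedure, which is not reproduced here.

```python
import math


def _gauss(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def grouped_kde(edges, counts, h, x):
    """Naive Gaussian kernel density estimate from grouped counts at x >= 0.

    Assumptions (illustrative only, not taken from the paper):
      - each interval's count is concentrated at the interval midpoint;
      - the data are reflected about 0 to handle the boundary;
      - the bandwidth h is supplied by the caller rather than selected.
    """
    n = sum(counts)
    # Midpoint of each interval [edges[i], edges[i + 1]).
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    total = 0.0
    for m, c in zip(mids, counts):
        # Kernel centred at the midpoint plus its mirror image at -m.
        total += c * (_gauss((x - m) / h) + _gauss((x + m) / h))
    return total / (n * h)
```

For example, with hypothetical interval edges `[0, 5, 10, 15, 20]` and counts `[30, 20, 10, 5]`, calling `grouped_kde(edges, counts, 3.0, 0.0)` gives the estimated density at the boundary, the quantity for which the abstract reports bias-adjusted confidence intervals. The midpoint approximation ignores the within-interval spread of the observations, which is one reason a bandwidth chosen specifically for grouped data matters.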

Article information

Ann. Appl. Stat., Volume 4, Number 2 (2010), 893-915.

First available in Project Euclid: 3 August 2010


Keywords: Bandwidth selection; grouped data; kernel density estimator; line transect sampling; smoothed bootstrap


Jang, Woncheol; Loh, Ji Meng. Density estimation for grouped data with application to line transect sampling. Ann. Appl. Stat. 4 (2010), no. 2, 893–915. doi:10.1214/09-AOAS307.



