The Annals of Statistics

Conditional density estimation in a regression setting

Sam Efromovich

Abstract

Regression problems are traditionally analyzed via univariate characteristics such as the regression function, the scale function and the marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches the performance of an oracle that knows both the underlying model and the estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus the bivariate conditional density becomes a univariate density; (iv) is adaptive to the underlying design (fixed or random) of predictors.

Article information

Source
Ann. Statist., Volume 35, Number 6 (2007), 2504-2535.

Dates
First available in Project Euclid: 22 January 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1201012970

Digital Object Identifier
doi:10.1214/009053607000000253

Mathematical Reviews number (MathSciNet)
MR2382656

Zentralblatt MATH identifier
1129.62025

Subjects
Primary: 62G07: Density estimation
Secondary: 62C05: General considerations; 62E20: Asymptotic distribution theory

Keywords
Adaptation; parametric, analytic and Sobolev densities; anisotropic class; finite and infinite support; fixed and random designs; lower bound; MISE; oracle inequality; waste water treatment

Citation

Efromovich, Sam. Conditional density estimation in a regression setting. Ann. Statist. 35 (2007), no. 6, 2504–2535. doi:10.1214/009053607000000253. https://projecteuclid.org/euclid.aos/1201012970

