The Annals of Statistics

Nonparametric modal regression

Yen-Chi Chen, Christopher R. Genovese, Ryan J. Tibshirani, and Larry Wasserman

Full-text: Open access


Modal regression estimates the local modes of the conditional distribution of $Y$ given $X=x$, rather than the conditional mean as in usual regression, and can hence reveal important structure missed by standard regression methods. We study a simple nonparametric method for modal regression, based on a kernel density estimate (KDE) of the joint distribution of $Y$ and $X$. We derive asymptotic error bounds for this method, and propose techniques for constructing confidence sets and prediction sets. The latter is used to select the smoothing bandwidth of the underlying KDE. The idea behind modal regression is connected to several others, such as mixture regression and density ridge estimation, and we discuss these ties as well.
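The estimator described here admits a compact computational sketch. Below is a minimal, hedged illustration (not the authors' implementation): the joint density of $(X, Y)$ is estimated with a Gaussian product kernel, and at each fixed $x$ a conditional ("partial") mean-shift iteration updates $y$ toward stationary points of the estimated conditional density; its distinct fixed points are the estimated local modes. The function name `modal_regression` and all parameter choices are illustrative assumptions.

```python
import numpy as np

def modal_regression(X, Y, x_grid, h, n_steps=200, n_starts=10, tol=1e-6):
    """Estimate the local modes of p_hat(y | x) from a Gaussian KDE of the
    joint density of (X, Y).

    For each fixed x, iterate the conditional (partial) mean-shift update
        y <- sum_i w_i * Y_i / sum_i w_i,
        w_i = exp(-((x - X_i)^2 + (y - Y_i)^2) / (2 h^2)),
    whose fixed points are local modes of the estimated conditional density.
    """
    results = []
    for x in x_grid:
        wx = np.exp(-0.5 * ((x - X) / h) ** 2)   # x-kernel weights, fixed per x
        # start several trajectories from a spread of observed Y quantiles
        ys = np.quantile(Y, np.linspace(0.05, 0.95, n_starts))
        for _ in range(n_steps):
            wy = np.exp(-0.5 * ((ys[:, None] - Y[None, :]) / h) ** 2)
            w = wx[None, :] * wy                 # joint kernel weights
            new = (w @ Y) / w.sum(axis=1)        # mean-shift step in y
            done = np.max(np.abs(new - ys)) < tol
            ys = new
            if done:
                break
        modes = []                               # merge trajectories that met
        for y in np.sort(ys):
            if not modes or y - modes[-1] > h / 10:
                modes.append(y)
        results.append((x, modes))
    return results
```

Here `h` plays the role of the smoothing bandwidth of the underlying KDE, simply passed in by hand; the paper's proposal is to select it via prediction sets. On data whose conditional distribution has two branches, the iteration recovers both branch locations at a given $x$, which is exactly the structure a conditional-mean estimate would average away.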

Article information

Ann. Statist., Volume 44, Number 2 (2016), 489–514.

Received: December 2014
Revised: August 2015
First available in Project Euclid: 17 March 2016


Primary: 62G08: Nonparametric regression
Secondary: 62G20: Asymptotic properties; 62G05: Estimation

Keywords: Nonparametric regression; modes; mixture model; confidence set; prediction set; bootstrap


Chen, Yen-Chi; Genovese, Christopher R.; Tibshirani, Ryan J.; Wasserman, Larry. Nonparametric modal regression. Ann. Statist. 44 (2016), no. 2, 489--514. doi:10.1214/15-AOS1373.



  • Arias-Castro, E., Mason, D. and Pelletier, B. (2013). On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. Unpublished manuscript.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York.
  • Carreira-Perpiñán, M. Á. (2007). Gaussian mean-shift is an EM algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 29 767–776.
  • Chaganty, A. T. and Liang, P. (2013). Spectral experts for estimating mixtures of linear regressions. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) 1040–1048. ACM, New York.
  • Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2014a). Enhanced mode clustering. Available at arXiv:1406.1780.
  • Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2014b). Generalized mode and ridge estimation. Available at arXiv:1406.1803.
  • Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. Ann. Statist. 43 1896–1928.
  • Chen, Y.-C., Genovese, C. R., Tibshirani, R. J. and Wasserman, L. (2015). Supplement to “Nonparametric modal regression.” DOI:10.1214/15-AOS1373SUPP.
  • Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17 790–799.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2014a). Anti-concentration and honest, adaptive confidence bands. Ann. Statist. 42 1787–1818.
  • Chernozhukov, V., Chetverikov, D. and Kato, K. (2014b). Gaussian approximation of suprema of empirical processes. Ann. Statist. 42 1564–1597.
  • Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24 603–619.
  • Eberly, D. (1996). Ridges in Image and Data Analysis. Springer, Berlin.
  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
  • Einbeck, J. and Tutz, G. (2006). Modelling beyond regression functions: An application of multimodal regression to speed-flow data. J. Roy. Statist. Soc. Ser. C 55 461–475.
  • Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33 1380–1403.
  • Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2014). Nonparametric ridge estimation. Ann. Statist. 42 1511–1545.
  • Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Henri Poincaré Probab. Stat. 38 907–921.
  • Huang, M., Li, R. and Wang, S. (2013). Nonparametric mixture of regression models. J. Amer. Statist. Assoc. 108 929–941.
  • Huang, M. and Yao, W. (2012). Mixture of regression models with varying mixing proportions: A semiparametric approach. J. Amer. Statist. Assoc. 107 711–724.
  • Hunter, D. R. and Young, D. S. (2012). Semiparametric mixtures of regressions. J. Nonparametr. Stat. 24 19–38.
  • Hyndman, R. J., Bashtannyk, D. M. and Grunwald, G. K. (1996). Estimating and visualizing conditional densities. J. Comput. Graph. Statist. 5 315–336.
  • Jacobs, R. A., Jordan, M. I., Nowlan, S. J. and Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Comput. 3 79–87.
  • Jiang, W. and Tanner, M. A. (1999). Hierarchical mixtures-of-experts for exponential family regression models: Approximation and maximum likelihood estimation. Ann. Statist. 27 987–1011.
  • Khalili, A. and Chen, J. (2007). Variable selection in finite mixture of regression models. J. Amer. Statist. Assoc. 102 1025–1038.
  • Lee, M.-j. (1989). Mode regression. J. Econometrics 42 337–349.
  • Li, J., Ray, S. and Lindsay, B. G. (2007). A nonparametric statistical approach to clustering via mode identification. J. Mach. Learn. Res. 8 1687–1723.
  • Rojas, A. (2005). Nonparametric mixture regression. Ph.D. thesis, Carnegie Mellon Univ., Pittsburgh, PA.
  • Romano, J. P. (1988). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist. 16 629–647.
  • Sager, T. W. and Thisted, R. A. (1982). Maximum likelihood estimation of isotonic modal regression. Ann. Statist. 10 690–707.
  • Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York.
  • Viele, K. and Tong, B. (2002). Modeling with mixtures of linear regressions. Stat. Comput. 12 315–330.
  • Yao, W. (2013). A note on EM algorithm for mixture models. Statist. Probab. Lett. 83 519–526.
  • Yao, W. and Li, L. (2014). A new regression model: Modal linear regression. Scand. J. Stat. 41 656–671.
  • Yao, W. and Lindsay, B. G. (2009). Bayesian mixture labeling by highest posterior density. J. Amer. Statist. Assoc. 104 758–767.
  • Yao, W., Lindsay, B. G. and Li, R. (2012). Local modal regression. J. Nonparametr. Stat. 24 647–663.

Supplemental materials