## The Annals of Statistics

### Dimension reduction for the conditional mean in regressions with categorical predictors

#### Abstract

Consider the regression of a response $Y$ on a vector of quantitative predictors $\mathbf{X}$ and a categorical predictor $W$. In this article we describe a first method for reducing the dimension of $\mathbf{X}$ without loss of information on the conditional mean $\mathrm{E}(Y|\mathbf{X},W)$ and without requiring a prespecified parametric model. The method, which allows for, but does not require, parametric versions of the subpopulation mean functions $\mathrm{E}(Y|\mathbf{X},W=w)$, includes a procedure for inference about the dimension of $\mathbf{X}$ after reduction. This work integrates previous studies on dimension reduction for the conditional mean $\mathrm{E}(Y|\mathbf{X})$ in the absence of categorical predictors and dimension reduction for the full conditional distribution of $Y|(\mathbf{X},W)$. The methodology we describe may be particularly useful for constructing low-dimensional summary plots to aid in model building at the outset of an analysis. Our proposals provide an often parsimonious alternative to the standard technique of modeling with interaction terms to adapt a mean function for different subpopulations determined by the levels of $W$. Examples illustrating this and other aspects of the development are presented.
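The abstract does not spell out the estimator. The sketch below is therefore only a rough illustration of the idea of a single reduction shared across subpopulations. It is not the procedure developed in the article.

The sketch rests on two assumptions:
- Within each level $w$ of $W$, $\mathrm{E}(Y|\mathbf{X},W=w)$ depends on $\mathbf{X}$ only through $\beta^{T}\mathbf{X}$ for a common matrix $\beta$.
- The linearity condition of Li and Duan (1989) holds in each subpopulation. Under it, the within-level OLS slope $\Sigma_w^{-1}\mathrm{Cov}_w(\mathbf{X},Y)$ lies in the span of $\beta$.

The function name `pooled_ols_subspace`, the frequency weighting and the simulated data are hypothetical choices made for illustration. Each level contributes a single slope, so this construction can recover at most as many directions as $W$ has levels.

```python
import numpy as np


def pooled_ols_subspace(X, y, w, d):
    """Hypothetical sketch: pool within-level OLS slopes of y on X.

    For each level of the categorical predictor w, fit OLS of y on X inside
    that subpopulation, weight the outer product of the slope by the level's
    sample proportion, and return the d leading eigenvectors of the sum
    together with all eigenvalues in descending order.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    n, p = X.shape
    M = np.zeros((p, p))
    for level in np.unique(w):
        mask = w == level
        # Center within the subpopulation so lstsq returns slopes only
        Xc = X[mask] - X[mask].mean(axis=0)
        yc = y[mask] - y[mask].mean()
        b, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        M += (mask.sum() / n) * np.outer(b, b)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]], eigvals[order]


# Simulated example: every subpopulation mean depends on X only through
# beta'X, but the mean function itself changes with the level of W.
rng = np.random.default_rng(0)
n, p = 3000, 5
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.standard_normal((n, p))
w = rng.integers(0, 3, size=n)
t = X @ beta
mean = np.where(w == 0, t, np.where(w == 1, 2.0 * t + 1.0, t ** 3))
y = mean + 0.5 * rng.standard_normal(n)

B, vals = pooled_ols_subspace(X, y, w, d=1)
print(B.shape, abs(B[:, 0] @ beta))  # (5, 1) and a value near 1
```

In this simulation, a model with interaction terms linear in $\mathbf{X}$ would not capture the cubic mean in the third subpopulation. A single direction $\beta$ nonetheless summarizes all three mean functions.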

#### Article information

- Source: Ann. Statist., Volume 31, Number 5 (2003), 1636–1668.
- Dates: first available in Project Euclid on 9 October 2003.
- URL: https://projecteuclid.org/euclid.aos/1065705121
- Digital Object Identifier: doi:10.1214/aos/1065705121
- Mathematical Reviews number (MathSciNet): MR2012828
- Zentralblatt MATH identifier: 1042.62037
- Subjects:
  - Primary: 62G08: Nonparametric regression
  - Secondary: 62G09: Resampling methods; 62H05: Characterization and structure theory

#### Citation

Li, Bing; Cook, R. Dennis; Chiaromonte, Francesca. Dimension reduction for the conditional mean in regressions with categorical predictors. Ann. Statist. 31 (2003), no. 5, 1636–1668. doi:10.1214/aos/1065705121. https://projecteuclid.org/euclid.aos/1065705121

#### References

- Bentler, P. M. and Xie, J. (2000). Corrections to test statistics in principal Hessian directions. Statist. Probab. Lett. 47 381–389.
- Bura, E. and Cook, R. D. (2001). Estimating the structural dimension of regressions via parametric inverse regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 393–410.
- Chiaromonte, F. and Cook, R. D. (2002). Sufficient dimension reduction and graphics in regression. Ann. Inst. Statist. Math. 54 768–795.
- Chiaromonte, F., Cook, R. D. and Li, B. (2002). Sufficient dimension reduction in regressions with categorical predictors. Ann. Statist. 30 475–497.
- Cook, R. D. (1996). Graphics for regressions with a binary response. J. Amer. Statist. Assoc. 91 983–992.
- Cook, R. D. (1998a). Regression Graphics. Wiley, New York.
- Cook, R. D. (1998b). Principal Hessian directions revisited. J. Amer. Statist. Assoc. 93 84–100.
- Cook, R. D. and Lee, H. (1999). Dimension reduction in binary response regression. J. Amer. Statist. Assoc. 94 1187–1200.
- Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30 455–474.
- Cook, R. D. and Weisberg, S. (1991). Discussion of "Sliced inverse regression for dimension reduction." J. Amer. Statist. Assoc. 86 28–33.
- Cook, R. D. and Weisberg, S. (1999). Applied Regression Including Computing and Graphics. Wiley, New York.
- Eaton, M. L. and Tyler, D. E. (1994). The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis. J. Multivariate Anal. 50 238–264.
- Fouladi, R. T. (1997). Type I error control of some covariance structure analysis technique under conditions of multivariate non-normality. Comput. Statist. Data Anal. 29 526–532.
- Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316–342.
- Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. J. Amer. Statist. Assoc. 87 1025–1039.
- Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009–1052.
- Satterthwaite, F. E. (1941). Synthesis of variance. Psychometrika 6 309–316.
- Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C. and Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Twelfth Annual Symposium on Computer Applications in Medical Care 261–265. IEEE Computer Society Press, New York.