Open Access
December 1995 Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture
Grace Wahba, Yuedong Wang, Chong Gu, Ronald Klein, Barbara Klein
Ann. Statist. 23(6): 1865-1895 (December 1995). DOI: 10.1214/aos/1034713638

Abstract

Let $y_i, i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp{y_i f_i - b(f_i) + c(y_i)]$, where b and c are given functions and b is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \epsilon \mathsf{T}^{(1)} \otimes \dots \otimes \mathsf{T}^{(d)} = \mathsf{T}$, the $\mathsf{T}^{(\alpha)}$ are measurable spaces of rather general form and f is an unknown function on $\mathsf{T}$ with some assumed "smoothness" properties. Given ${y_i, t(i), i = 1, \dots, n}$, it is desired to estimate $f(t)$ for t in some region of interest contained in $\mathsf{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha \beta} (t_{\alpha}, t_{\beta}) + \dots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of f is obtained as the minimizer, in an appropriate function space, of $\mathsf{L}(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha <\beta} \lambda_{\alpha \beta} J_{\alpha \beta}(f_{\alpha \beta}) + \dots$, where $\mathsf{L}(y, f)$ is the negative log likelihood of $y = (y_1, \dots, y_n)'$ given f, the $J_{\alpha}, J_{\alpha \beta}, \dots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha \beta}, \dots$, (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software. In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiological studies.

Citation

Download Citation

Grace Wahba. Yuedong Wang. Chong Gu. Ronald Klein. Barbara Klein. "Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture." Ann. Statist. 23 (6) 1865 - 1895, December 1995. https://doi.org/10.1214/aos/1034713638

Information

Published: December 1995
First available in Project Euclid: 15 October 2002

zbMATH: 0854.62042
MathSciNet: MR1389856
Digital Object Identifier: 10.1214/aos/1034713638

Subjects:
Primary: 62A99 , 62G07 , 62J07 , 65D07 , 65D10 , 68T05 , 92C60
Secondary: 41A15 , 41A63 , 49M15 , 62M30 , 65D15 , 92H25

Keywords: exponential families , Nonparametric regression , risk factor estimation , Smoothing spline ANOVA

Rights: Copyright © 1995 Institute of Mathematical Statistics

Vol.23 • No. 6 • December 1995
Back to Top