Open Access
December 2000 Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV
Fangyu Gao, Ronald Klein, Barbara Klein, Xiwu Lin, Grace Wahba, Dong Xiang
Ann. Statist. 28(6): 1570-1600 (December 2000). DOI: 10.1214/aos/1015957471

Abstract

We propose the randomized Generalized Approximate Cross Validation (ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition, we propose a class of approximate numerical methods for solving the penalized likelihood variational problem which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible. These methods are based on choosing an approximating subset of the natural (representer) basis functions for the variational problem. Simulation studies with synthetic data, including synthetic data mimicking demographic risk factor data sets, are used to examine the properties of the method and to compare the approach with the GRKPACK code of Wang (1997c). Bayesian “confidence intervals” are obtained for the fits and are shown in the simulation studies to have the “across the function” property usually claimed for these confidence intervals. Finally, the method is applied to an observational data set from the Beaver Dam Eye Study, with scientifically interesting results.
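
The idea of solving the penalized likelihood problem over a subset of the representer basis functions can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a one-dimensional Gaussian kernel in place of a smoothing spline ANOVA kernel, a randomly chosen subset of data points as basis centers, a single fixed smoothing parameter `lam` (no ranGACV selection), and a plain Newton iteration with a small ridge term for numerical stability. All function names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, scale=0.1):
    """Gaussian kernel matrix between 1-D point sets a and b (an assumed kernel)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * scale ** 2))


def objective(c, K, Q, y, lam):
    """Penalized Bernoulli negative log likelihood for f = K c."""
    f = K @ c
    nll = np.mean(-y * f + np.logaddexp(0.0, f))
    return nll + 0.5 * lam * c @ Q @ c


def fit_subset(x, y, centers, lam, n_iter=50, tol=1e-6, ridge=1e-8):
    """Newton iteration over coefficients of the subset basis k(., centers_j)."""
    n = len(x)
    K = gaussian_kernel(x, centers)        # n x m design matrix
    Q = gaussian_kernel(centers, centers)  # m x m penalty matrix
    c = np.zeros(len(centers))
    for _ in range(n_iter):
        f = K @ c
        p = 1.0 / (1.0 + np.exp(-f))       # fitted probabilities
        w = p * (1.0 - p)                  # Bernoulli variance weights
        grad = K.T @ (p - y) / n + lam * (Q @ c)
        hess = K.T @ (K * w[:, None]) / n + lam * Q
        hess = hess + ridge * np.eye(len(centers))  # stabilizing ridge (assumption)
        step = np.linalg.solve(hess, grad)
        c = c - step
        if np.max(np.abs(step)) < tol:
            break
    return c, K, Q


# Synthetic Bernoulli data with a smooth logit (illustrative only).
rng = np.random.default_rng(0)
n, m = 400, 20
x = rng.uniform(0.0, 1.0, size=n)
true_p = 1.0 / (1.0 + np.exp(-2.0 * np.sin(2.0 * np.pi * x)))
y = (rng.uniform(size=n) < true_p).astype(float)

# Use m of the n data points as basis centers instead of all n representers.
centers = x[rng.choice(n, size=m, replace=False)]
c_hat, K, Q = fit_subset(x, y, centers, lam=1e-3)
p_hat = 1.0 / (1.0 + np.exp(-(K @ c_hat)))
```

With `m` basis functions the linear system solved at each Newton step is `m` by `m` rather than `n` by `n`, which is the source of the computational saving in this sketch. Choosing the smoothing parameters by ranGACV would wrap a search over `lam` around this fit and is not shown.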

Citation


Fangyu Gao, Ronald Klein, Barbara Klein, Xiwu Lin, Grace Wahba, Dong Xiang. "Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV." Ann. Statist. 28 (6) 1570 - 1600, December 2000. https://doi.org/10.1214/aos/1015957471

Information

Published: December 2000
First available in Project Euclid: 12 March 2002

zbMATH: 1105.62358
MathSciNet: MR1835032
Digital Object Identifier: 10.1214/aos/1015957471

Subjects:
Primary: 62A99, 62G07, 62J07, 65D07, 65D10, 68T05, 92C60
Secondary: 41A15, 41A63, 49M15, 62G07, 62M30, 65D15, 92H25

Keywords: Degrees of freedom, exponential families, nonparametric regression, penalized likelihood, representers, reproducing kernel Hilbert spaces, risk factor estimation, smoothing spline ANOVA

Rights: Copyright © 2000 Institute of Mathematical Statistics
