Open Access
April 2013
Valid post-selection inference
Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, Linda Zhao
Ann. Statist. 41(2): 802-837 (April 2013). DOI: 10.1214/12-AOS1077

Abstract

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid “post-selection inference” by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity insurance” for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
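The widening described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a small full-rank design matrix `X` with more rows than columns, a known error variance (so t-statistics reduce to z-statistics), and brute-force enumeration of all submodels. The function name `posi_constant` and its parameters are hypothetical. It estimates by Monte Carlo the critical value K such that the maximum absolute standardized coefficient estimate, taken over every coefficient in every submodel, exceeds K with probability about alpha.

```python
import itertools
import numpy as np

def posi_constant(X, alpha=0.05, n_draws=2000, seed=0):
    """Monte Carlo estimate of a simultaneous critical value K.

    Assumptions (illustrative, not from the source): X has full column
    rank, n >= p, and the error variance is known.
    """
    n, p = X.shape
    # Orthonormal basis of the column space of X (reduced QR, Q is n x p).
    Q, _ = np.linalg.qr(X)
    contrasts = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            XM = X[:, list(M)]
            # Row j of the pseudo-inverse is the vector l with
            # (coefficient estimate of predictor j in submodel M) = l @ y.
            L = np.linalg.pinv(XM)
            # Scale each row to unit length so l @ noise is standard normal.
            L = L / np.linalg.norm(L, axis=1, keepdims=True)
            # Express each unit vector in the p-dimensional column space.
            contrasts.append(L @ Q)
    C = np.vstack(contrasts)  # p * 2**(p - 1) rows, p columns
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_draws, p))
    # Largest absolute z-statistic over all submodels, per noise draw.
    max_abs_z = np.abs(Z @ C.T).max(axis=1)
    return float(np.quantile(max_abs_z, 1 - alpha))
```

Under these assumptions, a widened interval for any coefficient in any selected submodel takes the form of the estimate plus or minus K times its standard error. Because every row of `C` is a unit vector in the column space of `X`, each simulated maximum is bounded by the norm of the noise draw, which is why the resulting K cannot exceed the Scheffé constant computed from the same draws. The enumeration grows as p * 2**(p - 1), so this sketch is practical only for small p.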

Citation


Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, Linda Zhao. "Valid post-selection inference." Ann. Statist. 41(2): 802-837, April 2013. https://doi.org/10.1214/12-AOS1077

Information

Published: April 2013
First available in Project Euclid: 29 May 2013

zbMATH: 1267.62080
MathSciNet: MR3099122
Digital Object Identifier: 10.1214/12-AOS1077

Subjects:
Primary: 62J05, 62J15

Keywords: family-wise error, high-dimensional inference, linear regression, model selection, multiple comparison, sphere packing

Rights: Copyright © 2013 Institute of Mathematical Statistics

Vol. 41 • No. 2 • April 2013