Open Access
Maximin effects in inhomogeneous large-scale data
Nicolai Meinshausen, Peter Bühlmann
Ann. Statist. 43(4): 1801-1830 (August 2015). DOI: 10.1214/15-AOS1325

Abstract

Large-scale data are often characterized by some degree of inhomogeneity, as data are either recorded in different time regimes or taken from multiple sources. We look at regression models and the effect of randomly changing coefficients, where the change is either smooth in time (or along some other dimension) or even without any such structure. Fitting varying-coefficient models or mixture models can be an appropriate solution, but such fits are computationally very demanding and often return more information than necessary. If we just ask for a model estimator that shows good predictive properties for all regimes of the data, then we are aiming for a simple linear model that is reliable for all possible subsets of the data. We propose the concept of “maximin effects” and a suitable estimator, and study its prediction accuracy from a theoretical point of view in a mixture model with known or unknown group structure. Under certain circumstances the estimator can be computed orders of magnitude faster than standard penalized regression estimators, making computations on large-scale data feasible. Empirical examples complement the novel methodology and theory.
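To make the idea of a model that is "reliable for all possible subsets of the data" concrete, here is a minimal population-level sketch. It is not the estimator proposed in the paper. It assumes a noise-free setting with known groups, where group g has a fixed coefficient vector b_g and all groups share the predictor covariance Σ. Under these assumptions, the explained variance of a fixed linear predictor β in group g is 2βᵀΣb_g − βᵀΣβ, and a maximin effect maximizes the smallest of these values over the groups. This objective is concave in β and linear in b, so a standard minimax argument shows that the maximizer is the point of the convex hull of the b_g with the smallest Σ-norm. The sketch finds that point by projected gradient descent over simplex weights. The function names `explained_variance`, `project_to_simplex` and `maximin_effect` are hypothetical and chosen only for illustration.

```python
import numpy as np


def explained_variance(beta, b, sigma):
    """Population explained variance 2 beta' Sigma b - beta' Sigma beta of the
    predictor beta in a group whose true coefficient vector is b (assumed setting)."""
    return 2 * beta @ sigma @ b - beta @ sigma @ beta


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    active = u - (css - 1) / k > 0            # true for a prefix of indices
    rho = k[active][-1]
    theta = (css[active][-1] - 1) / rho       # shift that makes the weights sum to 1
    return np.maximum(v - theta, 0.0)


def maximin_effect(B, sigma, n_iter=5000):
    """Hypothetical helper: population maximin effect for known group coefficients.

    B: (p, G) array whose columns are the group coefficient vectors b_1..b_G.
    sigma: (p, p) predictor covariance, assumed identical across groups.
    Returns the point of the convex hull of the columns of B with the smallest
    sigma-norm, i.e. B @ w with w minimizing w' (B' Sigma B) w over the simplex.
    """
    Q = B.T @ sigma @ B
    # Step size 1/L, where L = 2 * largest eigenvalue of Q bounds the gradient's
    # Lipschitz constant; the floor guards against Q being exactly zero.
    lipschitz = max(2 * np.linalg.eigvalsh(Q).max(), 1e-12)
    w = np.full(B.shape[1], 1.0 / B.shape[1])  # start from equal weights
    for _ in range(n_iter):
        w = project_to_simplex(w - (2 * Q @ w) / lipschitz)
    return B @ w


if __name__ == "__main__":
    # Two groups with b_1 = (2, 0) and b_2 = (0, 1), independent unit-variance predictors.
    B = np.array([[2.0, 0.0], [0.0, 1.0]])
    sigma = np.eye(2)
    beta = maximin_effect(B, sigma)
    print(np.round(beta, 4))
    for g in range(B.shape[1]):
        # Both groups get explained variance 0.8 (up to floating-point error).
        print(g, float(explained_variance(beta, B[:, g], sigma)))
    # The plain average (1, 0.5) of the group coefficients explains 2.75 in
    # group 1 but -0.25 in group 2, i.e. it predicts worse than zero there.
    avg = B.mean(axis=1)
    for g in range(B.shape[1]):
        print(g, float(explained_variance(avg, B[:, g], sigma)))
```

In this toy example the maximin effect is (0.4, 0.8), which explains the same positive variance in both groups. The averaged coefficients do well on the first group but have negative explained variance on the second. This asymmetry is the kind of subgroup failure the maximin criterion is designed to rule out.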

Citation

Nicolai Meinshausen, Peter Bühlmann. "Maximin effects in inhomogeneous large-scale data." Ann. Statist. 43(4): 1801–1830, August 2015. https://doi.org/10.1214/15-AOS1325

Information

Received: 1 June 2014; Revised: 1 November 2014; Published: August 2015
First available in Project Euclid: 17 June 2015

zbMATH: 1317.62059
MathSciNet: MR3357879
Digital Object Identifier: 10.1214/15-AOS1325

Subjects:
Primary: 62J07

Keywords: Aggregation, big data, mixture models, regularization, robustness

Rights: Copyright © 2015 Institute of Mathematical Statistics