Open Access
Data enriched linear regression
Aiyou Chen, Art B. Owen, Minghui Shi
Electron. J. Statist. 9(1): 1078-1112 (2015). DOI: 10.1214/15-EJS1027

Abstract

We present a linear regression method for predictions on a small data set making use of a second possibly biased data set that may be much larger. Our method fits linear regressions to the two data sets while penalizing the difference between predictions made by those two models. The resulting algorithm is a shrinkage method similar to those used in small area estimation. We find a Stein-type result for Gaussian responses: when the model has $5$ or more coefficients and $10$ or more error degrees of freedom, it becomes inadmissible to use only the small data set, no matter how large the bias is. We also present both plug-in and AICc-based methods to tune our penalty parameter. Most of our results use an $L_{2}$ penalty, but we obtain formulas for $L_{1}$ penalized estimates when the model is specialized to the location setting. Ordinary Stein shrinkage provides an inadmissibility result for only $3$ or more coefficients, but we find that our shrinkage method typically produces much lower squared errors in as few as $5$ or $10$ dimensions when the bias is small and essentially equivalent squared errors when the bias is large.
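
The penalized two-model fit described in the abstract can be illustrated with a minimal sketch. It assumes one concrete form of the criterion that the abstract does not spell out: the small data set is fit with coefficients β, the large data set with β + γ, and the penalty is λ times the squared difference between the two models' predictions at the small-sample design points, that is λ‖X_s γ‖². The function name `fit_data_enriched`, the simulated data, and the choice λ = 10 are hypothetical and for illustration only; the plug-in and AICc tuning rules mentioned in the abstract are not implemented here.

```python
import numpy as np

def fit_data_enriched(X_s, y_s, X_b, y_b, lam):
    """Sketch of an L2-penalized two-sample fit (assumed criterion, see lead-in).

    Minimizes
        ||y_s - X_s b||^2 + ||y_b - X_b (b + g)||^2 + lam * ||X_s g||^2
    by stacking the three terms into one ordinary least-squares problem
    in the combined parameter vector (b, g).
    """
    n_s, d = X_s.shape
    root = np.sqrt(lam)
    A = np.vstack([
        np.hstack([X_s, np.zeros((n_s, d))]),         # small-sample residuals
        np.hstack([X_b, X_b]),                        # large-sample residuals
        np.hstack([np.zeros((n_s, d)), root * X_s]),  # prediction-difference penalty
    ])
    rhs = np.concatenate([y_s, y_b, np.zeros(n_s)])
    theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return theta[:d], theta[d:]  # (b, g)

# Simulated example: a small unbiased sample and a larger, biased one.
rng = np.random.default_rng(0)
d = 5
X_s = rng.normal(size=(30, d))
X_b = rng.normal(size=(300, d))
beta_true = rng.normal(size=d)
bias = 0.3 * rng.normal(size=d)  # hypothetical bias of the large sample
y_s = X_s @ beta_true + rng.normal(size=30)
y_b = X_b @ (beta_true + bias) + rng.normal(size=300)

beta_hat, gamma_hat = fit_data_enriched(X_s, y_s, X_b, y_b, lam=10.0)
```

Under this assumed criterion, λ = 0 leaves γ free, so the estimate of β reduces to ordinary least squares on the small data set alone, while a very large λ forces γ toward zero and the estimate approaches the pooled regression on both data sets. Intermediate values of λ shrink between those two extremes.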

Citation

Aiyou Chen, Art B. Owen, Minghui Shi. "Data enriched linear regression." Electron. J. Statist. 9 (1) 1078-1112, 2015. https://doi.org/10.1214/15-EJS1027

Information

Received: 1 November 2014; Published: 2015
First available in Project Euclid: 27 May 2015

zbMATH: 1328.62457
MathSciNet: MR3352068
Digital Object Identifier: 10.1214/15-EJS1027

Subjects:
Primary: 62D05, 62J07
Secondary: 62F12

Keywords: data fusion, small area estimation, Stein shrinkage, transfer learning

Rights: Copyright © 2015 The Institute of Mathematical Statistics and the Bernoulli Society

Vol.9 • No. 1 • 2015