Open Access
CoCoLasso for high-dimensional error-in-variables regression
Abhirup Datta, Hui Zou
Ann. Statist. 45(6): 2400-2426 (December 2017). DOI: 10.1214/16-AOS1527

Abstract

Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright [Ann. Statist. 40 (2012) 1637–1664] proposed a nonconvex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how the standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel calibrated cross-validation technique by using the basic idea in CoCoLasso. The calibrated cross-validation has its own importance. We demonstrate the superior performance of our method over the nonconvex approach by simulation studies.

Citation

Abhirup Datta, Hui Zou. "CoCoLasso for high-dimensional error-in-variables regression." Ann. Statist. 45 (6) 2400-2426, December 2017. https://doi.org/10.1214/16-AOS1527

Information

Received: 1 November 2015; Revised: 1 August 2016; Published: December 2017
First available in Project Euclid: 15 December 2017

zbMATH: 06838137
MathSciNet: MR3737896
Digital Object Identifier: 10.1214/16-AOS1527

Subjects:
Primary: 62J07
Secondary: 62F12

Keywords: Convex optimization, error in variables, high-dimensional regression, Lasso, missing data

Rights: Copyright © 2017 Institute of Mathematical Statistics
