Open Access
August 2017
Asymptotic and finite-sample properties of estimators based on stochastic gradients
Panos Toulis, Edoardo M. Airoldi
Ann. Statist. 45(4): 1694-1727 (August 2017). DOI: 10.1214/16-AOS1506


Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice for generalized linear models, Cox proportional hazards models, and M-estimators, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
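
To make the idea of an implicitly defined, shrunken update concrete, the following is a minimal sketch for one special case that the abstract does not spell out: a linear model with squared-error loss. In that case the implicit equation theta_new = theta + lr * (y - x·theta_new) * x can be solved by hand, and the result is the standard update scaled by 1 / (1 + lr * ||x||^2). The function names (`dot`, `explicit_step`, `implicit_step`, `fit`) and the lr0 / n learning-rate schedule are illustrative assumptions, not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def explicit_step(theta, x, y, lr):
    """Standard SGD step for squared-error loss.

    The residual is evaluated at the current iterate theta.
    """
    resid = y - dot(theta, x)
    return [t + lr * resid * xi for t, xi in zip(theta, x)]


def implicit_step(theta, x, y, lr):
    """Implicit SGD step for squared-error loss (linear model only).

    Solves theta_new = theta + lr * (y - x.theta_new) * x in closed form.
    The solution equals the explicit update multiplied by the factor
    1 / (1 + lr * ||x||^2), so a large lr cannot blow the step up.
    """
    resid = y - dot(theta, x)
    shrink = 1.0 / (1.0 + lr * dot(x, x))
    return [t + lr * shrink * resid * xi for t, xi in zip(theta, x)]


def fit(step, data, lr0, dim):
    """Run one pass over (x, y) pairs with learning rate lr0 / n.

    The lr0 / n schedule is an assumption made for this sketch.
    """
    theta = [0.0] * dim
    for n, (x, y) in enumerate(data, start=1):
        theta = step(theta, x, y, lr0 / n)
    return theta
```

Calling `fit(implicit_step, data, lr0, dim)` and `fit(explicit_step, data, lr0, dim)` on the same data with a deliberately large `lr0` is one way to see the stability difference: the explicit iterate can overshoot by orders of magnitude in the first few steps, while the implicit iterate stays bounded. For models without a closed-form solution, the implicit equation would have to be solved numerically at each step, which this sketch does not cover.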


Panos Toulis, Edoardo M. Airoldi. "Asymptotic and finite-sample properties of estimators based on stochastic gradients." Ann. Statist. 45 (4) 1694-1727, August 2017.


Received: 1 September 2015; Revised: 1 August 2016; Published: August 2017
First available in Project Euclid: 28 June 2017

zbMATH: 1378.62046
MathSciNet: MR3670193
Digital Object Identifier: 10.1214/16-AOS1506

Primary: 62F10, 62F12, 62F35, 62L12, 62L20

Keywords: asymptotic variance, Cox proportional hazards, exponential family, generalized linear models, implicit updates, maximum likelihood, M-estimation, numerical stability, statistical efficiency, stochastic approximation

Rights: Copyright © 2017 Institute of Mathematical Statistics
