Scalable estimation strategies based on stochastic approximations: Classical results and new insights

Stat Comput. 2015 Jul 1;25(4):781-795. doi: 10.1007/s11222-015-9560-y.

Abstract

Estimation with large amounts of data can be facilitated by stochastic gradient methods, in which model parameters are updated sequentially using small batches of data at each step. Here, we review early work and modern results that illustrate the statistical properties of these methods, including convergence rates, stability, and asymptotic bias and variance. We then give an overview of modern applications where these methods are useful, ranging from an online version of the EM algorithm to deep learning. In light of these results, we argue that stochastic gradient methods are poised to become benchmark principled estimation procedures for large data sets, especially those in the family of stable proximal methods, such as implicit stochastic gradient descent.

Keywords: asymptotic analysis; big data; efficient estimation; exponential family; implicit stochastic gradient descent; maximum likelihood; optimal learning rate; recursive estimation; stochastic gradient descent methods.
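To make the distinction between standard and implicit stochastic gradient descent concrete, the following is a minimal sketch (not taken from the paper) of the two update rules for a Gaussian linear model, a case where the implicit update has a closed form; the function names, learning-rate schedule, and toy data are illustrative assumptions.

```python
# Illustrative sketch: explicit vs. implicit SGD updates for least squares
# (Gaussian linear model), processing one observation (x_n, y_n) per step.
import numpy as np

def sgd_explicit_step(theta, x, y, gamma):
    """Standard SGD: gradient evaluated at the current iterate theta_{n-1}."""
    residual = y - x @ theta
    return theta + gamma * residual * x

def sgd_implicit_step(theta, x, y, gamma):
    """Implicit SGD: gradient evaluated at the next iterate theta_n.
    For this model the fixed-point equation solves in closed form: the
    residual is shrunk by 1 / (1 + gamma * ||x||^2), which keeps the
    update stable even for large learning rates."""
    residual = (y - x @ theta) / (1.0 + gamma * (x @ x))
    return theta + gamma * residual * x

# Toy usage: recover theta_star from a stream of data with a decaying rate.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for n in range(1, 10_001):
    x = rng.normal(size=3)
    y = x @ theta_star + rng.normal(scale=0.1)
    theta = sgd_implicit_step(theta, x, y, gamma=1.0 / n)
print(theta)  # close to theta_star
```

The shrinkage factor in the implicit step is what gives these proximal-type updates their stability relative to the explicit rule, at essentially the same per-step cost in this simple setting.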