.. image:: pystat.png
    :height: 20
    :alt: Statistics
    :target: http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/td_2a_notions.html#pour-un-profil-plutot-data-scientist

.. _l-ml2a-sgd:

Stochastic Gradient Descent
+++++++++++++++++++++++++++

Known in French as *descente de gradient stochastique*.

(*Coming soon.*)

*Further reading*

* `Adam: A Method for Stochastic Optimization `_
* `HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent `_
* `Sparse Online Learning via Truncated Gradient `_
* `Stabilized Sparse Online Learning for Sparse Data `_
* `On Perturbed Proximal Gradient Algorithms `_
* `Large Margin Classification Using the Perceptron Algorithm `_
* `Scaling Up Stochastic Dual Coordinate Ascent `_
* `Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization `_
* `Dual Principal Component Pursuit `_
* `Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification `_
* `Compact Convex Projections `_
* `Optimization Methods for Large-Scale Machine Learning `_
* `k-SVRG: Variance Reduction for Large Scale Optimization `_
* `Accelerating Stochastic Gradient Descent using Predictive Variance Reduction `_
* `Gradients without Backpropagation `_
* `Large Batch Optimization for Deep Learning: Training BERT in 76 minutes `_
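
As a hedged illustration only: when the loss is an average of per-example terms :math:`\ell_i`, the basic SGD step replaces the full gradient by the gradient on a single example :math:`i` drawn at random, scaled by a learning rate :math:`\eta` (the intercept is treated as one more parameter):

.. math::

    w \leftarrow w - \eta \, \nabla \ell_i(w)

The sketch below applies this rule with a constant :math:`\eta` to a least-squares linear regression, :math:`\ell_i(w, b) = \frac{1}{2}(w^\top x_i + b - y_i)^2`. The function ``sgd_linear_regression``, its parameters and the noiseless synthetic data are hypothetical, chosen for the example. None of the refinements discussed in the readings (Adam, variance reduction, mini-batches, parallel updates) is included.

.. code-block:: python

    import random


    def sgd_linear_regression(X, y, lr=0.05, epochs=50, seed=0):
        """Hypothetical helper: fit y ~ w.x + b by plain SGD on the squared loss.

        Each update uses the gradient of the loss on one single example,
        the examples being visited in a new random order at every epoch.
        """
        rng = random.Random(seed)
        w = [0.0] * len(X[0])
        b = 0.0
        indices = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(indices)  # new random order for this epoch
            for i in indices:
                xi, yi = X[i], y[i]
                # prediction error on this single example
                err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
                # gradient of 0.5 * err**2: err * xi for w, err for b
                w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
                b -= lr * err
        return w, b


    # Synthetic, noiseless data (illustrative): y = 2 * x1 - 3 * x2 + 1
    data_rng = random.Random(42)
    X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
    y = [2 * x1 - 3 * x2 + 1 for x1, x2 in X]

    w, b = sgd_linear_regression(X, y)
    print(w, b)  # should be close to [2.0, -3.0] and 1.0

Because the data is noiseless, a constant learning rate is enough here. On noisy data, one would usually decrease :math:`\eta` over the iterations or average the iterates to make the estimate converge.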