module `optim.sgd`#

Short summary#

module mlstatpy.optim.sgd

Implements simple stochastic gradient optimisation. It is inspired by _stochastic_optimizers.py.

Classes#

class	truncated documentation
`BaseOptimizer`	Base stochastic gradient descent optimizer.
`SGDOptimizer`	Stochastic gradient descent optimizer with momentum.

Methods#

method	truncated documentation
`__init__`
`__init__`
`_display_progress`	Displays training progress.
`_display_progress`	Displays training progress.
`_evaluate_early_stopping`
`_evaluate_early_stopping`
`_get_updates`
`_get_updates`	Gets the values used to update params with given gradients.
`_regularize_gradient`	Applies regularization.
`_regularize_gradient`	Applies regularization.
`iteration_ends`	Performs update to learning rate and potentially other states at the end of an iteration.
`iteration_ends`	Performs updates to learning rate and potentially other states at the end of an iteration.
`loss_regularization`
`loss_regularization`
`train`	Optimizes the coefficients.
`train`	Optimizes the coefficients.
`update_coef`	Updates coefficients with given gradient.
`update_coef`	Updates coefficients with given gradient.

Documentation#

Implements simple stochastic gradient optimisation. It is inspired by _stochastic_optimizers.py.

class mlstatpy.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)#

Bases: object

Base stochastic gradient descent optimizer.

Parameters:

coef – array, initial coefficients
learning_rate_init – float, the initial learning rate. It controls the step size used when updating the weights.
min_threshold – coefficients must be higher than min_threshold
max_threshold – coefficients must be lower than max_threshold

The class holds the following attributes:

learning_rate: float, the current learning rate
coef: optimized coefficients
min_threshold, max_threshold: coefficients thresholds
l2: L2 regularization
l1: L1 regularization

__init__(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)#

_display_progress(it, max_iter, loss, losses=None, msg=None)#

Displays training progress.

_evaluate_early_stopping(it, max_iter, losses, early_th, verbose=False)#

_get_updates(grad)#

_regularize_gradient(grad)#

Applies regularization.
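As a rough illustration of what applying L1 and L2 penalties to a gradient can look like, the sketch below adds the penalty derivatives to the raw gradient. The helper name `regularize_gradient`, its arguments and the exact formula are assumptions made for this example; the library's own method may differ.

```python
import numpy


def regularize_gradient(grad, coef, l1=0.0, l2=0.0):
    """Hypothetical helper: adds the derivative of
    l1 * sum(|coef|) + l2 * sum(coef ** 2) to the gradient."""
    grad = numpy.asarray(grad, dtype=float)
    coef = numpy.asarray(coef, dtype=float)
    # The L2 penalty l2 * sum(coef ** 2) has gradient 2 * l2 * coef.
    # The L1 penalty l1 * sum(|coef|) has subgradient l1 * sign(coef).
    return grad + 2.0 * l2 * coef + l1 * numpy.sign(coef)


coef = numpy.array([0.5, -2.0, 0.0])
grad = numpy.array([0.1, 0.1, 0.1])
penalized = regularize_gradient(grad, coef, l1=0.01, l2=0.1)
print(penalized)
```

With both penalties set to 0.0 (the defaults in the signatures above), the gradient is returned unchanged.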

iteration_ends(time_step)#

Performs update to learning rate and potentially other states at the end of an iteration.

train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)#

Optimizes the coefficients.

Parameters:

X – dataset (array)
y – expected target
fct_loss – loss function, signature: f(coef, X, y) -> float
fct_grad – gradient function, signature: g(coef, x, y, i) -> array
max_iter – maximum number of iterations
early_th – stops the training if the error goes below this threshold
verbose – displays information

Returns:

loss

The method keeps the best coefficients for the minimal loss.
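The loop below is a minimal sketch of a training procedure that follows the documented signatures of `fct_loss` and `fct_grad` and keeps the best coefficients seen so far. The function `train_sketch`, its `lr` and `seed` arguments, and the per-iteration shuffling are assumptions for illustration only; this is not the library's implementation.

```python
import numpy


def fct_loss(coef, X, y):
    # Sum of squared errors over the whole dataset, returned as a float.
    return float(((X @ coef - y) ** 2).sum())


def fct_grad(coef, x, y, i=0):
    # Gradient of (x @ coef - y) ** 2 for one observation x (index i is unused here).
    return 2 * x * (x @ coef - y)


def train_sketch(coef, X, y, fct_loss, fct_grad, lr=0.05,
                 max_iter=100, early_th=None, seed=0):
    """Hypothetical loop: one pass over the shuffled samples per iteration,
    remembering the coefficients that gave the smallest loss."""
    rng = numpy.random.default_rng(seed)
    coef = numpy.array(coef, dtype=float)
    best_coef, best_loss = coef.copy(), fct_loss(coef, X, y)
    for _ in range(max_iter):
        for i in rng.permutation(X.shape[0]):
            coef -= lr * fct_grad(coef, X[i], y[i], i)
        loss = fct_loss(coef, X, y)
        if loss < best_loss:
            best_coef, best_loss = coef.copy(), loss
        # Early stopping once the best loss is below the threshold.
        if early_th is not None and best_loss < early_th:
            break
    return best_coef, best_loss


rng = numpy.random.default_rng(42)
true_coef = numpy.array([0.5, 0.6, -0.7])
X = rng.normal(size=(20, 3))
y = X @ true_coef
start = numpy.zeros(3)
best_coef, best_loss = train_sketch(start, X, y, fct_loss, fct_grad)
print(best_coef, best_loss)
```

Returning the best coefficients rather than the last ones protects against an iteration that increases the loss, which can happen with stochastic updates.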

update_coef(grad)#

Updates coefficients with given gradient.

Parameters: grad – array, gradient
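One plausible reading of how a gradient step interacts with `min_threshold` and `max_threshold` is sketched below: take a plain gradient step, then clip the result to the optional bounds. The helper `update_coef_sketch` and the order of operations are assumptions, not the library's code.

```python
import numpy


def update_coef_sketch(coef, grad, learning_rate=0.1,
                       min_threshold=None, max_threshold=None):
    """Hypothetical helper: gradient step followed by optional clipping."""
    new_coef = numpy.asarray(coef, dtype=float) - learning_rate * numpy.asarray(grad, dtype=float)
    if min_threshold is not None:
        # Coefficients below the lower bound are raised to it.
        new_coef = numpy.maximum(new_coef, min_threshold)
    if max_threshold is not None:
        # Coefficients above the upper bound are lowered to it.
        new_coef = numpy.minimum(new_coef, max_threshold)
    return new_coef


coef = numpy.array([0.2, 0.9, -0.4])
grad = numpy.array([5.0, -5.0, -1.0])
unclipped = update_coef_sketch(coef, grad)
clipped = update_coef_sketch(coef, grad, min_threshold=0.0, max_threshold=1.0)
print(unclipped, clipped)
```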

class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)#

Bases: BaseOptimizer

Stochastic gradient descent optimizer with momentum.

Parameters:

coef – array, initial coefficients
learning_rate_init – float, the initial learning rate. It controls the step size used when updating the weights.
lr_schedule – {“constant”, “adaptive”, “invscaling”}, learning rate schedule for weight updates. “constant” keeps the learning rate equal to learning_rate_init. “invscaling” gradually decreases the learning rate learning_rate_ at each time step t using the inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t). “adaptive” keeps the learning rate at learning_rate_init as long as the training loss keeps decreasing; each time 2 consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if “early_stopping” is on, the current learning rate is divided by 5.
momentum – float, momentum value; must be greater than or equal to 0
power_t – double, the exponent for the inverse scaling learning rate
early_th – stops if the error goes below that threshold
min_threshold – lower bound for parameters (can be None)
max_threshold – upper bound for parameters (can be None)
l1 – L1 regularization
l2 – L2 regularization

The class holds the following attributes:

learning_rate: float, the current learning rate
velocity: array, velocities used to update the parameters

Stochastic Gradient Descent applied to linear regression

The following example shows how to optimize a simple linear regression.

<<<

import numpy
from mlstatpy.optim import SGDOptimizer


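def fct_loss(c, X, y):
    # Squared error over the whole dataset.
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # Per-observation gradient direction, scaled by 0.1
    # (the exact gradient of the squared error uses a factor of 2).
    return x * (x @ c - y) * 0.1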


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)

>>>

    0/15: loss: 88.79 lr=0.1 max(coef): 1.9 l1=0/2.9 l2=0/4.5
    1/15: loss: 47.22 lr=0.0302 max(coef): 1.7 l1=0.38/2.4 l2=0.084/3.2
    2/15: loss: 39.7 lr=0.0218 max(coef): 1.4 l1=1.8/3 l2=2.5/3.3
    3/15: loss: 22.29 lr=0.018 max(coef): 0.92 l1=0.0063/2.5 l2=2.3e-05/2.1
    4/15: loss: 10.18 lr=0.0156 max(coef): 0.85 l1=0.01/1.9 l2=3.5e-05/1.4
    5/15: loss: 4.194 lr=0.014 max(coef): 0.76 l1=0.00065/1.6 l2=2.4e-07/1.1
    6/15: loss: 1.646 lr=0.0128 max(coef): 0.71 l1=0.065/1.8 l2=0.0018/1.1
    7/15: loss: 0.7677 lr=0.0119 max(coef): 0.66 l1=0.13/1.8 l2=0.0076/1.1
    8/15: loss: 0.4433 lr=0.0111 max(coef): 0.63 l1=0.095/1.8 l2=0.0042/1.1
    9/15: loss: 0.272 lr=0.0105 max(coef): 0.6 l1=0.00051/1.8 l2=8.8e-08/1.1
    10/15: loss: 0.1774 lr=0.00995 max(coef): 0.6 l1=0.00076/1.8 l2=2e-07/1
    11/15: loss: 0.1309 lr=0.00949 max(coef): 0.61 l1=0.001/1.8 l2=3.4e-07/1
    12/15: loss: 0.1065 lr=0.00909 max(coef): 0.61 l1=0.071/1.7 l2=0.0042/1
    13/15: loss: 0.08566 lr=0.00874 max(coef): 0.62 l1=0.0081/1.7 l2=3.2e-05/1
    14/15: loss: 0.07239 lr=0.00842 max(coef): 0.62 l1=0.06/1.7 l2=0.003/1
    15/15: loss: 0.06085 lr=0.00814 max(coef): 0.63 l1=0.056/1.7 l2=0.0025/1
    optimized coefficients: [ 0.541  0.577 -0.629]

__init__(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)#

_display_progress(it, max_iter, loss, losses=None, msg='loss')#

Displays training progress.

_get_updates(grad)#

Gets the values used to update params with given gradients.

Parameters: grad – array, gradient
Returns: updates, array, the values to add to params
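A classical momentum update gives one way to picture the values described above. The sketch below assumes the update equals the new velocity, computed as `momentum * velocity - learning_rate * grad`; the helper name `momentum_update` is hypothetical and the library's exact formula may differ.

```python
import numpy


def momentum_update(grad, velocity, learning_rate=0.1, momentum=0.9):
    """Hypothetical helper: returns the value to add to the parameters,
    which also serves as the next velocity."""
    return momentum * velocity - learning_rate * numpy.asarray(grad, dtype=float)


coef = numpy.array([1.0, -1.0])
velocity = numpy.zeros(2)
grad = numpy.array([0.5, -0.5])

# Two steps with the same gradient: the second step is larger
# because the velocity accumulates past updates.
for _ in range(2):
    velocity = momentum_update(grad, velocity)
    coef = coef + velocity

print(velocity, coef)
```

With `momentum=0`, the update reduces to a plain gradient step of size `learning_rate`.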

iteration_ends(time_step)#

Performs updates to learning rate and potentially other states at the end of an iteration.

Parameters: time_step – int, number of training samples trained on so far, used to update the learning rate for “invscaling”
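The “invscaling” rule documented for `lr_schedule` can be sketched as a small function. The helper name `invscaling_learning_rate` is hypothetical, and taking t = time_step + 1 is an assumption made so that the first call does not divide by zero.

```python
def invscaling_learning_rate(learning_rate_init, time_step, power_t=0.5):
    """Hypothetical helper: learning_rate_init / pow(t, power_t)
    with t assumed to be time_step + 1."""
    return learning_rate_init / pow(time_step + 1, power_t)


# Learning rate after 0, 10, 20 and 30 training samples.
rates = [invscaling_learning_rate(0.1, n) for n in (0, 10, 20, 30)]
print(["%.4f" % r for r in rates])
```

With learning_rate_init=0.1, power_t=0.5 and 10 samples per iteration, these values come out close to the lr column printed in the linear regression example above (0.1, 0.0302, 0.0218, 0.018). This suggests, without confirming, that the optimizer counts samples in a similar way.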
