module optim.sgd
Short summary
module mlstatpy.optim.sgd
Implements simple stochastic gradient optimisation. It is inspired by scikit-learn's _stochastic_optimizers.py.
Classes

class | truncated documentation
---|---
BaseOptimizer | Base stochastic gradient descent optimizer.
SGDOptimizer | Stochastic gradient descent optimizer with momentum.
Methods

method | truncated documentation
---|---
BaseOptimizer.__init__ |
BaseOptimizer._display_progress | Displays training progress.
BaseOptimizer._evaluate_early_stopping |
BaseOptimizer._get_updates |
BaseOptimizer._regularize_gradient | Applies regularization.
BaseOptimizer.iteration_ends | Performs updates to the learning rate and potentially other states at the end of an iteration.
BaseOptimizer.train | Optimizes the coefficients.
BaseOptimizer.update_coef | Updates coefficients with given gradient.
SGDOptimizer.__init__ |
SGDOptimizer._display_progress | Displays training progress.
SGDOptimizer._get_updates | Gets the values used to update params with given gradients.
SGDOptimizer._regularize_gradient | Applies regularization.
SGDOptimizer.iteration_ends | Performs updates to the learning rate and potentially other states at the end of an iteration.
SGDOptimizer.train | Optimizes the coefficients.
SGDOptimizer.update_coef | Updates coefficients with given gradient.
Documentation
Implements simple stochastic gradient optimisation. It is inspired by scikit-learn's _stochastic_optimizers.py.
- class mlstatpy.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
Bases: object
Base stochastic gradient descent optimizer.
- Parameters:
coef – array, initial coefficients
learning_rate_init – float, the initial learning rate used; it controls the step size when updating the weights
min_threshold – coefficients must be higher than min_threshold
max_threshold – coefficients must be lower than max_threshold
l1 – L1 regularization
l2 – L2 regularization
The class holds the following attributes:
learning_rate: float, the current learning rate
coef: optimized coefficients
min_threshold, max_threshold: coefficient thresholds
l2: L2 regularization
l1: L1 regularization
- __init__(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
- _display_progress(it, max_iter, loss, losses=None, msg=None)
Displays training progress.
- _evaluate_early_stopping(it, max_iter, losses, early_th, verbose=False)
- _get_updates(grad)
- _regularize_gradient(grad)
Applies regularization.
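A minimal sketch of what applying the l1 and l2 penalties to a gradient typically looks like, assuming the standard terms l2 * coef and l1 * sign(coef); the function name and exact scaling are illustrative, not necessarily the library's code:

import numpy

def regularize_gradient(grad, coef, l1=0.0, l2=0.0):
    # hypothetical equivalent: add the gradient of the L2 penalty
    # and the subgradient of the L1 penalty to the raw gradient
    if l2 != 0:
        grad = grad + l2 * coef
    if l1 != 0:
        grad = grad + l1 * numpy.sign(coef)
    return grad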
- iteration_ends(time_step)
Performs updates to the learning rate and potentially other states at the end of an iteration.
- train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)
Optimizes the coefficients.
- Parameters:
X – dataset (array)
y – expected targets
fct_loss – loss function, signature: f(coef, X, y) -> float
fct_grad – gradient function, signature: g(coef, x, y, i) -> array
max_iter – maximum number of iterations
early_th – stops the training if the error goes below this threshold
verbose – displays information
- Returns:
loss
The method keeps the best coefficients for the minimal loss.
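The loss and gradient callables only need to match the signatures above. A minimal sketch for a least-squares problem, consistent with the SGDOptimizer example further down (names are illustrative):

import numpy

def fct_loss(coef, X, y):
    # f(coef, X, y) -> float: squared error over the whole dataset
    return numpy.linalg.norm(X @ coef - y) ** 2

def fct_grad(coef, x, y, i=0):
    # g(coef, x, y, i) -> array: gradient for one sample (x, y)
    return 2 * x * (x @ coef - y)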
- update_coef(grad)
Updates coefficients with given gradient.
- Parameters:
grad – array, gradient
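A minimal sketch of the update with the optional bounds described in the constructor (min_threshold, max_threshold); the clipping shown here is an assumption about how the thresholds are enforced, and `update` stands for the values computed from the gradient:

import numpy

def clipped_update(coef, update, min_threshold=None, max_threshold=None):
    # add the update computed from the gradient, then keep the
    # coefficients inside the [min_threshold, max_threshold] bounds
    coef = coef + update
    if min_threshold is not None:
        coef = numpy.maximum(coef, min_threshold)
    if max_threshold is not None:
        coef = numpy.minimum(coef, max_threshold)
    return coef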
- class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
Bases: BaseOptimizer
Stochastic gradient descent optimizer with momentum.
- Parameters:
coef – array, initial coefficients
learning_rate_init – float, the initial learning rate used; it controls the step size when updating the weights
lr_schedule – {"constant", "adaptive", "invscaling"}, learning rate schedule for weight updates. "constant": a constant learning rate given by learning_rate_init. "invscaling": gradually decreases the learning rate at each time step t using an inverse scaling exponent power_t, learning_rate_ = learning_rate_init / pow(t, power_t). "adaptive": keeps the learning rate constant at learning_rate_init as long as the training loss keeps decreasing; each time two consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if early_stopping is on, the current learning rate is divided by 5.
momentum – float, value of momentum used, must be larger than or equal to 0
power_t – double, the exponent for the inverse scaling learning rate
early_th – stops if the error goes below that threshold
min_threshold – lower bound for parameters (can be None)
max_threshold – upper bound for parameters (can be None)
l1 – L1 regularization
l2 – L2 regularization
The class holds the following attributes:
learning_rate: float, the current learning rate
velocity: array, velocities used to update params
Stochastic Gradient Descent applied to linear regression
The following example shows how to optimize a simple linear regression.
<<<
import numpy
from mlstatpy.optim import SGDOptimizer


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)
>>>
0/15: loss: 36.2 lr=0.1 max(coef): 2.6 l1=0/4.1 l2=0/8.7
1/15: loss: 21.02 lr=0.0302 max(coef): 2.1 l1=0.014/4.1 l2=7.8e-05/6.8
2/15: loss: 11.01 lr=0.0218 max(coef): 1.6 l1=0.39/3.6 l2=0.092/4.7
3/15: loss: 6.01 lr=0.018 max(coef): 1.3 l1=0.18/3.1 l2=0.012/3.5
4/15: loss: 4.275 lr=0.0156 max(coef): 1.1 l1=0.085/2.9 l2=0.0034/3
5/15: loss: 3.469 lr=0.014 max(coef): 1.1 l1=0.027/2.8 l2=0.00031/2.8
6/15: loss: 2.85 lr=0.0128 max(coef): 1 l1=0.45/2.7 l2=0.078/2.6
7/15: loss: 2.211 lr=0.0119 max(coef): 1 l1=0.39/2.6 l2=0.059/2.3
8/15: loss: 1.553 lr=0.0111 max(coef): 0.93 l1=0.021/2.4 l2=0.00016/2
9/15: loss: 1.264 lr=0.0105 max(coef): 0.9 l1=0.025/2.3 l2=0.00034/1.9
10/15: loss: 1.051 lr=0.00995 max(coef): 0.87 l1=0.036/2.3 l2=0.00061/1.8
11/15: loss: 0.938 lr=0.00949 max(coef): 0.85 l1=0.017/2.2 l2=0.00012/1.7
12/15: loss: 0.8297 lr=0.00909 max(coef): 0.84 l1=0.033/2.2 l2=0.00053/1.7
13/15: loss: 0.7574 lr=0.00874 max(coef): 0.83 l1=0.021/2.2 l2=0.0002/1.6
14/15: loss: 0.6771 lr=0.00842 max(coef): 0.82 l1=0.051/2.2 l2=0.00093/1.6
15/15: loss: 0.6022 lr=0.00814 max(coef): 0.81 l1=0.03/2.1 l2=0.00042/1.5
optimized coefficients: [ 0.719 0.809 -0.606]
- __init__(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
- _display_progress(it, max_iter, loss, losses=None, msg='loss')
Displays training progress.
- _get_updates(grad)
Gets the values used to update params with given gradients.
- Parameters:
grad – array, gradient
- Returns:
updates, array, the values to add to params
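A minimal sketch of a standard momentum update consistent with the momentum parameter and the velocity attribute described above; this is the textbook rule, not necessarily the exact implementation:

def momentum_update(velocity, grad, learning_rate, momentum=0.9):
    # keep a fraction of the previous velocity and move against
    # the (regularized) gradient scaled by the learning rate
    velocity = momentum * velocity - learning_rate * grad
    return velocity  # the values added to the coefficients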
- iteration_ends(time_step)
Performs updates to the learning rate and potentially other states at the end of an iteration.
- Parameters:
time_step – int, number of training samples trained on so far, used to update the learning rate for "invscaling"
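For the "invscaling" schedule, the constructor documentation gives learning_rate_ = learning_rate_init / pow(t, power_t). A minimal sketch of that rule (assuming t > 0; the function name is illustrative):

def invscaling_rate(time_step, learning_rate_init=0.1, power_t=0.5):
    # learning_rate_ = learning_rate_init / pow(t, power_t)
    return learning_rate_init / (time_step ** power_t)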