com.microsoft - AdamOptimizer

AdamOptimizer - 1 (com.microsoft)

Version

  • name: AdamOptimizer (GitHub)

  • domain: com.microsoft

  • since_version: 1

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Attributes

  • alpha: Coefficient of the previous gradient in the running average. Default value is ?.

  • beta: Coefficient of the previous squared gradient in the running average. The effective learning rate is computed as r = R / (1 + T * decay_factor); that decay factor defaults to 0, so an increasing update count does not reduce the learning rate. Default value is ?.

  • do_bias_correction: Compute unbiased 1st and 2nd moments. Default value is ?.

  • epsilon: Small scalar to avoid dividing by zero. Default value is ?.

  • lambda: Regularization coefficient of 0.5 * lambda * ||X||_2^2. Defaults to 0, which means no regularization. Default value is ?.

  • max_norm_clip: Clipping threshold for gradients. Default value is ?.

  • weight_decay_mode: Mode for applying weight decay: 0 applies decay before the weight update, 1 applies decay after the weight update (see the sketch after this list). Default value is ?.
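
The attributes above map onto the textbook Adam update. The following is a minimal NumPy sketch of one step, written against the attribute names on this page; it is not the ONNX Runtime kernel itself (gradient clipping via max_norm_clip, mixed-precision inputs, and the decay_factor term are omitted), and the handling of weight_decay_mode is one plausible reading of the description above.

```python
import numpy as np

def adam_step(weights, gradients, moment_1, moment_2, t, lr,
              alpha=0.9, beta=0.999, epsilon=1e-8, lam=0.0,
              do_bias_correction=True, weight_decay_mode=0):
    # t is the 1-based update count (the T input); lr is the learning rate R.
    # L2 regularization 0.5 * lambda * ||X||_2^2 adds lambda * X to the gradient.
    g = gradients + lam * weights if weight_decay_mode == 0 else gradients

    # Running averages of the gradient and the squared gradient.
    m1 = alpha * moment_1 + (1.0 - alpha) * g
    m2 = beta * moment_2 + (1.0 - beta) * g * g

    # Optional bias correction of the 1st and 2nd moments.
    if do_bias_correction:
        m1_hat = m1 / (1.0 - alpha ** t)
        m2_hat = m2 / (1.0 - beta ** t)
    else:
        m1_hat, m2_hat = m1, m2

    new_weights = weights - lr * m1_hat / (np.sqrt(m2_hat) + epsilon)

    if weight_decay_mode == 1:
        # One reading of "decay after weight update": apply the L2 term
        # to the freshly updated weights.
        new_weights -= lr * lam * weights

    return new_weights, m1, m2, t + 1
```

In the operator itself, the update count, both moments, and the weights are carried across iterations through the new_T, new_moment_1, new_moment_2, and new_weights outputs described below.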

Inputs

Between 6 and 10 inputs.

  • R (heterogeneous) - T1: The initial learning rate.

  • T (heterogeneous) - T2: The update count of “X”. It should be a scalar.

  • weights (heterogeneous) - T3: Weights to optimize.

  • gradients (heterogeneous) - T_GRAD: Gradients computed in this iteration.

  • moment_1 (heterogeneous) - T4: Exponentially averaged historical gradients.

  • moment_2 (heterogeneous) - T4: Exponentially averaged historical squared gradients.

  • mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: FP16 or BFloat16 weights to optimize.

  • loss_scale (optional, heterogeneous) - T3: Loss scale for mixed-precision training.

  • global_gradient_norm (optional, heterogeneous) - T_GRAD_NORM: Global gradient norm.

  • update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight tensors should be updated.

Outputs

Between 3 and 6 outputs.

  • new_T (heterogeneous) - T2: New update count.

  • new_moment_1 (heterogeneous) - T4: New averaged gradients.

  • new_moment_2 (heterogeneous) - T4: New averaged squared gradients.

  • new_weights (optional, heterogeneous) - T3: New weights.

  • new_gradients (optional, heterogeneous) - T_GRAD: New gradients.

  • new_mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: New FP16 or BFloat16 weights.

Examples
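
A minimal sketch of how such a node can be constructed with onnx.helper.make_node, wiring the six required inputs and the first four outputs listed above. The attribute values are illustrative placeholders rather than the operator's documented defaults, and executing the node requires an ONNX Runtime build that registers the com.microsoft training ops.

```python
from onnx import helper

# Illustrative only: build an AdamOptimizer node in the com.microsoft domain.
# Attribute values are placeholders, not the operator's documented defaults.
node = helper.make_node(
    "AdamOptimizer",
    inputs=[
        "R",          # T1: initial learning rate
        "T",          # T2: update count (scalar)
        "weights",    # T3: weights to optimize
        "gradients",  # T_GRAD: gradients for this iteration
        "moment_1",   # T4: running average of gradients
        "moment_2",   # T4: running average of squared gradients
    ],
    outputs=["new_T", "new_moment_1", "new_moment_2", "new_weights"],
    domain="com.microsoft",
    alpha=0.9,
    beta=0.999,
    epsilon=1e-8,
    do_bias_correction=1,
    weight_decay_mode=0,
)
print(node)
```

The optional inputs (mixed_precision_weights, loss_scale, global_gradient_norm, update_signal) and the remaining optional outputs can be appended to the same lists when mixed-precision training is used.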