com.microsoft - AdamOptimizer

AdamOptimizer - 1 (com.microsoft)

Version

  • name: AdamOptimizer (GitHub)

  • domain: com.microsoft

  • since_version: 1

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Attributes

  • alpha: Coefficient of the previous gradient in the running average. Default value is ?.

  • beta: Coefficient of the previous squared gradient in the running average. The effective learning rate is computed as r = R / (1 + T * decay_factor); that decay factor defaults to 0, so an increasing update count does not reduce the learning rate. Default value is ?.

  • do_bias_correction: Compute unbiased 1st and 2nd moments. Default value is ?.

  • epsilon: Small scalar to avoid dividing by zero. Default value is ?.

  • lambda: Regularization coefficient of 0.5 * lambda * ||X||_2^2. Defaults to 0, which means no regularization. Default value is ?.

  • max_norm_clip: Clipping threshold for gradients. Default value is ?.

  • weight_decay_mode: Mode for applying weight decay: 0 applies decay before the weight update, 1 applies decay after the weight update (see the sketch after this list). Default value is ?.
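
The attributes above map onto the textbook Adam update. The following is a minimal NumPy sketch of one step, written against the attribute names on this page; it is not the ONNX Runtime kernel itself (gradient clipping via max_norm_clip, mixed-precision inputs, and the decay_factor term are omitted), and the handling of weight_decay_mode is one plausible reading of the description above.

```python
import numpy as np

def adam_step(weights, gradients, moment_1, moment_2, t, lr,
              alpha=0.9, beta=0.999, epsilon=1e-8, lam=0.0,
              do_bias_correction=True, weight_decay_mode=0):
    # t is the 1-based update count (the T input); lr is the learning rate R.
    # L2 regularization 0.5 * lambda * ||X||_2^2 adds lambda * X to the gradient.
    g = gradients + lam * weights if weight_decay_mode == 0 else gradients

    # Running averages of the gradient and the squared gradient.
    m1 = alpha * moment_1 + (1.0 - alpha) * g
    m2 = beta * moment_2 + (1.0 - beta) * g * g

    # Optional bias correction of the 1st and 2nd moments.
    if do_bias_correction:
        m1_hat = m1 / (1.0 - alpha ** t)
        m2_hat = m2 / (1.0 - beta ** t)
    else:
        m1_hat, m2_hat = m1, m2

    new_weights = weights - lr * m1_hat / (np.sqrt(m2_hat) + epsilon)

    if weight_decay_mode == 1:
        # One reading of "decay after weight update": apply the L2 term
        # to the freshly updated weights.
        new_weights -= lr * lam * weights

    return new_weights, m1, m2, t + 1
```

In the operator itself, the update count, both moments, and the weights are carried across iterations through the new_T, new_moment_1, new_moment_2, and new_weights outputs described below.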

Inputs

Between 6 and 10 inputs.

  • R (heterogeneous) - T1: The initial learning rate.

  • T (heterogeneous) - T2: The update count of “X”. It should be a scalar.

  • weights (heterogeneous) - T3: Weights to optimize.

  • gradients (heterogeneous) - T_GRAD: Gradients computed in this iteration.

  • moment_1 (heterogeneous) - T4: Exponentially averaged historical gradients.

  • moment_2 (heterogeneous) - T4: Exponentially averaged historical squared gradients.

  • mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: FP16 or BFloat16 weights to optimize.

  • loss_scale (optional, heterogeneous) - T3: Loss scale for mixed-precision training.

  • global_gradient_norm (optional, heterogeneous) - T_GRAD_NORM: Global gradient norm.

  • update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight tensors should be updated.

Outputs

Between 3 and 6 outputs.

  • new_T (heterogeneous) - T2: New update count.

  • new_moment_1 (heterogeneous) - T4: New averaged gradients.

  • new_moment_2 (heterogeneous) - T4: New averaged squared gradients.

  • new_weights (optional, heterogeneous) - T3: New weights.

  • new_gradients (optional, heterogeneous) - T_GRAD: New gradients.

  • new_mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: New FP16 or BFloat16 weights.

Examples
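
A minimal sketch of how such a node can be constructed with onnx.helper.make_node, wiring the six required inputs and the first four outputs listed above. The attribute values are illustrative placeholders rather than the operator's documented defaults, and executing the node requires an ONNX Runtime build that registers the com.microsoft training ops.

```python
from onnx import helper

# Illustrative only: build an AdamOptimizer node in the com.microsoft domain.
# Attribute values are placeholders, not the operator's documented defaults.
node = helper.make_node(
    "AdamOptimizer",
    inputs=[
        "R",          # T1: initial learning rate
        "T",          # T2: update count (scalar)
        "weights",    # T3: weights to optimize
        "gradients",  # T_GRAD: gradients for this iteration
        "moment_1",   # T4: running average of gradients
        "moment_2",   # T4: running average of squared gradients
    ],
    outputs=["new_T", "new_moment_1", "new_moment_2", "new_weights"],
    domain="com.microsoft",
    alpha=0.9,
    beta=0.999,
    epsilon=1e-8,
    do_bias_correction=1,
    weight_decay_mode=0,
)
print(node)
```

The optional inputs (mixed_precision_weights, loss_scale, global_gradient_norm, update_signal) and the remaining optional outputs can be appended to the same lists when mixed-precision training is used.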