com.microsoft - BiasSoftmax

BiasSoftmax - 1 (com.microsoft)

Version

  • name: BiasSoftmax

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level:

  • shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Y = softmax(scores + bias), where scores is the data input and bias is broadcast to match it using a simple broadcast rule. Intended to specialize the softmax(scores + additive_mask) pattern commonly found in transformer models.
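
Written out for a single softmax row, the summary can be read as the formula below. The symbols are introduced here for illustration only: s is the scores row, b is the bias after broadcasting, and j runs over the flattened dimensions axis and higher.

```latex
Y_i = \frac{\exp(s_i + b_i)}{\sum_{j} \exp(s_j + b_j)}
```

A bias entry b_i that is large and negative drives Y_i toward 0, which is how an additive mask removes positions from the result.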

Attributes

  • axis: apply softmax to the elements in dimensions axis or higher. Default value is ?.

  • is_inner_broadcast (required): true if bias is broadcast across the input for dimensions broadcast_axis to axis-1; otherwise bias is broadcast across the input for dimensions 0 to broadcast_axis-1. Default value is ?.

Inputs

  • data (heterogeneous) - T: The input data as Tensor.

  • bias (heterogeneous) - T: The bias (or mask) as Tensor.

Outputs

  • output (heterogeneous) - T: The output.

Examples
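
The following is a minimal pure-Python sketch of one reading of the axis and is_inner_broadcast attributes described above; it is not the ONNX Runtime kernel. The function name bias_softmax, the flat row-major list layout, the restriction to a non-negative axis, and the rule used to pick a bias row in each broadcast mode are all assumptions made for illustration.

```python
import math


def bias_softmax(data, data_shape, bias, axis, is_inner_broadcast):
    """Reference sketch: softmax(data + bias) on flat, row-major lists."""
    # Dimensions `axis` and higher are flattened into one softmax row.
    element_count = math.prod(data_shape[axis:])
    batch_count = math.prod(data_shape[:axis])
    # Assumption: bias holds a whole number of rows of length element_count.
    bias_rows = len(bias) // element_count
    # Inner broadcast: each bias row serves `repeat` consecutive data rows.
    repeat = batch_count // bias_rows

    out = []
    for b in range(batch_count):
        # Outer broadcast: the block of bias rows is tiled over the data rows.
        bias_row = b // repeat if is_inner_broadcast else b % bias_rows
        row = [
            data[b * element_count + i] + bias[bias_row * element_count + i]
            for i in range(element_count)
        ]
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


# data has shape (2, 2, 2); softmax runs over the last dimension (axis=2).
scores = [0.0] * 8
# Two bias rows: the first masks element 1, the second masks element 0.
mask = [0.0, -10000.0, -10000.0, 0.0]

inner = bias_softmax(scores, (2, 2, 2), mask, axis=2, is_inner_broadcast=True)
outer = bias_softmax(scores, (2, 2, 2), mask, axis=2, is_inner_broadcast=False)
print(inner)  # [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
print(outer)  # [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
```

In the inner case the four-element mask behaves like a bias of shape (2, 1, 2), so each bias row is reused along dimension 1. In the outer case it behaves like shape (1, 2, 2), so the whole block is reused along dimension 0. The same mask values therefore hide different positions depending on the flag.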