Softmax

Softmax - 13

Version

  • name: Softmax (GitHub)

  • domain: main

  • since_version: 13

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 13.

Summary

The operator computes the normalized exponential values for the given input:

Softmax(input, axis) = Exp(input) / ReduceSum(Exp(input), axis=axis, keepdims=1)

The “axis” attribute indicates the dimension along which Softmax will be performed. The output tensor has the same shape and contains the Softmax values of the corresponding input.
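The formula above can be sketched in NumPy as follows. The helper name `softmax_ref` is hypothetical and not part of the ONNX API. Subtracting the maximum along the axis before exponentiating is an assumption about a sensible implementation, not something the specification requires; it leaves the result mathematically unchanged while avoiding overflow in `exp`.

```python
import numpy as np


def softmax_ref(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Hypothetical reference helper: Exp(x) / ReduceSum(Exp(x), axis, keepdims=1)."""
    # Shifting by the maximum along `axis` does not change the quotient
    # but keeps exp() from overflowing for large inputs.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


x = np.array([[-1.0, 0.0, 1.0]], dtype=np.float32)
y = softmax_ref(x, axis=1)
print(y.shape)        # same shape as the input
print(y.sum(axis=1))  # each slice along the axis sums to approximately 1
```

Because `keepdims=True` keeps the reduced dimension with size 1, the division broadcasts back over the input and the output shape matches the input shape.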

Attributes

  • axis: Describes the dimension Softmax will be performed on. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). Default value is -1.
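As an illustration of the accepted range, the sketch below maps an axis in [-r, r-1] to its non-negative equivalent. `normalize_axis` is a hypothetical helper written for this example, not an ONNX function.

```python
import numpy as np


def normalize_axis(axis: int, rank: int) -> int:
    """Hypothetical helper: map an axis in [-rank, rank-1] to [0, rank-1]."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    # Negative values count dimensions from the back.
    return axis + rank if axis < 0 else axis


x = np.zeros((3, 4, 5), dtype=np.float32)
r = x.ndim  # r = rank(input), here 3
print(normalize_axis(-1, r))  # the default axis -1 refers to the last dimension
print(normalize_axis(-3, r))  # -r refers to the first dimension
```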

Inputs

  • input (heterogeneous) - T: The input tensor of rank >= axis.

Outputs

  • output (heterogeneous) - T: The output values with the same shape as the input tensor.

Type Constraints

  • T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Examples

default

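import numpy as np
import onnx

# softmax() (a NumPy reference implementation) and expect() (the test
# harness) are assumed to come from the ONNX backend test suite.
node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
)
x = np.array([[-1, 0, 1]]).astype(np.float32)
# expected output [[0.09003058, 0.24472848, 0.66524094]]
y = softmax(x, axis=1)
expect(node, inputs=[x], outputs=[y], name="test_softmax_example")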

_softmax_axis

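# Large inputs: softmax is unchanged by adding a constant along the axis,
# so both rows produce the same output.
x = np.array([[0, 1, 2, 3], [10000, 10001, 10002, 10003]]).astype(np.float32)
# expected output
# [[0.032058604 0.08714432  0.23688284  0.6439143  ]
#  [0.032058604 0.08714432  0.23688284  0.6439143  ]]
y = softmax(x)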

node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
)
expect(node, inputs=[x], outputs=[y], name="test_softmax_large_number")

# Random non-negative 3D input shared by the per-axis cases below
x = np.abs(np.random.randn(3, 4, 5).astype(np.float32))
node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
    axis=0,
)
y = softmax(x, axis=0)
expect(node, inputs=[x], outputs=[y], name="test_softmax_axis_0")

node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
    axis=1,
)
y = softmax(x, axis=1)
expect(node, inputs=[x], outputs=[y], name="test_softmax_axis_1")

node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
    axis=2,
)
y = softmax(x, axis=2)
expect(node, inputs=[x], outputs=[y], name="test_softmax_axis_2")

node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
    axis=-1,
)
y = softmax(x, axis=-1)
expect(node, inputs=[x], outputs=[y], name="test_softmax_negative_axis")

# default axis is -1
node = onnx.helper.make_node(
    "Softmax",
    inputs=["x"],
    outputs=["y"],
)
expect(node, inputs=[x], outputs=[y], name="test_softmax_default_axis")

Differences

Changes relative to Softmax - 11:

  • Summary: the description of coercing the input into a 2D tensor is replaced by the formula Softmax(input, axis) = Exp(input) / ReduceSum(Exp(input), axis=axis, keepdims=1), applied along the single dimension given by "axis".

  • axis: now describes the dimension Softmax will be performed on, rather than the axis of the inputs when coerced to 2D. The default value changes from 1 to -1.

  • input: now described as "The input tensor of rank >= axis." instead of a tensor coerced into a 2D matrix of size (NxD).

  • output: the remark "(the original size without coercion)" is dropped.

  • Type Constraints: tensor(bfloat16) is added to T.
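To make the change between version 11 and version 13 concrete, the following sketch implements both behaviors in NumPy and compares them on a 3D input. The names `softmax_v13` and `softmax_v11` are hypothetical helpers written for this example; they follow the two summaries as described on this page and are not taken from an ONNX runtime.

```python
import numpy as np


def softmax_v13(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Hypothetical helper: normalize along a single dimension (version 13)."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def softmax_v11(x: np.ndarray, axis: int = 1) -> np.ndarray:
    """Hypothetical helper: coerce to 2D at `axis`, then normalize each row (version 11)."""
    k = axis + x.ndim if axis < 0 else axis
    rows = int(np.prod(x.shape[:k]))  # a_0 * ... * a_{k-1}
    flat = x.reshape(rows, -1)        # [rows, a_k * ... * a_{n-1}]
    return softmax_v13(flat, axis=1).reshape(x.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4)).astype(np.float32)

y13 = softmax_v13(x, axis=1)
y11 = softmax_v11(x, axis=1)

print(y13.sum(axis=1).shape)       # (2, 4): one normalized slice per (a_0, a_2) pair
print(y11.sum(axis=(1, 2)).shape)  # (2,): one normalized block per a_0
# On the last axis the coerced rows are exactly the slices, so both agree.
print(np.allclose(softmax_v13(x, axis=-1), softmax_v11(x, axis=-1)))
```

With axis=1 the two versions generally give different values, since version 11 normalizes over all trailing dimensions at once.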

Softmax - 11

Version

  • name: Softmax (GitHub)

  • domain: main

  • since_version: 11

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 11.

Summary

The operator computes the softmax (normalized exponential) values for each layer in the batch of the given input.

The input does not need to explicitly be a 2D vector; rather, it will be coerced into one. For an arbitrary n-dimensional tensor input with dimensions [a_0, a_1, …, a_{k-1}, a_k, …, a_{n-1}], where k is the axis provided, the input will be coerced into a 2-dimensional tensor with dimensions [a_0 * … * a_{k-1}, a_k * … * a_{n-1}]. For the default case where axis=1, this means the input tensor will be coerced into a 2D tensor of dimensions [a_0, a_1 * … * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = N and a_1 * … * a_{n-1} = D. Each of these dimensions must be matched correctly, or else the operator will throw errors. The output tensor has the same shape as the input and contains the softmax values of the corresponding input.
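The shape arithmetic of the coercion can be sketched in a few lines. `coerced_shape` is a hypothetical helper introduced for this example; it only computes the 2D shape [a_0 * … * a_{k-1}, a_k * … * a_{n-1}] and does not touch any data.

```python
import math


def coerced_shape(shape, axis=1):
    """Hypothetical helper: 2D shape the input is coerced into for axis k."""
    k = axis + len(shape) if axis < 0 else axis
    # math.prod of an empty sequence is 1, which covers axis=0.
    return (math.prod(shape[:k]), math.prod(shape[k:]))


print(coerced_shape((2, 3, 4)))           # (2, 12): N = a_0, D = a_1 * a_2
print(coerced_shape((2, 3, 4), axis=2))   # (6, 4)
print(coerced_shape((2, 3, 4), axis=-1))  # (6, 4), same as axis=2
print(coerced_shape((2, 3, 4), axis=0))   # (1, 24): the whole tensor is one row
```

Softmax is then applied to each row of the 2D tensor, and the result is reshaped back to the original dimensions.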

Attributes

  • axis: Describes the axis of the inputs when coerced to 2D; defaults to one because the 0th axis most likely describes the batch_size. A negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). Default value is 1.

Inputs

  • input (heterogeneous) - T: The input tensor that’s coerced into a 2D matrix of size (NxD) as described above.

Outputs

  • output (heterogeneous) - T: The output values with the same shape as input tensor (the original size without coercion).

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Differences

Changes relative to Softmax - 1:

  • Summary: the sentence stating that the input is a 2-D tensor (Tensor<float>) of size (batch_size x input_feature_dimensions) is removed. The statement that the output tensor has the same shape and contains the softmax values of the corresponding input moves to the end of the summary.

  • axis: the description adds "Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input)." The default value remains 1.

  • Inputs, Outputs, Type Constraints: unchanged.

Softmax - 1

Version

  • name: Softmax (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1.

Summary

The operator computes the softmax (normalized exponential) values for each layer in the batch of the given input. The input is a 2-D tensor (Tensor<float>) of size (batch_size x input_feature_dimensions). The output tensor has the same shape and contains the softmax values of the corresponding input.

The input does not need to explicitly be a 2D vector; rather, it will be coerced into one. For an arbitrary n-dimensional tensor input with dimensions [a_0, a_1, …, a_{k-1}, a_k, …, a_{n-1}], where k is the axis provided, the input will be coerced into a 2-dimensional tensor with dimensions [a_0 * … * a_{k-1}, a_k * … * a_{n-1}]. For the default case where axis=1, this means the input tensor will be coerced into a 2D tensor of dimensions [a_0, a_1 * … * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = N and a_1 * … * a_{n-1} = D. Each of these dimensions must be matched correctly, or else the operator will throw errors.

Attributes

  • axis: Describes the axis of the inputs when coerced to 2D; defaults to one because the 0th axis most likely describes the batch_size. Default value is 1.

Inputs

  • input (heterogeneous) - T: The input tensor that’s coerced into a 2D matrix of size (NxD) as described above.

Outputs

  • output (heterogeneous) - T: The output values with the same shape as input tensor (the original size without coercion).

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.