Conv

Conv - 11

Version

  • name: Conv (GitHub)

  • domain: main

  • since_version: 11

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 11.

Summary

The convolution operator consumes an input tensor and a filter, and computes the output.
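As an illustration of what "computes the output" means, here is a minimal NumPy sketch of a 2D convolution over an NCHW input. It is not the ONNX reference implementation. The function name `conv2d_reference` is hypothetical, and the sketch assumes the kernel is applied without flipping (cross-correlation), which the summary above does not state explicitly.

```python
import numpy as np


def conv2d_reference(x, w, b=None, pads=(0, 0, 0, 0), strides=(1, 1),
                     dilations=(1, 1), group=1):
    """Naive 2D Conv sketch for NCHW input (hypothetical helper).

    pads follows the layout [h_begin, w_begin, h_end, w_end].
    """
    n, c, _, _ = x.shape
    m, c_per_group, kh, kw = w.shape
    assert c == c_per_group * group and m % group == 0

    # Zero-pad the two spatial axes only.
    xp = np.pad(x, ((0, 0), (0, 0), (pads[0], pads[2]), (pads[1], pads[3])))

    # Extent of the kernel once dilation is taken into account.
    eff_kh = (kh - 1) * dilations[0] + 1
    eff_kw = (kw - 1) * dilations[1] + 1
    out_h = (xp.shape[2] - eff_kh) // strides[0] + 1
    out_w = (xp.shape[3] - eff_kw) // strides[1] + 1

    y = np.zeros((n, m, out_h, out_w), dtype=x.dtype)
    m_per_group = m // group
    for o in range(m):
        c0 = (o // m_per_group) * c_per_group  # first input channel of this group
        for i in range(out_h):
            for j in range(out_w):
                h0, w0 = i * strides[0], j * strides[1]
                patch = xp[:, c0:c0 + c_per_group,
                           h0:h0 + eff_kh:dilations[0],
                           w0:w0 + eff_kw:dilations[1]]
                # Assumption: the kernel is not flipped (cross-correlation).
                y[:, o, i, j] = np.sum(patch * w[o], axis=(1, 2, 3))
    if b is not None:
        y += b.reshape(1, m, 1, 1)
    return y


x = np.arange(25, dtype=np.float32).reshape(1, 1, 5, 5)
w = np.ones((1, 1, 3, 3), dtype=np.float32)
y = conv2d_reference(x, w)
print(y[0, 0])
```

With the 5 x 5 input and the all-ones 3 x 3 kernel used here, the sketch gives a 3 x 3 output whose first row is 54, 63, 72. The loops are written for readability, not speed.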

Attributes

  • auto_pad: Must be one of NOTSET, SAME_UPPER, SAME_LOWER, or VALID. NOTSET means explicit padding is used. SAME_UPPER and SAME_LOWER pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether the total is even or odd). If the total padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. Default value is 'NOTSET'.

  • dilations: Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.

  • group: Number of groups that the input channels and output channels are divided into. Default value is 1.

  • kernel_shape: The shape of the convolution kernel. If not present, it should be inferred from input W.

  • pads: Padding for the beginning and end of each spatial axis. Each value must be greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. The pads format is [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.

  • strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
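The auto_pad rule can be turned into explicit pads values. The sketch below assumes the total padding per axis is the smallest amount that yields ceil(input_shape[i] / strides[i]) outputs, which follows from the requirement above but is not spelled out there. The helper name `same_pads` is hypothetical, and `spatial_shape` excludes the batch and channel axes.

```python
import math


def same_pads(spatial_shape, kernel_shape, strides, dilations=None,
              auto_pad="SAME_UPPER"):
    """Return pads as [x1_begin, x2_begin, ..., x1_end, x2_end, ...]."""
    dilations = dilations or [1] * len(spatial_shape)
    begins, ends = [], []
    for size, k, s, d in zip(spatial_shape, kernel_shape, strides, dilations):
        out = math.ceil(size / s)      # required output size along this axis
        eff_k = (k - 1) * d + 1        # kernel extent including dilation
        total = max((out - 1) * s + eff_k - size, 0)
        small, large = total // 2, total - total // 2
        if auto_pad == "SAME_UPPER":   # odd total: extra pixel at the end
            begins.append(small)
            ends.append(large)
        elif auto_pad == "SAME_LOWER":  # odd total: extra pixel at the beginning
            begins.append(large)
            ends.append(small)
        else:
            raise ValueError(f"unsupported auto_pad: {auto_pad}")
    return begins + ends


print(same_pads([5, 5], [3, 3], [2, 2], auto_pad="SAME_LOWER"))  # [1, 1, 1, 1]
print(same_pads([6, 6], [3, 3], [2, 2], auto_pad="SAME_UPPER"))  # [0, 0, 1, 1]
print(same_pads([6, 6], [3, 3], [2, 2], auto_pad="SAME_LOWER"))  # [1, 1, 0, 0]
```

The 6 x 6 case needs one pixel of padding in total per axis, so it shows where SAME_UPPER and SAME_LOWER differ.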

Inputs

Between 2 and 3 inputs.

  • X (heterogeneous) - T: Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. This is the 2D image case; in general the size is (N x C x D1 x D2 … x Dn). Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].

  • W (heterogeneous) - T: The weight tensor that will be used in the convolutions; has size (M x C/group x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (M x C/group x k1 x k2 x … x kn), where (k1 x k2 x … x kn) is the dimension of the kernel. Optionally, if dimension denotation is in effect, the operation expects the weight tensor to arrive with the dimension denotation of [FILTER_OUT_CHANNEL, FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL …]. Assuming zero-based indices for the shape array, X.shape[1] == (W.shape[1] * group) == C and W.shape[0] mod group == 0. In other words, FILTER_IN_CHANNEL multiplied by the number of groups should be equal to DATA_CHANNEL, and the number of feature maps M should be a multiple of the number of groups.

  • B (optional, heterogeneous) - T: Optional 1D bias to be added to the convolution; has size M.
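The shape constraints on X, W, and B can be checked before building a node. This is a small sketch under the constraints listed above; the function name `check_conv_shapes` and its error messages are hypothetical.

```python
def check_conv_shapes(x_shape, w_shape, group=1, b_shape=None):
    """Validate Conv input shapes and return M, the number of feature maps."""
    if len(x_shape) != len(w_shape):
        raise ValueError("X and W must have the same number of dimensions")
    c = x_shape[1]
    m, c_per_group = w_shape[0], w_shape[1]
    if c != c_per_group * group:
        raise ValueError("X.shape[1] must equal W.shape[1] * group")
    if m % group != 0:
        raise ValueError("W.shape[0] must be a multiple of group")
    if b_shape is not None and tuple(b_shape) != (m,):
        raise ValueError("B must be 1D with size M")
    return m


# Depthwise-style case: 8 input channels, 8 groups, one input channel per filter.
print(check_conv_shapes((1, 8, 32, 32), (8, 1, 3, 3), group=8))  # 8
# Two groups of 4 input channels each, 6 feature maps, with a bias.
print(check_conv_shapes((1, 8, 32, 32), (6, 4, 3, 3), group=2, b_shape=(6,)))  # 6
```

A weight of shape (5, 4, 3, 3) with group=2 would be rejected, because 5 feature maps cannot be split evenly across 2 groups.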

Outputs

  • Y (heterogeneous) - T: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.
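The dependence of the output dimensions on kernel size, stride, and padding can be written out explicitly. The sketch below uses the usual formula floor((in + pad_begin + pad_end - ((k - 1) * d + 1)) / stride) + 1 per spatial axis; that formula is an assumption not given in the text above, although it agrees with the output shapes noted in the examples further down. The name `conv_output_shape` is hypothetical.

```python
def conv_output_shape(x_shape, w_shape, pads=None, strides=None, dilations=None):
    """Return the shape of Y as [N, M, out_1, ..., out_n]."""
    n_spatial = len(x_shape) - 2
    pads = pads or [0] * (2 * n_spatial)
    strides = strides or [1] * n_spatial
    dilations = dilations or [1] * n_spatial
    spatial = []
    for i in range(n_spatial):
        eff_k = (w_shape[2 + i] - 1) * dilations[i] + 1    # dilated kernel extent
        padded = x_shape[2 + i] + pads[i] + pads[n_spatial + i]
        spatial.append((padded - eff_k) // strides[i] + 1)
    return [x_shape[0], w_shape[0], *spatial]


x_shape, w_shape = (1, 1, 7, 5), (1, 1, 3, 3)
print(conv_output_shape(x_shape, w_shape, pads=[1, 1, 1, 1], strides=[2, 2]))  # [1, 1, 4, 3]
print(conv_output_shape(x_shape, w_shape, pads=[0, 0, 0, 0], strides=[2, 2]))  # [1, 1, 3, 2]
print(conv_output_shape(x_shape, w_shape, pads=[1, 0, 1, 0], strides=[2, 2]))  # [1, 1, 4, 2]
```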

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

Examples

default

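import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # ONNX backend test helper

x = np.array(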
    [
        [
            [
                [0.0, 1.0, 2.0, 3.0, 4.0],  # (1, 1, 5, 5) input tensor
                [5.0, 6.0, 7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0, 13.0, 14.0],
                [15.0, 16.0, 17.0, 18.0, 19.0],
                [20.0, 21.0, 22.0, 23.0, 24.0],
            ]
        ]
    ]
).astype(np.float32)
W = np.array(
    [
        [
            [
                [1.0, 1.0, 1.0],  # (1, 1, 3, 3) tensor for convolution weights
                [1.0, 1.0, 1.0],
                [1.0, 1.0, 1.0],
            ]
        ]
    ]
).astype(np.float32)

# Convolution with padding
node_with_padding = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    kernel_shape=[3, 3],
    # Default values for other attributes: strides=[1, 1], dilations=[1, 1], group=1
    pads=[1, 1, 1, 1],
)
y_with_padding = np.array(
    [
        [
            [
                [12.0, 21.0, 27.0, 33.0, 24.0],  # (1, 1, 5, 5) output tensor
                [33.0, 54.0, 63.0, 72.0, 51.0],
                [63.0, 99.0, 108.0, 117.0, 81.0],
                [93.0, 144.0, 153.0, 162.0, 111.0],
                [72.0, 111.0, 117.0, 123.0, 84.0],
            ]
        ]
    ]
).astype(np.float32)
expect(
    node_with_padding,
    inputs=[x, W],
    outputs=[y_with_padding],
    name="test_basic_conv_with_padding",
)

# Convolution without padding
node_without_padding = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    kernel_shape=[3, 3],
    # Default values for other attributes: strides=[1, 1], dilations=[1, 1], group=1
    pads=[0, 0, 0, 0],
)
y_without_padding = np.array(
    [
        [
            [
                [54.0, 63.0, 72.0],  # (1, 1, 3, 3) output tensor
                [99.0, 108.0, 117.0],
                [144.0, 153.0, 162.0],
            ]
        ]
    ]
).astype(np.float32)
expect(
    node_without_padding,
    inputs=[x, W],
    outputs=[y_without_padding],
    name="test_basic_conv_without_padding",
)

_conv_with_strides

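import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # ONNX backend test helper

x = np.array(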
    [
        [
            [
                [0.0, 1.0, 2.0, 3.0, 4.0],  # (1, 1, 7, 5) input tensor
                [5.0, 6.0, 7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0, 13.0, 14.0],
                [15.0, 16.0, 17.0, 18.0, 19.0],
                [20.0, 21.0, 22.0, 23.0, 24.0],
                [25.0, 26.0, 27.0, 28.0, 29.0],
                [30.0, 31.0, 32.0, 33.0, 34.0],
            ]
        ]
    ]
).astype(np.float32)
W = np.array(
    [
        [
            [
                [1.0, 1.0, 1.0],  # (1, 1, 3, 3) tensor for convolution weights
                [1.0, 1.0, 1.0],
                [1.0, 1.0, 1.0],
            ]
        ]
    ]
).astype(np.float32)

# Convolution with strides=2 and padding
node_with_padding = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    kernel_shape=[3, 3],
    pads=[1, 1, 1, 1],
    strides=[
        2,
        2,
    ],  # Default values for other attributes: dilations=[1, 1], group=1
)
y_with_padding = np.array(
    [
        [
            [
                [12.0, 27.0, 24.0],  # (1, 1, 4, 3) output tensor
                [63.0, 108.0, 81.0],
                [123.0, 198.0, 141.0],
                [112.0, 177.0, 124.0],
            ]
        ]
    ]
).astype(np.float32)
expect(
    node_with_padding,
    inputs=[x, W],
    outputs=[y_with_padding],
    name="test_conv_with_strides_padding",
)

# Convolution with strides=2 and no padding
node_without_padding = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    kernel_shape=[3, 3],
    pads=[0, 0, 0, 0],
    strides=[
        2,
        2,
    ],  # Default values for other attributes: dilations=[1, 1], group=1
)
y_without_padding = np.array(
    [
        [
            [
                [54.0, 72.0],  # (1, 1, 3, 2) output tensor
                [144.0, 162.0],
                [234.0, 252.0],
            ]
        ]
    ]
).astype(np.float32)
expect(
    node_without_padding,
    inputs=[x, W],
    outputs=[y_without_padding],
    name="test_conv_with_strides_no_padding",
)

# Convolution with strides=2 and padding only along one dimension (the H dimension in NxCxHxW tensor)
node_with_asymmetric_padding = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    kernel_shape=[3, 3],
    pads=[1, 0, 1, 0],
    strides=[
        2,
        2,
    ],  # Default values for other attributes: dilations=[1, 1], group=1
)
y_with_asymmetric_padding = np.array(
    [
        [
            [
                [21.0, 33.0],  # (1, 1, 4, 2) output tensor
                [99.0, 117.0],
                [189.0, 207.0],
                [171.0, 183.0],
            ]
        ]
    ]
).astype(np.float32)
expect(
    node_with_asymmetric_padding,
    inputs=[x, W],
    outputs=[y_with_asymmetric_padding],
    name="test_conv_with_strides_and_asymmetric_padding",
)

_conv_with_autopad_same

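import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # ONNX backend test helper

x = np.array(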
    [
        [
            [
                [0.0, 1.0, 2.0, 3.0, 4.0],  # (1, 1, 5, 5) input tensor
                [5.0, 6.0, 7.0, 8.0, 9.0],
                [10.0, 11.0, 12.0, 13.0, 14.0],
                [15.0, 16.0, 17.0, 18.0, 19.0],
                [20.0, 21.0, 22.0, 23.0, 24.0],
            ]
        ]
    ]
).astype(np.float32)
W = np.array(
    [
        [
            [
                [1.0, 1.0, 1.0],  # (1, 1, 3, 3) tensor for convolution weights
                [1.0, 1.0, 1.0],
                [1.0, 1.0, 1.0],
            ]
        ]
    ]
).astype(np.float32)

# Convolution with auto_pad='SAME_LOWER' and strides=2
node = onnx.helper.make_node(
    "Conv",
    inputs=["x", "W"],
    outputs=["y"],
    auto_pad="SAME_LOWER",
    kernel_shape=[3, 3],
    strides=[2, 2],
)
y = np.array(
    [[[[12.0, 27.0, 24.0], [63.0, 108.0, 81.0], [72.0, 117.0, 84.0]]]]
).astype(np.float32)
expect(node, inputs=[x, W], outputs=[y], name="test_conv_with_autopad_same")

Differences

Compared with Conv - 1, this version changes the following descriptions:

  • auto_pad: SAME_UPPER and SAME_LOWER are now defined as padding the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i, with the padding split between the two sides equally or almost equally. Conv - 1 described them as padding the input so that the output spatial size matches the input. The sentence stating that VALID means no padding no longer appears.

  • dilations: now states that, if not present, the dilation defaults to 1 along each spatial axis.

  • strides: now states that, if not present, the stride defaults to 1 along each spatial axis.

  • W: adds the requirement W.shape[0] mod G == 0 and rewords the channel constraint: FILTER_IN_CHANNEL multiplied by the number of groups should be equal to DATA_CHANNEL, and the number of feature maps M should be a multiple of the number of groups G. Conv - 1 only said that FILTER_IN_CHANNEL should be equal to DATA_CHANNEL.

All other attribute, input, output, and type constraint descriptions are unchanged.

Conv - 1

Version

  • name: Conv (GitHub)

  • domain: main

  • since_version: 1

  • function: False

  • support_level: SupportType.COMMON

  • shape inference: True

This version of the operator has been available since version 1.

Summary

The convolution operator consumes an input tensor and a filter, and computes the output.

Attributes

  • auto_pad: Must be one of NOTSET, SAME_UPPER, SAME_LOWER, or VALID. NOTSET means explicit padding is used. SAME_UPPER and SAME_LOWER pad the input so that the output spatial size matches the input. If the total padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.

  • dilations: Dilation value along each spatial axis of the filter.

  • group: Number of groups that the input channels and output channels are divided into. Default value is 1.

  • kernel_shape: The shape of the convolution kernel. If not present, it should be inferred from input W.

  • pads: Padding for the beginning and end of each spatial axis. Each value must be greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. The pads format is [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.

  • strides: Stride along each spatial axis.

Inputs

Between 2 and 3 inputs.

  • X (heterogeneous) - T: Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. This is the 2D image case; in general the size is (N x C x D1 x D2 … x Dn). Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].

  • W (heterogeneous) - T: The weight tensor that will be used in the convolutions; has size (M x C/group x kH x kW), where C is the number of channels, kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (M x C/group x k1 x k2 x … x kn), where (k1 x k2 x … x kn) is the dimension of the kernel. Optionally, if dimension denotation is in effect, the operation expects the weight tensor to arrive with the dimension denotation of [FILTER_OUT_CHANNEL, FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL …]. Assuming zero-based indices for the shape array, X.shape[1] == (W.shape[1] * group) == C. In other words, FILTER_IN_CHANNEL should be equal to DATA_CHANNEL.

  • B (optional, heterogeneous) - T: Optional 1D bias to be added to the convolution; has size M.

Outputs

  • Y (heterogeneous) - T: Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, and pad lengths.

Type Constraints

  • T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.