.. _l-onnx-doc-QuantizeLinear:

==============
QuantizeLinear
==============

.. contents::
    :local:

.. _l-onnx-op-quantizelinear-13:

QuantizeLinear - 13
===================

**Version**

* **name**: QuantizeLinear (GitHub)
* **domain**: **main**
* **since_version**: **13**
* **function**: False
* **support_level**: SupportType.COMMON
* **shape inference**: True

This version of the operator has been available **since version 13**.

**Summary**

The linear quantization operator. It consumes a high-precision tensor, a
scale, and a zero point to compute the low-precision / quantized tensor.
The scale factor and zero point must have the same shape, and can be either
a scalar for per-tensor / per-layer quantization, or a 1-D tensor for
per-axis quantization. The quantization formula is
``y = saturate((x / y_scale) + y_zero_point)``. For saturation, it saturates
to [0, 255] if the output type is uint8, or [-128, 127] if it is int8. For
``x / y_scale``, rounding is to nearest, ties to even. Refer to
https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y'
must have the same type.

**Attributes**

* **axis**:
  (Optional) The axis of the quantization dimension of the input tensor.
  Ignored for per-tensor quantization. A negative value means counting
  dimensions from the back. The accepted range is [-r, r-1] where
  r = rank(input). Default value is ``1``.

**Inputs**

Between 2 and 3 inputs.

* **x** (heterogeneous) - **T1**:
  N-D full-precision input tensor to be quantized.
* **y_scale** (heterogeneous) - **tensor(float)**:
  Scale for doing quantization to get 'y'. It can be a scalar, which means
  per-tensor/layer quantization, or a 1-D tensor for per-axis quantization.
* **y_zero_point** (optional, heterogeneous) - **T2**:
  Zero point for doing quantization to get 'y'. Shape must match y_scale.
  Default is uint8 with a zero point of 0 if it is not specified.

**Outputs**

* **y** (heterogeneous) - **T2**:
  N-D quantized output tensor. It has the same shape as input 'x'.
**Type Constraints**

* **T1** in (
  tensor(float),
  tensor(int32)
  ):
  Constrain 'x' to float or int32 tensor.
* **T2** in (
  tensor(int8),
  tensor(uint8)
  ):
  Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.

**Examples**

**default**

::

    node = onnx.helper.make_node(
        "QuantizeLinear",
        inputs=["x", "y_scale", "y_zero_point"],
        outputs=["y"],
    )

    x = np.array([0, 2, 3, 1000, -254, -1000]).astype(np.float32)
    y_scale = np.float32(2)
    y_zero_point = np.uint8(128)
    y = np.array([128, 129, 130, 255, 1, 0]).astype(np.uint8)

    expect(
        node,
        inputs=[x, y_scale, y_zero_point],
        outputs=[y],
        name="test_quantizelinear",
    )

**_axis**

::

    node = onnx.helper.make_node(
        "QuantizeLinear",
        inputs=["x", "y_scale", "y_zero_point"],
        outputs=["y"],
    )

    x = np.array(
        [
            [
                [[-162, 10], [-100, 232], [-20, -50]],
                [[-76, 0], [0, 252], [32, -44]],
                [[245, -485], [-960, -270], [-375, -470]],
            ],
        ],
        dtype=np.float32,
    )
    y_scale = np.array([2, 4, 5], dtype=np.float32)
    y_zero_point = np.array([84, 24, 196], dtype=np.uint8)
    y = (x / y_scale.reshape(1, 3, 1, 1) + y_zero_point.reshape(1, 3, 1, 1)).astype(
        np.uint8
    )

    expect(
        node,
        inputs=[x, y_scale, y_zero_point],
        outputs=[y],
        name="test_quantizelinear_axis",
    )

**Differences**

Compared with version 10, version 13:

* generalizes the summary from "the linear per-tensor/layer quantization
  operator" to "the linear quantization operator", and adds: the scale
  factor and zero point must have the same shape, and can be either a
  scalar for per-tensor / per-layer quantization, or a 1-D tensor for
  per-axis quantization;
* adds the **axis** attribute: (optional) the axis of the quantization
  dimension of the input tensor, ignored for per-tensor quantization,
  with default value ``1``;
* allows **y_scale** to be a scalar (per-tensor/layer quantization) or a
  1-D tensor (per-axis quantization), where version 10 required a scalar;
* requires the shape of **y_zero_point** to match y_scale, where version 10
  required a scalar; the default remains a uint8 zero point of 0 if it is
  not specified.
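As a reading aid for the formula above, here is a minimal NumPy sketch of the version-13 semantics. It is an illustrative reimplementation, not the runtime's code: the helper name ``quantize_linear`` is an assumption, ``np.rint`` supplies the round-to-nearest-ties-to-even behavior the spec requires, and ``np.clip`` supplies the saturation.

```python
import numpy as np

def quantize_linear(x, y_scale, y_zero_point=None, axis=1):
    # Sketch of QuantizeLinear-13: y = saturate(round(x / y_scale) + y_zero_point).
    # Per the spec, the default zero point is a uint8 0 when not specified.
    if y_zero_point is None:
        y_zero_point = np.uint8(0)
    dtype = y_zero_point.dtype
    lo, hi = (0, 255) if dtype == np.uint8 else (-128, 127)
    if y_scale.ndim == 1:
        # Per-axis quantization: broadcast scale/zero point along `axis`.
        shape = [1] * x.ndim
        shape[axis] = -1
        y_scale = y_scale.reshape(shape)
        y_zero_point = y_zero_point.reshape(shape)
    # np.rint rounds half to even, matching "rounding to nearest ties to even".
    y = np.rint(x / y_scale) + y_zero_point.astype(np.int32)
    return np.clip(y, lo, hi).astype(dtype)
```

On the **default** example above, ``quantize_linear(x, np.float32(2), np.uint8(128))`` reproduces the expected output ``[128, 129, 130, 255, 1, 0]``: 1000/2 + 128 saturates to 255 and -1000/2 + 128 saturates to 0.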
.. _l-onnx-op-quantizelinear-10:

QuantizeLinear - 10
===================

**Version**

* **name**: QuantizeLinear (GitHub)
* **domain**: **main**
* **since_version**: **10**
* **function**: False
* **support_level**: SupportType.COMMON
* **shape inference**: True

This version of the operator has been available **since version 10**.

**Summary**

The linear per-tensor/layer quantization operator. It consumes a
high-precision tensor, a scale, and a zero point to compute the
low-precision / quantized tensor. The quantization formula is
``y = saturate((x / y_scale) + y_zero_point)``. For saturation, it saturates
to [0, 255] if the output type is uint8, or [-128, 127] if it is int8. For
``x / y_scale``, rounding is to nearest, ties to even. Refer to
https://en.wikipedia.org/wiki/Rounding for details. 'y_zero_point' and 'y'
must have the same type.

**Inputs**

Between 2 and 3 inputs.

* **x** (heterogeneous) - **T1**:
  N-D full-precision input tensor to be quantized.
* **y_scale** (heterogeneous) - **tensor(float)**:
  Scale for doing quantization to get 'y'. It is a scalar, which means
  per-tensor/layer quantization.
* **y_zero_point** (optional, heterogeneous) - **T2**:
  Zero point for doing quantization to get 'y'. It is a scalar, which means
  per-tensor/layer quantization. The default value is a uint8-typed 0 if it
  is not specified.

**Outputs**

* **y** (heterogeneous) - **T2**:
  N-D quantized output tensor. It has the same shape as input 'x'.

**Type Constraints**

* **T1** in (
  tensor(float),
  tensor(int32)
  ):
  Constrain 'x' to float or int32 tensor.
* **T2** in (
  tensor(int8),
  tensor(uint8)
  ):
  Constrain 'y_zero_point' and 'y' to 8-bit integer tensor.
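The int8 saturation and rounding behavior described in the summary can be seen with a few plain NumPy values; this is an illustration of the formula under the stated bounds, not the runtime implementation:

```python
import numpy as np

# Per-tensor int8 quantization, following the summary's formula:
# y = saturate(round(x / y_scale) + y_zero_point), saturating to [-128, 127].
x = np.array([300.0, -300.0, 1.5, 2.5], dtype=np.float32)
y_scale = np.float32(1)
y_zero_point = np.int8(0)

# np.rint rounds half to even: 1.5 -> 2.0 and 2.5 -> 2.0.
y = np.clip(np.rint(x / y_scale) + y_zero_point, -128, 127).astype(np.int8)
# 300 saturates to 127, -300 saturates to -128.
```

With a uint8 zero point instead, the same formula would saturate to [0, 255], as in the version-13 **default** example above.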