module `onnxrt.ops_cpu.op_quantize_linear`#

Short summary#

module mlprodict.onnxrt.ops_cpu.op_quantize_linear

Runtime operator.

Classes#

class	truncated documentation
`_CommonQuantizeLinear`
`DynamicQuantizeLinear`	A function to fuse calculation for Scale, Zero Point and FP32->8Bit conversion …
`QuantizeLinear`	The linear quantization operator. It consumes a high precision tensor, a scale, and a zero …

Properties#

property	truncated documentation
`args_default`	Returns the list of arguments as well as the list of parameters with the default values (close to the signature). …
`args_default`	Returns the list of arguments as well as the list of parameters with the default values (close to the signature). …
`args_default`	Returns the list of arguments as well as the list of parameters with the default values (close to the signature). …
`args_default_modified`	Returns the list of modified parameters.
`args_default_modified`	Returns the list of modified parameters.
`args_default_modified`	Returns the list of modified parameters.
`args_mandatory`	Returns the list of mandatory arguments.
`args_mandatory`	Returns the list of mandatory arguments.
`args_mandatory`	Returns the list of mandatory arguments.
`args_optional`	Returns the list of optional arguments.
`args_optional`	Returns the list of optional arguments.
`args_optional`	Returns the list of optional arguments.
`atts_value`	Returns all parameters in a dictionary.
`atts_value`	Returns all parameters in a dictionary.
`atts_value`	Returns all parameters in a dictionary.

Methods#

method	truncated documentation
`__init__`
`__init__`
`__init__`
`_run`
`_run`
`common_run`
`common_run`

Documentation#

Runtime operator.


class mlprodict.onnxrt.ops_cpu.op_quantize_linear.DynamicQuantizeLinear(onnx_node, desc=None, **options)#

Bases: OpRun

A function that fuses the calculation of the scale, the zero point, and the FP32->8Bit conversion of FP32 input data. It outputs the scale, the zero point, and the quantized input for a given FP32 input.

Scale is calculated as:

``y_scale = (max(x) - min(x)) / (qmax - qmin)``

* where qmax and qmin are the max and min values of the quantization range, i.e. [0, 255] in case of uint8
* the data range is adjusted to include 0.

Zero point is calculated as:

``intermediate_zero_point = qmin - min(x) / y_scale``

``y_zero_point = cast(round(saturate(intermediate_zero_point)))``

* where qmax and qmin are the max and min values of the quantization range, i.e. [0, 255] in case of uint8
* for saturation, it saturates to [0, 255] if it’s uint8, or [-127, 127] if it’s int8. Right now only uint8 is supported.
* rounding is to nearest, ties to even.

Data quantization formula is:

``y = saturate(round(x / y_scale) + y_zero_point)``

* for saturation, it saturates to [0, 255] if it’s uint8, or [-127, 127] if it’s int8. Right now only uint8 is supported.
* rounding is to nearest, ties to even.
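
The three formulas above can be sketched in NumPy as follows. This is a minimal illustration for the uint8 case only, not the runtime's actual `_run` implementation; the function name `dynamic_quantize_linear` is hypothetical, and the sketch assumes the input is not all zeros (otherwise `y_scale` would be 0).

```python
import numpy as np


def dynamic_quantize_linear(x):
    """Hypothetical helper: uint8 dynamic quantization following the formulas above."""
    x = np.asarray(x, dtype=np.float32)
    qmin, qmax = 0.0, 255.0  # quantization range for uint8

    # The data range is adjusted to include 0.
    x_min = min(0.0, float(x.min()))
    x_max = max(0.0, float(x.max()))

    # y_scale = (max(x) - min(x)) / (qmax - qmin)
    y_scale = np.float32((x_max - x_min) / (qmax - qmin))

    # y_zero_point = cast(round(saturate(intermediate_zero_point)))
    intermediate_zero_point = qmin - x_min / y_scale
    y_zero_point = np.uint8(np.rint(np.clip(intermediate_zero_point, qmin, qmax)))

    # y = saturate(round(x / y_scale) + y_zero_point)
    # np.rint rounds to nearest, ties to even.
    y = np.rint(x / y_scale) + np.float32(y_zero_point)
    y = np.clip(y, qmin, qmax).astype(np.uint8)
    return y, y_scale, y_zero_point


y, y_scale, y_zero_point = dynamic_quantize_linear([-0.5, 0.0, 1.0, 127.0])
```

The example input spans [-0.5, 127], so the range is 127.5 and the scale is exactly 0.5, which keeps the arithmetic free of floating-point rounding effects.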

Inputs

x (heterogeneous) T1: Input tensor

Outputs

y (heterogeneous) T2: Quantized output tensor
y_scale (heterogeneous) tensor(float): Output scale. It’s a scalar, which means a per-tensor/layer quantization.
y_zero_point (heterogeneous) T2: Output zero point. It’s a scalar, which means a per-tensor/layer quantization.

Type Constraints

T1 tensor(float): Constrain ‘x’ to float tensor.
T2 tensor(uint8): Constrain ‘y_zero_point’ and ‘y’ to 8-bit unsigned integer tensor.

Version

Onnx name: DynamicQuantizeLinear

This version of the operator has been available since version 11.

Runtime implementation: DynamicQuantizeLinear

__init__(onnx_node, desc=None, **options)#

_run(x, attributes=None, verbose=0, fLOG=None)#

Should be overwritten.


class mlprodict.onnxrt.ops_cpu.op_quantize_linear.QuantizeLinear(onnx_node, desc=None, **options)#

Bases: _CommonQuantizeLinear

The linear quantization operator. It consumes a high precision tensor, a scale, and a zero point to compute the low precision / quantized tensor. The scale factor and zero point must have the same shape, and can be either a scalar for per-tensor / per-layer quantization, or a 1-D tensor for per-axis quantization. The quantization formula is ``y = saturate((x / y_scale) + y_zero_point)``. For saturation, it saturates to [0, 255] if it’s uint8, or [-128, 127] if it’s int8. For (x / y_scale), rounding is to nearest, ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details. ‘y_zero_point’ and ‘y’ must have the same type.
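
The formula above can be illustrated with a short NumPy sketch covering both the per-tensor (scalar scale) and per-axis (1-D scale) cases. The function name `quantize_linear` is hypothetical and this is not the runtime's own code; the sketch assumes `y_zero_point`, when given, is a NumPy `uint8` or `int8` value, since its dtype selects the saturation range.

```python
import numpy as np


def quantize_linear(x, y_scale, y_zero_point=None, axis=1):
    """Hypothetical helper: y = saturate(round(x / y_scale) + y_zero_point)."""
    x = np.asarray(x, dtype=np.float32)
    y_scale = np.asarray(y_scale, dtype=np.float32)
    if y_zero_point is None:
        # Default is uint8 with a zero point of 0.
        y_zero_point = np.zeros(y_scale.shape, dtype=np.uint8)
    else:
        y_zero_point = np.asarray(y_zero_point)

    # The output type (and saturation range) follows the zero point type.
    dtype = y_zero_point.dtype
    info = np.iinfo(dtype)

    if y_scale.ndim == 1:
        # Per-axis quantization: align the 1-D scale with dimension `axis` of x.
        shape = [1] * x.ndim
        shape[axis] = -1
        y_scale = y_scale.reshape(shape)
        y_zero_point = y_zero_point.reshape(shape)

    # np.rint rounds to nearest, ties to even.
    y = np.rint(x / y_scale) + y_zero_point.astype(np.float32)
    return np.clip(y, info.min, info.max).astype(dtype)


# Per-tensor quantization with a scalar scale and zero point.
per_tensor = quantize_linear(
    [0, 2, 3, 1000, -254, -1000], np.float32(2), np.uint8(128))

# Per-axis quantization along axis 1: one scale per column.
per_axis = quantize_linear(
    [[1, 10], [2, 20]], [1, 10], np.array([0, 0], dtype=np.uint8), axis=1)
```

In the per-tensor call, 3 / 2 = 1.5 rounds to 2 (ties to even), and the values 1000 and -1000 saturate to the uint8 bounds.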

Attributes

axis: (Optional) The axis of the quantization dimension of the input tensor. Ignored for per-tensor quantization. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(input). Default value is 1 (INT).
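
As a small illustration of the accepted range, a negative axis can be mapped to its non-negative equivalent as sketched below. The helper name `normalize_axis` is hypothetical and is not part of the module.

```python
def normalize_axis(axis, rank):
    """Hypothetical helper: validate axis against [-rank, rank - 1] and make it non-negative."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(
            "axis %d is out of range [%d, %d]" % (axis, -rank, rank - 1))
    # Negative values count dimensions from the back.
    return axis + rank if axis < 0 else axis


last_axis = normalize_axis(-1, 4)   # last dimension of a rank-4 tensor
default_axis = normalize_axis(1, 4)  # the default axis is left unchanged
```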

Inputs

Between 2 and 3 inputs.

x (heterogeneous) T1: N-D full precision Input tensor to be quantized.
y_scale (heterogeneous) tensor(float): Scale for doing quantization to get ‘y’. It can be a scalar, which means per-tensor/layer quantization, or a 1-D Tensor for per-axis quantization.
y_zero_point (optional, heterogeneous) T2: Zero point for doing quantization to get ‘y’. Shape must match y_scale. Default is uint8 with zero point of 0 if it’s not specified.

Outputs

y (heterogeneous) T2: N-D quantized output tensor. It has same shape as input ‘x’.

Type Constraints

T1 tensor(float), tensor(int32): Constrain ‘x’ to float or int32 tensor.
T2 tensor(int8), tensor(uint8): Constrain ‘y_zero_point’ and ‘y’ to 8-bit integer tensor.

Version

Onnx name: QuantizeLinear

This version of the operator has been available since version 13.

Runtime implementation: QuantizeLinear

__init__(onnx_node, desc=None, **options)#

_run(*args, attributes=None, verbose=0, fLOG=None)#

Should be overwritten.


class mlprodict.onnxrt.ops_cpu.op_quantize_linear._CommonQuantizeLinear(onnx_node, desc=None, expected_attributes=None, **options)#

Bases: OpRun

__init__(onnx_node, desc=None, expected_attributes=None, **options)#
