com.microsoft - QGemm#

QGemm - 1 (com.microsoft)#

Version

  • name: QGemm (GitHub)

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level:

  • shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Quantized Gemm

Attributes

  • alpha: Scalar multiplier for the product of input tensors A * B. Default value is ?.

  • transA: Whether A should be transposed Default value is ?.

  • transB: Whether B should be transposed Default value is ?.

Inputs

Between 6 and 9 inputs.

  • A (heterogeneous) - TA: Input tensor A. The shape of A should be (M, K) if transA is 0, or (K, M) if transA is non-zero.

  • a_scale (heterogeneous) - T: Scale of quantized input ‘A’. It is a scalar,which means a per- tensor quantization.

  • a_zero_point (heterogeneous) - TA: Zero point tensor for input ‘A’. It is a scalar.

  • B (heterogeneous) - TB: Input tensor B. The shape of B should be (K, N) if transB is 0, or (N, K) if transB is non-zero.

  • b_scale (heterogeneous) - T: Scale of quantized input ‘B’. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.

  • b_zero_point (heterogeneous) - TB: Zero point tensor for input ‘B’. It’s optional and default value is 0. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.

  • C (optional, heterogeneous) - TC: Optional input tensor C. If not specified, the computation is done as if C is a scalar 0. The shape of C should be unidirectional broadcastable to (M, N). Its type is int32_t and must be quantized with zero_point = 0 and scale = alpha / beta * a_scale * b_scale.

  • y_scale (optional, heterogeneous) - T: Scale of output ‘Y’. It is a scalar, which means a per-tensor quantization. It is optional. The output is full precision(float32) if it is not provided. Or the output is quantized.

  • y_zero_point (optional, heterogeneous) - TYZ: Zero point tensor for output ‘Y’. It is a scalar, which means a per- tensor quantization. It is optional. The output is full precision(float32) if it is not provided. Or the output is quantized.

Outputs

  • Y (heterogeneous) - TY: Output tensor of shape (M, N).

Examples