com.microsoft - QGemm
QGemm - 1 (com.microsoft)
Version
name: QGemm
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Quantized Gemm
Attributes
alpha: Scalar multiplier for the product of input tensors A * B. Default value is ?.
transA: Whether A should be transposed. Default value is ?.
transB: Whether B should be transposed. Default value is ?.
Inputs
Between 6 and 9 inputs.
A (heterogeneous) - TA: Input tensor A. The shape of A should be (M, K) if transA is 0, or (K, M) if transA is non-zero.
a_scale (heterogeneous) - T: Scale of quantized input ‘A’. It is a scalar, which means a per-tensor quantization.
a_zero_point (heterogeneous) - TA: Zero point tensor for input ‘A’. It is a scalar.
B (heterogeneous) - TB: Input tensor B. The shape of B should be (K, N) if transB is 0, or (N, K) if transB is non-zero.
b_scale (heterogeneous) - T: Scale of quantized input ‘B’. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
b_zero_point (heterogeneous) - TB: Zero point tensor for input ‘B’. It is optional; the default value is 0. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
C (optional, heterogeneous) - TC: Optional input tensor C. If not specified, the computation is done as if C is a scalar 0. The shape of C should be unidirectional broadcastable to (M, N). Its type is int32_t, and it must be quantized with zero_point = 0 and scale = alpha / beta * a_scale * b_scale.
y_scale (optional, heterogeneous) - T: Scale of output ‘Y’. It is a scalar, which means a per-tensor quantization. It is optional. If it is not provided, the output is full precision (float32); otherwise the output is quantized.
y_zero_point (optional, heterogeneous) - TYZ: Zero point tensor for output ‘Y’. It is a scalar, which means a per-tensor quantization. It is optional. If it is not provided, the output is full precision (float32); otherwise the output is quantized.
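The scale requirement on C can be made concrete with a small sketch. The helper below is hypothetical (it is not part of ONNX Runtime) and shows one way a float bias might be converted to the int32 representation described above, using zero_point = 0 and scale = alpha / beta * a_scale * b_scale. Since no beta attribute is listed for this operator, beta is taken here as a plain parameter defaulting to 1.0, which is an assumption. The bias is assumed to be 1-D with one entry per output column, and rounding uses Python's built-in round().

```python
def quantize_bias(bias, a_scale, b_scale, alpha, beta=1.0):
    """Hypothetical helper: quantize a float bias for use as input C.

    bias    : list of floats, one per output column (assumed 1-D).
    b_scale : a float (per-tensor) or a list of floats (per-column).
    beta    : assumed to be 1.0; QGemm lists no beta attribute.
    """
    n = len(bias)
    # Expand a per-tensor scale so every column has its own entry.
    b_scales = list(b_scale) if isinstance(b_scale, (list, tuple)) else [b_scale] * n
    int32_min, int32_max = -2**31, 2**31 - 1

    quantized = []
    for value, bs in zip(bias, b_scales):
        c_scale = alpha / beta * a_scale * bs   # scale stated for input C
        q = round(value / c_scale)              # zero_point is 0, so no offset
        quantized.append(max(int32_min, min(int32_max, q)))
    return quantized


# Per-column example: the two columns end up with scales 0.25 and 0.125.
c_int32 = quantize_bias([0.5, -1.0], a_scale=0.5, b_scale=[0.5, 0.25], alpha=1.0)
print(c_int32)  # [2, -8]
```

With a per-column b_scale, the effective scale of C also varies per column, which is why the helper expands b_scale before dividing.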
Outputs
Y (heterogeneous) - TY: Output tensor of shape (M, N).
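The shape rules for A, B, and Y can be checked with a few lines of Python. The function name below is made up for illustration and is not an ONNX Runtime API; it resolves (M, K) and (K, N) from the stored shapes and the transA/transB flags, then returns the expected output shape (M, N). The zero defaults in the signature are the sketch's own choice.

```python
def qgemm_output_shape(a_shape, b_shape, transA=0, transB=0):
    """Hypothetical helper: derive the (M, N) output shape of QGemm."""
    # A is stored as (M, K), or as (K, M) when transA is non-zero.
    m, k_a = (a_shape[1], a_shape[0]) if transA else a_shape
    # B is stored as (K, N), or as (N, K) when transB is non-zero.
    k_b, n = (b_shape[1], b_shape[0]) if transB else b_shape
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    return (m, n)


print(qgemm_output_shape((3, 4), (4, 5)))                      # (3, 5)
print(qgemm_output_shape((4, 3), (5, 4), transA=1, transB=1))  # (3, 5)
```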
Examples
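As a rough illustration of the semantics described in the Inputs and Outputs sections, here is a pure-Python reference sketch on nested lists. It is not the ONNX Runtime kernel, and it rests on several assumptions: beta is taken as 1 (so C is added directly to the integer accumulator), C is 1-D with one entry per output column, per-column parameters are indexed by output column, the output is quantized only when both y_scale and y_zero_point are given, rounding uses Python's round(), and the result saturates to a caller-supplied range (qmin, qmax), here set up for an 8-bit unsigned output. The function name and the default values in its signature are the sketch's own.

```python
def qgemm_reference(A, a_scale, a_zero_point, B, b_scale, b_zero_point=0,
                    C=None, y_scale=None, y_zero_point=None,
                    alpha=1.0, transA=0, transB=0, qmin=0, qmax=255):
    """Illustrative QGemm semantics on nested lists (assumptions in the text)."""
    # Bring A to (M, K) and B to (K, N).
    if transA:
        A = [list(col) for col in zip(*A)]
    if transB:
        B = [list(col) for col in zip(*B)]
    M, K, N = len(A), len(B), len(B[0])

    def per_column(value):
        # Expand a per-tensor scalar to one value per output column.
        return list(value) if isinstance(value, (list, tuple)) else [value] * N

    b_scales = per_column(b_scale)
    b_zps = per_column(b_zero_point)
    quantized = y_scale is not None and y_zero_point is not None

    Y = []
    for m in range(M):
        row = []
        for n in range(N):
            # Integer accumulation of zero-point-corrected products.
            acc = sum((A[m][k] - a_zero_point) * (B[k][n] - b_zps[n])
                      for k in range(K))
            if C is not None:
                acc += C[n]  # C shares the accumulator's scale (beta assumed 1)
            real = alpha * a_scale * b_scales[n] * acc
            if quantized:
                q = round(real / y_scale) + y_zero_point
                row.append(max(qmin, min(qmax, q)))  # saturate to output range
            else:
                row.append(real)
        Y.append(row)
    return Y


A = [[1, 2], [3, 4]]   # quantized A with zero point 1
B = [[1, 0], [0, 1]]   # quantized B with zero point 0

# No y_scale / y_zero_point: full-precision output.
print(qgemm_reference(A, 0.5, 1, B, 2.0))
# [[0.0, 1.0], [2.0, 3.0]]

# With int32 bias C and output quantization parameters: quantized output.
print(qgemm_reference(A, 0.5, 1, B, 2.0, C=[1, -1], y_scale=0.5, y_zero_point=10))
# [[12, 10], [16, 14]]
```

The two calls differ only in the optional inputs, which mirrors how the operator switches between a float32 result and a quantized one.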