DisentangledAttention_TRT
DisentangledAttention_TRT - 1
Version
domain: main
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1.
Summary
Disentangled Attention TensorRT Plugin.
Attributes
factor (required): Scaling factor applied to attention values, 1/sqrt(3d), where d = H/N is the hidden size per head, H is the hidden size, and N is the number of heads. Default value is ?.
span (required): Maximum relative distance, k. Default value is ?.
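As a quick illustration of the factor attribute, the following minimal sketch computes 1/sqrt(3d) from H and N. The helper name attention_factor and the example sizes (H = 768, N = 12) are assumptions chosen for illustration; they are not part of the operator definition.

```python
import math


def attention_factor(hidden_size: int, num_heads: int) -> float:
    """Return 1/sqrt(3d), where d = hidden_size / num_heads (hypothetical helper)."""
    if num_heads <= 0 or hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be a positive multiple of num_heads")
    d = hidden_size // num_heads  # hidden size per head, d = H/N
    return 1.0 / math.sqrt(3 * d)


# Example sizes are assumptions: H = 768 and N = 12 give d = 64.
factor = attention_factor(768, 12)
```

The resulting value would be passed as the factor attribute when the node is created.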
Inputs
c2c_attention (heterogeneous) - T: content-to-content attention tensor, QcKc^T.
c2p_attention (heterogeneous) - T: content-to-position attention tensor, QcKr^T.
p2c_attention (heterogeneous) - T: position-to-content attention tensor, KcQr^T.
Outputs
disentangled_attention (heterogeneous) - T: The disentangled attention output tensor.
Examples
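The sketch below is a plain-Python, single-head reference for how the three inputs might be combined. It is not the plugin's implementation. It assumes c2c_attention is S x S, that c2p_attention and p2c_attention are S x 2k (k being the span attribute), and that relative positions are bucketed as clamp(i - j + k, 0, 2k - 1), following the convention of the DeBERTa reference code. None of these shapes or indexing details are stated on this page, so treat them as assumptions. The function name is hypothetical.

```python
def disentangled_attention_reference(c2c, c2p, p2c, factor, span):
    """Single-head sketch: combine c2c, c2p and p2c scores, then scale by factor.

    Assumed shapes: c2c is S x S; c2p and p2c are S x (2 * span).
    """
    seq_len = len(c2c)
    out = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):      # query position
        for j in range(seq_len):  # key position
            # Assumed bucketing: relative distance i - j shifted by span
            # and clamped to the valid column range [0, 2 * span - 1].
            idx = min(max(i - j + span, 0), 2 * span - 1)
            score = c2c[i][j]     # content-to-content term
            score += c2p[i][idx]  # content-to-position term, gathered per query row
            score += p2c[j][idx]  # position-to-content term, gathered per key row
            out[i][j] = score * factor
    return out


# Toy inputs with S = 2 and span = 1 (values chosen only for illustration).
result = disentangled_attention_reference(
    c2c=[[1.0, 2.0], [3.0, 4.0]],
    c2p=[[10.0, 20.0], [30.0, 40.0]],
    p2c=[[100.0, 200.0], [300.0, 400.0]],
    factor=0.5,
    span=1,
)
```

A nested-loop form is used here for readability; a batched, vectorized version would gather along the last axis instead.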