DisentangledAttention_TRT

DisentangledAttention_TRT - 1

Version

name: DisentangledAttention_TRT (GitHub)
domain: main
since_version: 1
function:
support_level:
shape inference:

This version of the operator has been available since version 1.

Summary

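TensorRT plugin for disentangled attention. It takes the content-to-content, content-to-position, and position-to-content attention tensors and produces a single disentangled attention tensor, scaled by factor.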

Attributes

factor (required): Scaling factor applied to the attention values, 1/sqrt(3d), where d = H/N is the hidden size per head, H is the hidden size, and N is the number of heads.
span (required): Maximum relative distance, k.
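
As a quick check of the factor formula, the following minimal sketch computes 1/sqrt(3d) from the hidden size and the number of heads. The function name is hypothetical, and the values H = 768 and N = 12 are illustrative assumptions rather than values prescribed by the plugin.

```python
import math


def disentangled_attention_factor(hidden_size: int, num_heads: int) -> float:
    """Return 1/sqrt(3d), where d = hidden_size / num_heads (hypothetical helper)."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    d = hidden_size // num_heads  # hidden size per head
    return 1.0 / math.sqrt(3 * d)


# Illustrative values only: H = 768, N = 12, so d = 64.
factor = disentangled_attention_factor(768, 12)
print(f"factor = {factor:.6f}")
```

With these assumed values the factor is about 0.072, which is the number that would be passed as the factor attribute.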

Inputs

c2c_attention (heterogeneous) - T: content-to-content attention tensor, QcKc^T.
c2p_attention (heterogeneous) - T: content-to-position attention tensor, QcKr^T.
p2c_attention (heterogeneous) - T: position-to-content attention tensor, KcQr^T.

Outputs

disentangled_attention (heterogeneous) - T: The disentangled attention output tensor.
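
To make the relationship between the three inputs and the output concrete, here is a NumPy reference sketch. It is not the plugin's kernel. It follows the disentangled attention formulation from the DeBERTa paper and rests on several assumptions that this page does not state: c2c_attention has shape [batch, S, S], c2p_attention and p2c_attention have shape [batch, S, 2k], and relative positions are clamped linearly (no log bucketing). The function name is hypothetical.

```python
import numpy as np


def disentangled_attention_reference(c2c, c2p, p2c, factor, span):
    """Reference sketch: (c2c + gathered c2p + gathered p2c) * factor.

    Assumed shapes (not taken from the plugin documentation):
      c2c: [batch, S, S], c2p: [batch, S, 2*span], p2c: [batch, S, 2*span]
    """
    batch, seq_len, _ = c2c.shape
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    # delta[i, j] = clamp(i - j + span, 0, 2*span - 1): relative-position index.
    delta = np.clip(i - j + span, 0, 2 * span - 1)
    idx = np.broadcast_to(delta, (batch, seq_len, seq_len))

    # c2p term: out[b, i, j] = c2p[b, i, delta[i, j]]
    c2p_term = np.take_along_axis(c2p, idx, axis=2)
    # p2c term: out[b, i, j] = p2c[b, j, delta[j, i]], so gather and then transpose.
    p2c_term = np.take_along_axis(p2c, idx, axis=2).transpose(0, 2, 1)

    return (c2c + c2p_term + p2c_term) * factor


# Tiny illustrative call with random data.
rng = np.random.default_rng(0)
B, S, k = 2, 5, 2
out = disentangled_attention_reference(
    rng.standard_normal((B, S, S)),
    rng.standard_normal((B, S, 2 * k)),
    rng.standard_normal((B, S, 2 * k)),
    factor=0.5,
    span=k,
)
print(out.shape)
```

The gather along the last axis picks, for each query and key pair, the score at their clamped relative distance. The transpose on the position-to-content term reflects that its relative index is taken from the key's point of view.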

Examples