DisentangledAttention_TRT

DisentangledAttention_TRT - 1

Version

This version of the operator has been available since version 1.

Summary

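Disentangled Attention TensorRT plugin. It combines the content-to-content, content-to-position, and position-to-content attention tensors into a single attention tensor and applies the scaling factor.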

Attributes

  • factor (required): Scaling factor applied to the attention values, 1/sqrt(3d), where d = H/N is the hidden size per head, H is the hidden size, and N is the number of heads.

  • span (required): Maximum relative distance, k.
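
The value of factor follows directly from the formula above. As a minimal sketch, assuming a model with hidden size H = 768 and N = 12 heads (illustrative numbers, not taken from this page), it could be computed as follows. The helper name disentangled_attention_factor is hypothetical.

```python
import math


def disentangled_attention_factor(hidden_size: int, num_heads: int) -> float:
    """Return 1/sqrt(3d), where d = hidden_size / num_heads is the per-head size."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    d = hidden_size // num_heads
    return 1.0 / math.sqrt(3 * d)


# Assumed example configuration: H = 768, N = 12, so d = 64.
factor = disentangled_attention_factor(768, 12)
```

The result is the value that would be passed as the factor attribute of the node.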

Inputs

  • c2c_attention (heterogeneous) - T: content-to-content attention tensor, QcKc^T.

  • c2p_attention (heterogeneous) - T: content-to-position attention tensor, QcKr^T.

  • p2c_attention (heterogeneous) - T: position-to-content attention tensor, KcQr^T.

Outputs

  • disentangled_attention (heterogeneous) - T: The disentangled attention output tensor.

Examples
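
The sketch below is a plain-Python reference for how the three inputs might be combined for a single head. It is an assumption based on the disentangled attention formula from the DeBERTa paper, not the plugin's actual kernel. The shapes (c2c as S x S, c2p and p2c as S x 2k), the clamped index function delta, and the function names are all hypothetical, and the plugin's exact indexing convention may differ.

```python
def delta(i: int, j: int, k: int) -> int:
    """Relative-position bucket for (i, j), clamped to [0, 2k - 1]."""
    return min(max(i - j + k, 0), 2 * k - 1)


def disentangled_attention_reference(c2c, c2p, p2c, factor, span):
    """Combine the three attention terms for one head and scale by factor.

    Assumed layout: c2c is S x S; c2p and p2c are S x 2*span nested lists.
    """
    seq_len = len(c2c)
    out = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            total = (
                c2c[i][j]                        # content-to-content
                + c2p[i][delta(i, j, span)]      # content-to-position
                + p2c[j][delta(j, i, span)]      # position-to-content
            )
            out[i][j] = total * factor
    return out


# Tiny illustrative inputs: sequence length 2, span k = 1.
c2c = [[1.0, 2.0], [3.0, 4.0]]
c2p = [[10.0, 20.0], [30.0, 40.0]]
p2c = [[100.0, 200.0], [300.0, 400.0]]
result = disentangled_attention_reference(c2c, c2p, p2c, factor=0.5, span=1)
```

A reference like this is mainly useful for checking the plugin's output on small inputs. For real workloads the tensors would also carry batch and head dimensions.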