.. _l-onnx-doccom.microsoft-DecoderAttention: ================================ com.microsoft - DecoderAttention ================================ .. contents:: :local: .. _l-onnx-opcom-microsoft-decoderattention-1: DecoderAttention - 1 (com.microsoft) ==================================== **Version** * **name**: `DecoderAttention (GitHub) `_ * **domain**: **com.microsoft** * **since_version**: **1** * **function**: * **support_level**: * **shape inference**: This version of the operator has been available **since version 1 of domain com.microsoft**. **Summary** This DecoderAttention supports self attention and cross attention, key and value cache, and key_padding_mask. The attention mask is not support at the moment. Some boolean parameters are passed by runtime input for generic purpose **Attributes** * **num_heads** (required): Number of attention heads Default value is ``?``. **Inputs** * **query** (heterogeneous) - **T**: 3D input tensor with shape (sequence_length, batch_size, hidden_size), hidden_size = num_heads * head_size * **key** (heterogeneous) - **T**: 3D input tensor with shape (total_sequence_length, batch_size, hidden_size) * **q_weight** (heterogeneous) - **T**: 2D input tensor with shape (hidden_size, hidden_size) * **kv_weight** (heterogeneous) - **T**: 2D input tensor with shape (hidden_size, 2 * hidden_size) * **bias** (heterogeneous) - **T**: 1D input tensor with shape (3 * hidden_size) * **key_padding_mask** (optional, heterogeneous) - **B**: 2D input tensor with shape (batch_size, total_sequence_length) * **key_cache** (optional, heterogeneous) - **T**: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size) * **value_cache** (optional, heterogeneous) - **T**: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size) * **static_kv** (heterogeneous) - **B**: If static_kv = true, cross-attention; else self-attention * **use_past** (heterogeneous) - **B**: If use_past = true, use cache; else no cache * **has_layer_state** (heterogeneous) - **B**: If has_layer_state = true, layer_state = {} or [a,b]; else layer_state = None * **has_key_padding_mask** (heterogeneous) - **B**: has_key_padding_mask or not **Outputs** Between 1 and 3 outputs. * **output** (heterogeneous) - **T**: 3D output tensor with shape (sequence_length, batch_size, hidden_size) * **new_key_cache** (optional, heterogeneous) - **T**: output tensor with shape (batch_size, num_heads, new sequence_length, head_size) * **new_value_cache** (optional, heterogeneous) - **T**: output tensor with shape (batch_size, num_heads, new sequence_length, head_size) **Examples**