.. _l-onnx-doccom.microsoft-LongformerAttention:

===================================
com.microsoft - LongformerAttention
===================================

.. contents::
    :local:

.. _l-onnx-opcom-microsoft-longformerattention-1:

LongformerAttention - 1 (com.microsoft)
=======================================

**Version**

* **name**: LongformerAttention (GitHub)
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Longformer Self Attention with a local context and a global context.

Tokens attend locally: each token attends to its W previous tokens and
W succeeding tokens, with W being the window length. A selected few tokens
attend globally to all other tokens.

The attention mask is of shape (batch_size, sequence_length), where
sequence_length is a multiple of 2W after padding. A mask value < 0
(such as -10000.0) means the token is masked; the value is 0 otherwise.

Global attention flags have value 1 for the tokens that attend globally
and 0 otherwise.

**Attributes**

* **num_heads** (required):
  Number of attention heads.
* **window** (required):
  One-sided attention window length W, that is, half of the total
  window length.
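The attention pattern described in the Summary can be illustrated with a small
pure-Python sketch. This is not the operator's implementation: the helper name
``allowed_attention`` is hypothetical, and the way the mask is applied here
(a masked token cannot be attended to by anyone) is an assumption made for
illustration only.

```python
def allowed_attention(mask, global_flags, window):
    """Return a matrix a[i][j] telling whether token i may attend to token j.

    Illustrative only: assumes a single sequence (no batch dimension) and
    assumes masked tokens (mask value < 0) are hidden from every query.
    """
    n = len(mask)
    allowed = []
    for i in range(n):
        row = []
        for j in range(n):
            # Local context: W previous and W succeeding tokens (and itself).
            local = abs(i - j) <= window
            # Global context: a flagged token attends to all other tokens.
            is_global = global_flags[i] == 1
            # Mask value < 0 (like -10000.0) means token j is masked.
            visible = mask[j] >= 0
            row.append((local or is_global) and visible)
        allowed.append(row)
    return allowed


# sequence_length = 8 is a multiple of 2W = 4; the last two tokens are padding.
mask = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -10000.0, -10000.0]
global_flags = [1, 0, 0, 0, 0, 0, 0, 0]
pattern = allowed_attention(mask, global_flags, window=2)

for row in pattern:
    print("".join("x" if ok else "." for ok in row))
```

Each printed row corresponds to one query token: the first row is dense over
the unmasked tokens because token 0 is flagged as global, while the other rows
show only a band of width 2W + 1 around the diagonal.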
**Inputs**

* **input** (heterogeneous) - **T**:
  3D input tensor with shape (batch_size, sequence_length, hidden_size),
  hidden_size = num_heads * head_size
* **weight** (heterogeneous) - **T**:
  2D input tensor with shape (hidden_size, 3 * hidden_size)
* **bias** (heterogeneous) - **T**:
  1D input tensor with shape (3 * hidden_size)
* **mask** (heterogeneous) - **T**:
  Attention mask with shape (batch_size, sequence_length)
* **global_weight** (heterogeneous) - **T**:
  2D input tensor with shape (hidden_size, 3 * hidden_size)
* **global_bias** (heterogeneous) - **T**:
  1D input tensor with shape (3 * hidden_size)
* **global** (heterogeneous) - **G**:
  Global attention flags with shape (batch_size, sequence_length)

**Outputs**

* **output** (heterogeneous) - **T**:
  3D output tensor with shape (batch_size, sequence_length, hidden_size)

**Examples**
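No example ships with this operator page. As a minimal sketch, the shapes
listed under Inputs and Outputs can be derived from a few sizes as follows.
The helper ``longformer_attention_shapes`` is hypothetical and only restates
the shape rules above; it does not call ONNX or onnxruntime.

```python
def longformer_attention_shapes(batch_size, sequence_length, num_heads,
                                head_size, window):
    """Return the expected tensor shapes keyed by input/output name.

    Hypothetical helper: it only encodes the shape rules documented above.
    """
    # The documentation states sequence_length is a multiple of 2W after padding.
    if sequence_length % (2 * window) != 0:
        raise ValueError("sequence_length must be a multiple of 2 * window")

    hidden_size = num_heads * head_size
    return {
        "input": (batch_size, sequence_length, hidden_size),
        "weight": (hidden_size, 3 * hidden_size),
        "bias": (3 * hidden_size,),
        "mask": (batch_size, sequence_length),
        "global_weight": (hidden_size, 3 * hidden_size),
        "global_bias": (3 * hidden_size,),
        "global": (batch_size, sequence_length),
        "output": (batch_size, sequence_length, hidden_size),
    }


shapes = longformer_attention_shapes(
    batch_size=2, sequence_length=8, num_heads=4, head_size=16, window=2
)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Checking shapes up front like this can catch an unpadded sequence before the
tensors are built, since the padding requirement is easy to overlook.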