FlashAttention3 Forward Compatibility
This document is a quick reference for the current FlashAttention-3-compatible forward coverage provided by MATE on MUSA.
At a Glance
| Area | Status | Notes |
|---|---|---|
| Q Mode | ✅ Supported | |
| KV Mode | ✅ Supported | |
| Append New KV | ✅ Supported | |
| RoPE Input | ✅ Supported | |
| Cache Index Options | ✅ Supported | |
| Mask Mode | ✅ Supported | |
| Score Mode | ✅ Supported | Standard softmax and |
| Page Size | ✅ Supported | |
| Dtype | ✅ Supported | |
| QV Input | ✅ Supported | The forward path supports an optional `qv` input |
| HeadDim | ✅ Supported | Any `headdim <= 512` |
| Optimization | ✅ Supported | |
| Output | ✅ Supported | |
MATE Extensions
| Extension | Status | Notes |
|---|---|---|
| Context Parallel | ✅ Supported | |
| Learnable Sink | ✅ Supported | Supported on the local-attention path |
Notes
- This page summarizes the compatibility surface, not every internal kernel detail.
- The statement `Any headdim <= 512` refers to the supported forward-path head-dimension range.
- FP8 forward support currently refers to `torch.float8_e4m3fn`; pass optional `q_descale`, `k_descale`, and `v_descale` tensors with shape `(batch_size, num_heads_kv)` when scale factors are required.
- When both `q` and the optional `qv` input are FP8, `q_descale` applies to both query tensors; `k_descale` and `v_descale` still apply to the KV inputs.
- RoPE is supported only when appending new KV through `k`/`v`; `rotary_dim` must be `<= headdim` and divisible by 16.
- `Local + attention_chunk` requires MUSA SDK >= 5.1.0.
- FP8 attention works on the forward path today. For best performance, use MUSA SDK 5.2.0 or newer when available.
- For wrapper-level usage, see the FlashAttention wrapper page.
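
The numeric limits in the notes above can be checked before a forward call. The following is a minimal sketch, not part of MATE: the helper name `check_forward_args` and its parameters are hypothetical, and it encodes only the limits stated on this page (head dimension at most 512, `rotary_dim` at most `headdim` and divisible by 16, RoPE only when appending new KV, descale shape `(batch_size, num_heads_kv)`). The actual kernels may enforce additional conditions.

```python
from typing import Dict, List, Optional, Tuple

MAX_HEADDIM = 512          # forward-path limit stated in the notes
ROTARY_DIM_MULTIPLE = 16   # rotary_dim must be divisible by this


def check_forward_args(
    headdim: int,
    batch_size: int,
    num_heads_kv: int,
    rotary_dim: Optional[int] = None,
    appends_new_kv: bool = False,
    descale_shapes: Optional[Dict[str, Tuple[int, ...]]] = None,
) -> List[str]:
    """Hypothetical helper: list violations of the limits documented on this page."""
    problems: List[str] = []

    # Head dimension must fall in the supported forward-path range.
    if not 0 < headdim <= MAX_HEADDIM:
        problems.append(f"headdim={headdim} is outside (0, {MAX_HEADDIM}]")

    # RoPE is only documented for the append-new-KV path.
    if rotary_dim is not None:
        if not appends_new_kv:
            problems.append("RoPE requires appending new KV through k/v")
        if rotary_dim > headdim:
            problems.append(f"rotary_dim={rotary_dim} exceeds headdim={headdim}")
        if rotary_dim % ROTARY_DIM_MULTIPLE != 0:
            problems.append(
                f"rotary_dim={rotary_dim} is not divisible by {ROTARY_DIM_MULTIPLE}"
            )

    # FP8 descale tensors are documented as (batch_size, num_heads_kv).
    expected = (batch_size, num_heads_kv)
    for name, shape in (descale_shapes or {}).items():
        if tuple(shape) != expected:
            problems.append(f"{name} has shape {tuple(shape)}, expected {expected}")

    return problems


if __name__ == "__main__":
    # Example: FP8 inputs with RoPE applied while appending new KV.
    issues = check_forward_args(
        headdim=128,
        batch_size=2,
        num_heads_kv=8,
        rotary_dim=64,
        appends_new_kv=True,
        descale_shapes={"q_descale": (2, 8), "k_descale": (2, 8), "v_descale": (2, 8)},
    )
    print(issues or "no documented limits violated")
```

The helper takes plain shape tuples rather than tensors so that it runs without a MUSA device; in real code the shapes would come from `tensor.shape`.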