MATE Gated Delta Network (GDN) Support Matrix
This document lists the Gated Delta Network (GDN) features, data types, and execution backends currently supported in MATE. Use this matrix to check whether a configuration is supported before deploying or optimizing it.
Overview
MATE provides optimized kernels for Gated Delta Networks (GDN). MATE splits GDN execution into two primary phases:
- Prefill: high-throughput processing of prompt contexts.
- Decode: low-latency generation, including standard single-token streaming and multi-token prediction (MTP).
Decode
Dispatch Matrix
| `state_layout` | `state.dtype` | `T` | Status | Notes |
|---|---|---|---|---|
| `VK` | `bfloat16` | `T=1` | ❌ Not supported | Single-token BF16-state decode is not implemented yet |
| `VK` | `bfloat16` | `T>1` | ✅ Supported | TileLang MTP path, currently K=V=128 |
| `VK` | `float32` | `T=1` | ✅ Supported | Current active decode path |
| `VK` | `float32` | `T>1` | ✅ Supported | TileLang MTP path, currently K=V=128 |
| `KV` |  |  | ❌ Not supported | KV backend is not implemented yet |
|  | anything else | any | ❌ Not supported | Unsupported combination |
Current MATE Active Path
Current MATE decode support is intentionally narrow:
- API: `mate.gated_delta_rule_decode(...)`
- Supported combinations:
  - `state_layout="VK"`, `state.dtype=float32`, `T=1`
  - `state_layout="VK"`, `state.dtype=float32`, `T>1`, `K=V=128`
  - `state_layout="VK"`, `state.dtype=bfloat16`, `T>1`, `K=V=128`
- Backend: TileLang
- State update: in place
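The supported combinations can be turned into a small pre-flight check. The sketch below is illustrative only: `decode_path_status` is a hypothetical helper, not part of the MATE API, and it takes dtype names as plain strings so that it runs without a GPU library installed.

```python
def decode_path_status(state_layout, state_dtype, T, K, V):
    """Classify a decode configuration against the supported combinations.

    Hypothetical helper for illustration; not part of MATE.
    Returns "single-token", "mtp", or "unsupported".
    """
    if state_layout != "VK":
        # Only the VK state layout is on the active path.
        return "unsupported"
    if T == 1:
        # Single-token decode accepts a float32 state only.
        return "single-token" if state_dtype == "float32" else "unsupported"
    if T > 1:
        # MTP accepts float32 or bfloat16 state, restricted to K = V = 128.
        if state_dtype in ("float32", "bfloat16") and K == V == 128:
            return "mtp"
    return "unsupported"
```

A caller could run this before invoking `mate.gated_delta_rule_decode(...)` to fail early with a clear message instead of relying on the kernel dispatch to reject the configuration.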
Current MATE Restrictions
| Item | MATE |
|---|---|
|  | ✅ Supported on VK FP32 decode and VK MTP; negative entries are padding |
| BF16 state backend | ✅ Supported on VK MTP (`T>1`) |
| KV backend | ❌ Not supported |
| MTP (`T>1`) | ✅ Supported for VK float32/bfloat16 state with K=V=128 |
| `intermediate_states_buffer` | ✅ Supported on VK MTP; dtype must match |
|  | ✅ Supported on VK MTP |
Input Contract on the Active Path
On the currently supported MATE path:
- `q` / `k` / `v`: `float16` or `bfloat16`
- `A_log` / `dt_bias`: `float32`
- `a` / `b`: same dtype as `q`
- `state`: `float32` for single-token decode; `float32` or `bfloat16` for MTP
- `intermediate_states_buffer`: same dtype as `state`
- `output`: optional, supports `float16` / `bfloat16` / `float32`
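The dtype rules above can be checked mechanically before a decode call. The following is a minimal sketch under stated assumptions: `check_decode_dtypes` is a hypothetical helper (not a MATE function), dtypes are passed as plain strings, and `intermediate_states_buffer` is treated as optional, which the contract above does not state explicitly.

```python
def check_decode_dtypes(dtypes, T):
    """Return a list of contract violations (an empty list means OK).

    `dtypes` maps tensor names to dtype names, e.g. {"q": "bfloat16", ...}.
    Hypothetical helper for illustration; not part of MATE.
    """
    errors = []
    for name in ("q", "k", "v"):
        if dtypes[name] not in ("float16", "bfloat16"):
            errors.append(f"{name} must be float16 or bfloat16")
    for name in ("A_log", "dt_bias"):
        if dtypes[name] != "float32":
            errors.append(f"{name} must be float32")
    for name in ("a", "b"):
        if dtypes[name] != dtypes["q"]:
            errors.append(f"{name} must have the same dtype as q")
    # Single-token decode (T == 1) needs a float32 state; MTP also allows bfloat16.
    allowed_state = ("float32",) if T == 1 else ("float32", "bfloat16")
    if dtypes["state"] not in allowed_state:
        errors.append(f"state must be one of {allowed_state} for T={T}")
    # Assumption: the buffer may be absent; when present it must match the state.
    buf = dtypes.get("intermediate_states_buffer")
    if buf is not None and buf != dtypes["state"]:
        errors.append("intermediate_states_buffer must match the state dtype")
    # The output tensor is optional per the contract.
    out = dtypes.get("output")
    if out is not None and out not in ("float16", "bfloat16", "float32"):
        errors.append("output must be float16, bfloat16, or float32")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.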
Prefill
Prefill At a Glance
| Area | Status | Notes |
|---|---|---|
| Backend | ✅ Supported | FlashInfer-aligned native MP31 prefill path on MUSA via |
| Sequence Mode | ✅ Supported | Varlen prefill with |
| Head Layout | ✅ Supported |  |
| Dtype (Q/K/V) | ✅ Supported |  |
| Gate Inputs | ✅ Supported |  |
| Gate Space | ✅ Supported |  |
| Initial State | ✅ Supported | Optional |
| Final State Output | ✅ Supported |  |
| Output Heads | ✅ Supported |  |
| QK L2 Norm Option | ✅ Supported |  |
| Strided Q/K/V | ✅ Supported | Split-QKV views are supported when the last dimension is contiguous ( |
Prefill Shape Rules
| Item | Requirement |
|---|---|
|  | 3D tensors: |
| Token count |  |
| Q/K dim |  |
| Head layout |  |
|  | Required by public wrapper for varlen prefill |
|  | Must be exactly |
| Strides |  |
| QK L2 norm | If |
Current Kernel Constraints
| Item | Status | Notes |
|---|---|---|
|  | ✅ Required | Native MP31 kernel currently supports |
|  | ✅ Required | Native MP31 kernel currently supports |
|  | ✅ Required | Current native path is fixed to |
| Device | ✅ Required | MUSA on MP31 |
Not Supported Yet (Prefill)
| Feature | Status | Notes |
|---|---|---|
| Non-GQA/GVA head mapping | ❌ Not supported | Only the two grouped layouts above are accepted |
Notes
- Public API entry: `mate.gdn_prefill.chunk_gated_delta_rule`.
- `use_qk_l2norm_in_kernel=True` is an in-place operation on the input Q and K tensors. Pass cloned tensors if the original unnormalized values are still needed after the call.
- Strided support is intended for fused/split QKV layouts such as a single physical `[tokens, qkv_dim]` allocation split into Q, K, and V views. Arbitrary layouts with a non-contiguous last dimension are not supported.
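To make the strided-layout note concrete, the sketch below computes the element offsets and strides that Q, K, and V views would have inside one row-major `[tokens, qkv_dim]` buffer, and checks that the last dimension stays contiguous. It is pure Python and purely illustrative: `split_qkv_views` and `last_dim_is_contiguous` are hypothetical helpers, and the Q-then-K-then-V packing order is an assumption, not something MATE prescribes.

```python
def split_qkv_views(tokens, q_heads, k_heads, v_heads, qk_dim, v_dim):
    """Describe Q/K/V views into one row-major [tokens, qkv_dim] buffer.

    Hypothetical helper for illustration; assumes rows are packed as Q | K | V.
    Each view is reported as a [tokens, heads, head_dim] tensor with its
    starting element offset within a row and its element strides.
    """
    parts = {"q": (q_heads, qk_dim), "k": (k_heads, qk_dim), "v": (v_heads, v_dim)}
    qkv_dim = sum(heads * dim for heads, dim in parts.values())
    views, offset = {}, 0
    for name, (heads, dim) in parts.items():
        views[name] = {
            "offset": offset,
            "shape": (tokens, heads, dim),
            # Moving one token skips a full fused row; moving one head skips
            # head_dim elements; adjacent elements within a head are adjacent
            # in memory, so the last stride is 1.
            "strides": (qkv_dim, dim, 1),
        }
        offset += heads * dim
    return qkv_dim, views


def last_dim_is_contiguous(strides):
    """True when the innermost dimension has unit stride."""
    return strides[-1] == 1
```

Every view produced this way has a token stride equal to the full fused row width but a unit stride in the last dimension, which is the property the note above requires. A transposed tensor, whose last-dimension stride is larger than 1, would fail the same check.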