Python APIs¶
This section covers the direct MATE Python APIs. Use these interfaces only when your framework or model architecture requires direct symbol-level integration.
Note
Check wrapper availability first. The wrapper-first workflow preserves more of the upstream package's behavior and usually requires fewer code changes.
Use direct MATE Python APIs when:
no wrapper matches your framework’s package surface
a wrapper exists but does not cover the feature you need
you need symbol-level control over a specific operator path
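The wrapper-first check can be sketched as a small helper. This is a minimal illustration, not part of MATE: `pick_integration` is a hypothetical name, and the helper only detects whether a wrapper module is importable (the first case above). Whether an importable wrapper actually covers the feature you need still has to be verified by hand.

```python
import importlib


def pick_integration(wrapper_module, direct_module, direct_symbol):
    """Return ("wrapper", module) if the wrapper imports, else ("direct", obj).

    Hypothetical helper for illustration; raises RuntimeError when neither
    the wrapper module nor the direct symbol can be found.
    """
    try:
        return "wrapper", importlib.import_module(wrapper_module)
    except ImportError:
        pass  # no importable wrapper; fall through to the direct API

    try:
        module = importlib.import_module(direct_module)
        return "direct", getattr(module, direct_symbol)
    except (ImportError, AttributeError) as exc:
        raise RuntimeError(
            f"neither {wrapper_module!r} nor "
            f"{direct_module}.{direct_symbol} is available"
        ) from exc


# Example call (assumes the wrapper module is named `sageattention`):
#   kind, target = pick_integration("sageattention", "mate", "sage_attn_quantized")
```

The helper returns a tag alongside the object so that calling code can branch explicitly. The wrapper and the direct entrypoint are not assumed to share a call signature.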
Supported API Entrypoints¶
Attention¶
Optimized entrypoints for FlashAttention, varlen, KV-cache, scheduler metadata, and MLA-related attention paths.
mate.flash_attn_combine
mate.flash_attn_varlen_func
mate.flash_attn_with_kvcache
mate.get_scheduler_metadata
mate.get_mla_metadata
mate.flash_mla_with_kvcache
SageAttention¶
Low-level pre-quantized SageAttention entrypoints for direct integration when the sageattention wrapper is not the right surface.
mate.sage_attn_quantized
mate.sage_attn_quantized_with_kvcache
GEMM¶
Low-precision GEMM entrypoints, including MoE GEMM, batched GEMM, and DeepGEMM-specific metadata / logits helpers.
mate.gemm.ragged_m_moe_gemm_8bit
mate.gemm.ragged_k_moe_gemm_8bit
mate.gemm.masked_moe_gemm_8bit
mate.gemm.bmm_fp16
mate.gemm.bmm_fp8
mate.gemm.gemm_fp8_nt_groupwise
mate.deep_gemm.get_paged_mqa_logits_metadata
mate.deep_gemm.fp8_paged_mqa_logits
Hyperconnection¶
Direct Hyperconnection APIs exposed at the top-level mate package and through mate.hyperconnection.
mate.mhc_pre
mate.mhc_prenorm_gemm_sqrsum
mate.mhc_pre_big_fuse
GDN¶
Direct Gated Delta Network APIs for decode and prefill. See the GDN support page for support details and workflow guidance.
mate.gated_delta_rule_decode
mate.gdn_prefill.chunk_gated_delta_rule
KDA¶
Direct KDA entrypoints for fused chunked KDA when the flash_kda wrapper is not the right integration surface.
mate.chunk_kda
mate.kda.chunk_kda
MoE Routing & Gating¶
Direct MoE routing and gating entrypoints for workflows that need MATE’s native router path instead of a wrapper-level integration.
mate.hash_topk
mate.moe_fused_gate
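Before wiring any of the entrypoints listed on this page into a framework, it can help to confirm which ones resolve in the installed build. The sketch below uses only the standard library. `resolve` and `availability` are hypothetical helper names, and the sample list is a subset of the symbols above. No call signatures are assumed, so the script only checks that each dotted name can be looked up.

```python
import importlib


def resolve(dotted_name):
    """Return the object named by `dotted_name`, or None if it cannot be found."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix first, then walk attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        try:
            for attr in parts[i:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return None
        return obj
    return None


def availability(names):
    """Map each dotted name to True if it resolves, else False."""
    return {name: resolve(name) is not None for name in names}


# Subset of the entrypoints listed on this page, used as sample input.
ENTRYPOINTS = [
    "mate.flash_attn_varlen_func",
    "mate.gemm.bmm_fp8",
    "mate.deep_gemm.fp8_paged_mqa_logits",
    "mate.gdn_prefill.chunk_gated_delta_rule",
    "mate.moe_fused_gate",
]

for name, ok in availability(ENTRYPOINTS).items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

A name reported as missing means only that it could not be imported or looked up in the current environment. It says nothing about hardware or dtype support for names that do resolve.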