Python APIs

This section covers the direct MATE Python APIs. Use these interfaces only when your framework or model architecture requires direct symbol-level integration.

Note

Check whether a wrapper is available first. The wrapper-first workflow preserves more of the upstream package's behavior and usually requires fewer code changes.

Use direct MATE Python APIs when:

  • no wrapper matches your framework’s package surface

  • a wrapper exists but does not cover the feature you need

  • you need symbol-level control over a specific operator path
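Before committing to a direct integration, it can help to confirm which entrypoints your installed build actually exposes. The sketch below is a minimal, hypothetical helper (resolve_symbol is not part of MATE) that resolves a dotted name using only the standard library and returns None when the package or attribute is missing. The candidate names are taken from the lists on this page; whether they resolve depends on your installation.

```python
import importlib


def resolve_symbol(dotted_name):
    """Return the object named by a dotted path, or None if any part is missing.

    Hypothetical helper for illustration; it is not a MATE API.
    """
    parts = dotted_name.split(".")
    # Try the longest importable module prefix first, then walk attributes.
    for cut in range(len(parts), 0, -1):
        module_name = ".".join(parts[:cut])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue
        for attr in parts[cut:]:
            obj = getattr(obj, attr, None)
            if obj is None:
                return None
        return obj
    return None


# Entrypoint names copied from this page; availability is environment-dependent.
CANDIDATES = [
    "mate.flash_attn_varlen_func",
    "mate.gemm.bmm_fp8",
    "mate.moe_fused_gate",
]

availability = {name: resolve_symbol(name) is not None for name in CANDIDATES}
for name, found in availability.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

A missing symbol is a signal to stay on the wrapper path or to check the installed MATE version before writing integration code against that entrypoint.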

Supported API Entrypoints

Attention

Optimized attention entrypoints covering FlashAttention, variable-length (varlen) sequences, KV-cache, scheduler metadata, and MLA-related paths.

  • mate.flash_attn_combine

  • mate.flash_attn_varlen_func

  • mate.flash_attn_with_kvcache

  • mate.get_scheduler_metadata

  • mate.get_mla_metadata

  • mate.flash_mla_with_kvcache
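In upstream FlashAttention, the varlen path packs all sequences of a batch into one tensor and marks the boundaries with cumulative sequence lengths. The sketch below shows that bookkeeping in plain Python. It assumes mate.flash_attn_varlen_func follows the same convention, which this page does not confirm, and the helper names build_cu_seqlens and sequence_slices are hypothetical.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative sequence offsets with a leading 0.

    Assumption: this mirrors the cu_seqlens convention of upstream
    FlashAttention; check the MATE signature before relying on it.
    """
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(seq_lens)]


def sequence_slices(cu_seqlens):
    """Return one slice per sequence into the packed token dimension."""
    return [slice(start, end) for start, end in zip(cu_seqlens, cu_seqlens[1:])]


cu_seqlens = build_cu_seqlens([3, 5, 2])
print(cu_seqlens)                   # [0, 3, 8, 10]
print(sequence_slices(cu_seqlens))  # one slice per packed sequence
```

In a real integration these offsets would typically be an int32 tensor on the same device as the packed inputs; the list form here only illustrates the layout.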

SageAttention

Low-level pre-quantized SageAttention entrypoints for direct integration when the sageattention wrapper is not the right surface.

  • mate.sage_attn_quantized

  • mate.sage_attn_quantized_with_kvcache

GEMM

Low-precision GEMM entrypoints, including MoE GEMM, batched GEMM, and DeepGEMM-specific metadata and logits helpers.

  • mate.gemm.ragged_m_moe_gemm_8bit

  • mate.gemm.ragged_k_moe_gemm_8bit

  • mate.gemm.masked_moe_gemm_8bit

  • mate.gemm.bmm_fp16

  • mate.gemm.bmm_fp8

  • mate.gemm.gemm_fp8_nt_groupwise

  • mate.deep_gemm.get_paged_mqa_logits_metadata

  • mate.deep_gemm.fp8_paged_mqa_logits

Hyperconnection

Direct Hyperconnection APIs exposed at the top-level mate package and through mate.hyperconnection.

  • mate.mhc_pre

  • mate.mhc_prenorm_gemm_sqrsum

  • mate.mhc_pre_big_fuse

GDN

Direct Gated Delta Network APIs for decode and prefill. See the GDN support page for support details and workflow guidance.

  • mate.gated_delta_rule_decode

  • mate.gdn_prefill.chunk_gated_delta_rule

KDA

Direct KDA entrypoints for fused chunked KDA when the flash_kda wrapper is not the right integration surface.

  • mate.chunk_kda

  • mate.kda.chunk_kda

MoE Routing & Gating

Direct MoE routing and gating entrypoints for workflows that need MATE’s native router path instead of a wrapper-level integration.

  • mate.hash_topk

  • mate.moe_fused_gate