Python APIs
===========

This section covers the direct MATE Python APIs. Use these interfaces only when your framework or model architecture requires direct symbol-level integration.

.. note::

   Check wrapper availability first. The wrapper-first workflow preserves more of the upstream package behavior and usually requires less code change.

Use direct MATE Python APIs when:

- no wrapper matches your framework's package surface
- a wrapper exists but does not cover the feature you need
- you need symbol-level control over a specific operator path

Supported API Entrypoints
-------------------------

Attention
---------

Optimized entrypoints for FlashAttention, varlen, KV-cache, scheduler metadata, and MLA-related :doc:`attention ` paths.

- ``mate.flash_attn_combine``
- ``mate.flash_attn_varlen_func``
- ``mate.flash_attn_with_kvcache``
- ``mate.get_scheduler_metadata``
- ``mate.get_mla_metadata``
- ``mate.flash_mla_with_kvcache``

SageAttention
-------------

Low-level pre-quantized SageAttention entrypoints for direct integration when the ``sageattention`` wrapper is not the right surface.

- ``mate.sage_attn_quantized``
- ``mate.sage_attn_quantized_with_kvcache``

GEMM
----

Low-precision :doc:`GEMM ` entrypoints, including MoE GEMM, batched GEMM, and DeepGEMM-specific metadata / logits helpers.

- ``mate.gemm.ragged_m_moe_gemm_8bit``
- ``mate.gemm.ragged_k_moe_gemm_8bit``
- ``mate.gemm.masked_moe_gemm_8bit``
- ``mate.gemm.bmm_fp16``
- ``mate.gemm.bmm_fp8``
- ``mate.gemm.gemm_fp8_nt_groupwise``
- ``mate.deep_gemm.get_paged_mqa_logits_metadata``
- ``mate.deep_gemm.fp8_paged_mqa_logits``

Hyperconnection
---------------

Direct :doc:`Hyperconnection ` APIs exposed at the top-level ``mate`` package and through ``mate.hyperconnection``.

- ``mate.mhc_pre``
- ``mate.mhc_prenorm_gemm_sqrsum``
- ``mate.mhc_pre_big_fuse``

GDN
---

Direct Gated Delta Network APIs for decode and prefill. See the :doc:`GDN support page ` for support details and workflow guidance.

- ``mate.gated_delta_rule_decode``
- ``mate.gdn_prefill.chunk_gated_delta_rule``

KDA
---

Direct :doc:`KDA ` entrypoints for fused chunked KDA when the ``flash_kda`` wrapper is not the right integration surface.

- ``mate.chunk_kda``
- ``mate.kda.chunk_kda``

MoE Routing & Gating
--------------------

Direct MoE routing and gating entrypoints for workflows that need MATE's native router path instead of a wrapper-level integration.

- ``mate.hash_topk``
- ``mate.moe_fused_gate``
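
The entrypoints on this page are listed as dotted names, so an integration may want to confirm that the installed build actually exposes the symbols it depends on before wiring them into a model. The following is a minimal sketch of such a check using only the standard library. ``resolve_entrypoint`` and ``REQUIRED`` are hypothetical names introduced for illustration and are not part of MATE, and the sketch deliberately does not call any entrypoint, since call signatures are not documented in this section.

```python
import importlib
from typing import Any, Optional


def resolve_entrypoint(dotted_name: str) -> Optional[Any]:
    """Return the object named by ``dotted_name``, or None if it cannot be resolved.

    Hypothetical helper (not a MATE API): it imports the longest importable
    module prefix, then walks the remaining parts as attributes.
    """
    parts = dotted_name.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            # This prefix is not an importable module; try a shorter one.
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr, None)
            if obj is None:
                return None
        return obj
    # Not even the top-level package could be imported.
    return None


# Names taken from the lists above; adjust to the entrypoints you rely on.
REQUIRED = ["mate.flash_attn_varlen_func", "mate.gemm.bmm_fp8"]

missing = [name for name in REQUIRED if resolve_entrypoint(name) is None]
if missing:
    print("Missing MATE entrypoints:", ", ".join(missing))
```

A check like this only confirms that a name resolves. It says nothing about hardware support or argument compatibility, and a package that fails at import time with an error other than ``ImportError`` will still raise.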