Welcome to MATE Documentation
=============================

`GitHub `_

MATE (**M**\USA **A**\I **T**\ensor **E**\ngine) is a high-performance operator library optimized for generative AI workloads on Moore Threads GPUs. Built on TVM-FFI, it delivers highly efficient Transformer and large language model (LLM) operator implementations, including Attention and GEMM.

MATE is specifically designed for framework developers and integration engineers aiming to port existing CUDA-oriented pipelines to Moore Threads GPUs with minimal code rewriting. To ease this transition, it features CUDA-compatible Python wrappers that preserve familiar package surfaces. When a compatible wrapper is unavailable, developers can directly use native ``mate`` APIs.

To streamline deployment and troubleshooting, the library supports a wrapper-first integration path backed by robust runtime checks, comprehensive logging, and environment inspection tools.

.. toctree::
   :maxdepth: 1
   :caption: Getting Started

   Overview
   Installing MATE
   Release Notes

.. toctree::
   :maxdepth: 2
   :titlesonly:
   :caption: Wrapper Quickstarts

   Wrappers
   FlashAttention Wrapper
   SageAttention Wrapper
   FlashMLA Wrapper
   FlashKDA Wrapper
   DeepGEMM Wrapper

.. toctree::
   :maxdepth: 1
   :caption: Deep Dive

   Design and Architecture
   Choosing Wrappers vs. Python APIs

.. toctree::
   :maxdepth: 1
   :caption: Support & Compatibility

   GDN Support Matrix
   FlashAttention3 Forward Compatibility

.. toctree::
   :maxdepth: 1
   :titlesonly:
   :caption: CLI & Diagnostics

   Diagnostic Overview
   Command Line Interface
   Logging
   Environment Variables
   Guard Allocator Debugging

.. toctree::
   :maxdepth: 1
   :caption: API Reference

   Python APIs
   Attention
   GEMM
   HyperConnection
   KDA