Welcome to MATE Documentation
MATE (MUSA AI Tensor Engine) is an operator library optimized for generative AI workloads on Moore Threads GPUs. Built on TVM-FFI, it provides optimized implementations of Transformer and large language model (LLM) operators, including Attention and GEMM.
MATE is intended for framework developers and integration
engineers who need to port existing CUDA-oriented pipelines to Moore Threads
GPUs with minimal code rewriting. To ease this transition, it provides
CUDA-compatible Python wrappers that preserve familiar package surfaces. When no
compatible wrapper is available, developers can call the native
mate APIs directly.
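As a rough illustration of the wrapper-first idea, the sketch below picks the first importable module from an ordered list of candidates, so that a wrapper is preferred and the native package is the fallback. The helper `first_available` and the placeholder name `some_cuda_style_wrapper` are assumptions made for this example and are not part of MATE; substitute the wrapper and native package names that apply to your installation.

```python
import importlib
import importlib.util


def first_available(candidates):
    """Return the first importable top-level module in `candidates`, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


if __name__ == "__main__":
    # Hypothetical names: "some_cuda_style_wrapper" stands in for whichever
    # CUDA-compatible wrapper applies; "mate" is assumed to be the native package.
    backend = first_available(["some_cuda_style_wrapper", "mate"])
    if backend is None:
        print("Neither a wrapper nor the native package is installed")
    else:
        print(f"Using module: {backend.__name__}")
```

Ordering the candidates explicitly keeps the preference visible in one place, and a `None` result gives the caller a clear point at which to report a missing installation.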
To simplify deployment and troubleshooting, the library supports a wrapper-first integration path backed by runtime checks, logging, and environment inspection tools.
Getting Started
Wrapper Quickstarts
Support & Compatibility
CLI & Diagnostics
API Reference