Welcome to MATE Documentation¶

GitHub

MATE (MUSA AI Tensor Engine) is a high-performance operator library optimized for generative AI workloads on Moore Threads GPUs. Built on TVM-FFI, it delivers highly efficient Transformer and large language model (LLM) operator implementations, including Attention and GEMM.

MATE is specifically designed for framework developers and integration engineers aiming to port existing CUDA-oriented pipelines to Moore Threads GPUs with minimal code rewriting. To ease this transition, it features CUDA-compatible Python wrappers that preserve familiar package surfaces. When a compatible wrapper is unavailable, developers can directly use native mate APIs.

To streamline deployment and troubleshooting, the library supports a wrapper-first integration path backed by robust runtime checks, comprehensive logging, and environment inspection tools.

Getting Started

Overview
Installing MATE
Release Notes

Wrapper Quickstarts

Wrappers
FlashAttention Wrapper
SageAttention Wrapper
FlashMLA Wrapper
FlashKDA Wrapper
DeepGEMM Wrapper

Deep Dive

Design and Architecture
Choosing Wrappers vs. Python APIs

Support & Compatibility

GDN Support Matrix
FlashAttention3 Forward Compatibility

CLI & Diagnostics

Diagnostic Overview
Command Line Interface
Logging
Environment Variables
Guard Allocator Debugging

API Reference

Python APIs
Attention
GEMM
HyperConnection
KDA