FlashMLA Compatibility Wrapper (flash_mla)

flash_mla is a compatibility wrapper package that preserves the official flash_mla package name and import path while running on MUSA through MATE Multi-head Latent Attention (MLA) operators.

Overview

This wrapper is designed for projects that already target the FlashMLA Python API. It lets you run MLA dense decode, sparse decode, and sparse prefill workloads on MUSA with minimal integration changes.

The current compatibility scope includes FlashMLASchedMeta, get_mla_metadata, flash_mla_with_kvcache, and flash_mla_sparse_fwd.

Package and import

Package name: flash_mla
Import path: flash_mla
Runtime backend: MATE MLA operators on MUSA

Requirements

Before using this wrapper, make sure the following are available:

MATE is installed and importable.
TorchMUSA is installed and the MUSA runtime environment is configured.
The target workload is configured to run on MUSA devices.
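
The checklist above can be probed before importing the wrapper. The following is a minimal sketch, not part of the package: it assumes MATE is importable as `mate` (consistent with the `mate.flashmla` references in this document) and that TorchMUSA is importable as `torch_musa`, which is an assumption not stated here. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: "mate" for MATE, "torch_musa" for TorchMUSA.
REQUIRED = ("mate", "torch_musa")

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("MATE and TorchMUSA are importable.")
```

This only checks that the modules can be found. It does not verify that a MUSA device is present or that the runtime is configured correctly.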

Build

Build a wheel from the wrappers/FlashMLA directory:

python -m build --wheel

The generated wheel will be placed under:

dist/

Installation

Install from source in editable mode:

pip install --no-build-isolation -e .

Install a built wheel:

pip install dist/flash_mla-*.whl

Import

Import the package directly:

import flash_mla

Import individual APIs:

from flash_mla import (
    FlashMLASchedMeta,
    get_mla_metadata,
    flash_mla_with_kvcache,
    flash_mla_sparse_fwd,
)
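
To confirm that an installed build exposes the four names in the compatibility scope, a small check along these lines may help. It is a sketch only: `missing_exports` and `EXPECTED_EXPORTS` are hypothetical names, and a stand-in object is used so the example runs without flash_mla installed.

```python
from types import SimpleNamespace

# The four names listed in the compatibility scope of this document.
EXPECTED_EXPORTS = (
    "FlashMLASchedMeta",
    "get_mla_metadata",
    "flash_mla_with_kvcache",
    "flash_mla_sparse_fwd",
)


def missing_exports(module, names=EXPECTED_EXPORTS):
    """Return the expected names that the given module object does not define."""
    return [name for name in names if not hasattr(module, name)]


# Stand-in for the real module; it defines only two of the four names.
fake_module = SimpleNamespace(FlashMLASchedMeta=object, get_mla_metadata=print)
print(missing_exports(fake_module))
# prints ['flash_mla_with_kvcache', 'flash_mla_sparse_fwd']
```

With the real package installed, pass the imported module instead: `missing_exports(flash_mla)` should return an empty list if the full import surface is present.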

Behavior

get_mla_metadata(...) follows the current upstream FlashMLA Python interface and returns (FlashMLASchedMeta(), None).
The real scheduler tensors are initialized lazily on the first flash_mla_with_kvcache(...) call and cached inside FlashMLASchedMeta.
A FlashMLASchedMeta instance can be reused across calls only when the decode configuration stays the same.
flash_mla_with_kvcache(...) is the entry point for dense and sparse decode. The wrapper validates the FlashMLASchedMeta argument, lazily materializes the real scheduler metadata with mate.flashmla.get_mla_metadata(...), and then forwards the call to mate.flashmla.flash_mla_with_kvcache(...).
flash_mla_sparse_fwd(...) is the entry point for sparse MLA prefill.
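
The lazy-initialization behavior described above can be illustrated with a pure-Python sketch. It does not call flash_mla or MATE, and every name in it (`SchedMetaSketch`, `get_metadata_sketch`, `build_scheduler_sketch`, `decode_sketch`) is hypothetical. In particular, raising `ValueError` on a configuration mismatch is an assumption made for illustration; this document does not say how the wrapper reports that case.

```python
class SchedMetaSketch:
    """Hypothetical stand-in for FlashMLASchedMeta: empty until first use."""

    def __init__(self):
        self.config = None     # decode configuration seen on the first call
        self.scheduler = None  # placeholder for the real scheduler tensors


def get_metadata_sketch():
    # Mirrors the documented return shape: (empty metadata object, None).
    return SchedMetaSketch(), None


def build_scheduler_sketch(config):
    # Placeholder for the real scheduler construction.
    return {"built_for": config}


def decode_sketch(meta, config):
    """Build the scheduler on first use, then reuse it for the same config."""
    if not isinstance(meta, SchedMetaSketch):
        raise TypeError("meta must come from get_metadata_sketch()")
    if meta.scheduler is None:
        # First call: materialize and cache inside the metadata object.
        meta.config = config
        meta.scheduler = build_scheduler_sketch(config)
    elif meta.config != config:
        # Assumption: a mismatch is reported as an error.
        raise ValueError("metadata was initialized for a different decode configuration")
    return meta.scheduler


meta, num_splits = get_metadata_sketch()
first = decode_sketch(meta, ("batch=4", "heads=16"))
second = decode_sketch(meta, ("batch=4", "heads=16"))
print(first is second)  # True: the cached scheduler object is reused
```

The practical consequence is that a metadata object should be created once per decode configuration. When the configuration changes, obtaining a fresh object from get_mla_metadata(...) avoids reusing stale scheduler state.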

Notes

This wrapper keeps the official FlashMLA import surface, but execution is provided by MATE on MUSA.
For the authoritative MLA operator behavior, refer to mate.flashmla.