FlashMLA Compatibility Wrapper (flash_mla)
flash_mla is a compatibility wrapper package that preserves the official
flash_mla package name and import path while running on MUSA through MATE
Multi-head Latent Attention (MLA) operators.
Overview
This wrapper is designed for projects that already target the FlashMLA Python API. It lets you run MLA dense decode, sparse decode, and sparse prefill workloads on MUSA with minimal integration changes.
The current compatibility scope includes FlashMLASchedMeta,
get_mla_metadata, flash_mla_with_kvcache, and flash_mla_sparse_fwd.
Package and import
Package name: flash_mla
Import path: flash_mla
Runtime backend: MATE MLA operators on MUSA
Requirements
Before using this wrapper, make sure the following are available:
MATE is installed and importable.
TorchMUSA is installed and the MUSA runtime environment is configured.
The target workload is configured to run on MUSA devices.
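The first two requirements can be checked before running a workload. The following is a minimal sketch, not part of the wrapper: find_missing is a hypothetical helper, and the top-level module names mate and torch_musa are assumptions about how MATE and TorchMUSA are exposed to Python. It only checks that the modules can be located; it does not verify that the MUSA runtime is configured.

```python
import importlib.util
from typing import Iterable, List


def find_missing(module_names: Iterable[str]) -> List[str]:
    """Return the names from module_names that Python cannot locate."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup failures the same as "not found".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumed top-level module names for MATE and TorchMUSA.
REQUIRED_MODULES = ["mate", "torch_musa"]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules can be located.")
```

An empty result means both modules are discoverable in the current environment; device availability still has to be confirmed separately.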
Build
Build a wheel from the wrappers/FlashMLA directory:
python -m build --wheel
The generated wheel will be placed under:
dist/
Installation
Install from source:
pip install --no-build-isolation -e .
Install a built wheel:
pip install dist/flash_mla-*.whl
Import
Import the package directly:
import flash_mla
Import individual APIs:
from flash_mla import (
    FlashMLASchedMeta,
    get_mla_metadata,
    flash_mla_with_kvcache,
    flash_mla_sparse_fwd,
)
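After installation, it can be useful to confirm that the package exposes the four names in the compatibility scope. The sketch below is illustrative only: missing_api and EXPECTED_API are hypothetical names introduced here, and the stand-in object exists purely so the helper can be demonstrated without MUSA hardware.

```python
from types import SimpleNamespace
from typing import Iterable, List

# The four names listed in the compatibility scope of this wrapper.
EXPECTED_API = (
    "FlashMLASchedMeta",
    "get_mla_metadata",
    "flash_mla_with_kvcache",
    "flash_mla_sparse_fwd",
)


def missing_api(module: object, names: Iterable[str] = EXPECTED_API) -> List[str]:
    """Return the expected names that the given module does not define."""
    return [name for name in names if not hasattr(module, name)]


# Demonstration with a stand-in object that defines only one of the names.
partial = SimpleNamespace(get_mla_metadata=lambda: None)
print("Missing from stand-in:", missing_api(partial))

if __name__ == "__main__":
    try:
        import flash_mla  # requires the wrapper to be installed
    except ImportError as exc:
        print("flash_mla is not importable:", exc)
    else:
        print("Missing from flash_mla:", missing_api(flash_mla))
```

An empty list for flash_mla indicates that the full import surface described above is present.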
Behavior
get_mla_metadata(...) follows the current upstream FlashMLA Python interface and returns (FlashMLASchedMeta(), None).
The real scheduler tensors are initialized lazily on the first flash_mla_with_kvcache(...) call and cached inside FlashMLASchedMeta.
Reusing the same FlashMLASchedMeta requires the same decode configuration across calls.
flash_mla_with_kvcache(...) is the dense/sparse decode entry. The wrapper validates FlashMLASchedMeta, lazily materializes the real scheduler with mate.flashmla.get_mla_metadata(...), and then forwards to mate.flashmla.flash_mla_with_kvcache(...).
flash_mla_sparse_fwd(...) is the sparse MLA prefill entry.
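The lazy-initialization behavior described above can be illustrated with a small, hardware-free sketch. This is not the wrapper's code: LazySchedMeta, get_metadata_stub, and decode_with_meta are hypothetical stand-ins, the configuration is modeled as a plain tuple, and raising ValueError on a configuration mismatch is an assumption about how the "same decode configuration" requirement might be enforced.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class LazySchedMeta:
    """Hypothetical stand-in for FlashMLASchedMeta: empty until first use."""
    config: Optional[Tuple[Any, ...]] = None  # configuration seen on the first call
    scheduler: Optional[Any] = None           # cached scheduler data


def get_metadata_stub() -> Tuple[LazySchedMeta, None]:
    """Mirror the documented return shape: (empty meta object, None)."""
    return LazySchedMeta(), None


def decode_with_meta(
    meta: LazySchedMeta,
    config: Tuple[Any, ...],
    build_scheduler: Callable[[Tuple[Any, ...]], Any],
) -> Any:
    """Validate meta, build the scheduler on first use, then reuse the cache."""
    if not isinstance(meta, LazySchedMeta):
        raise TypeError("meta must be a LazySchedMeta instance")
    if meta.scheduler is None:
        # First call: materialize the scheduler and remember the configuration.
        meta.config = config
        meta.scheduler = build_scheduler(config)
    elif meta.config != config:
        # Assumed policy: reject reuse with a different configuration.
        raise ValueError("meta was initialized with a different configuration")
    return meta.scheduler


# Usage: the builder runs once, and the second call reuses the cached result.
meta, _ = get_metadata_stub()
first = decode_with_meta(meta, ("batch=4", "heads=128"), lambda cfg: {"cfg": cfg})
second = decode_with_meta(meta, ("batch=4", "heads=128"), lambda cfg: {"cfg": cfg})
print(first is second)
```

In this model, a workload whose decode configuration changes would request a fresh metadata object rather than reuse the old one.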
Notes
This wrapper keeps the official FlashMLA import surface, but execution is provided by MATE on MUSA.
For the authoritative MLA operator behavior, refer to
mate.flashmla.