FlashKDA Compatibility Wrapper (flash_kda)
flash_kda is a compatibility wrapper package that preserves the official
flash_kda package name and import path while running on MUSA through MATE
Kimi Delta Attention (KDA) operators.
Overview
This wrapper is designed for projects that already target the FlashKDA Python
API. It keeps the public flash_kda import surface and forwards execution to
mate.kda.chunk_kda.
The current compatibility scope includes:
flash_kda.fwd
flash_kda.get_workspace_size
Package and import
Package name: flash_kda
Import path: flash_kda
Runtime backend: MATE KDA operators on MUSA
Requirements
Before using this wrapper, make sure the following are available:
MATE is installed and importable.
TorchMUSA is installed and the MUSA runtime environment is configured.
The target workload is configured to run on MUSA devices.
Use MUSA Toolkit / MTCC 5.1.0 or newer to build the current fused chunk KDA path. The 4.3.6 toolchain may fail to compile these kernels.
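The requirements above can be checked before running a workload. The following is a minimal sketch, not part of the wrapper: the module names mate (inferred from mate.kda.chunk_kda) and torch_musa (assumed import name of TorchMUSA) are assumptions, and missing_modules and toolkit_is_supported are hypothetical helpers. How the installed toolkit version is obtained is left to the caller.

```python
import importlib.util

# Assumed import names: "mate" is inferred from mate.kda.chunk_kda, and
# "torch_musa" is assumed to be the import name of TorchMUSA.
REQUIRED_MODULES = ("flash_kda", "mate", "torch_musa")

# Minimum MUSA Toolkit / MTCC version stated in the requirements.
MIN_TOOLKIT = (5, 1, 0)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def toolkit_is_supported(version):
    """Compare a dotted version string such as "5.1.0" against MIN_TOOLKIT."""
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "5.1" as "5.1.0"
    return tuple(parts) >= MIN_TOOLKIT


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates modules; it does not verify that a MUSA device is present or that the runtime is configured.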
Build
Build a wheel from the wrappers/FlashKDA directory:
python -m build --wheel
The generated wheel will be placed under:
dist/
Installation
Install from source:
pip install --no-build-isolation -e .
Install a built wheel:
pip install dist/flash_kda-*.whl
Import
Import the package directly:
import flash_kda
Import individual APIs:
from flash_kda import fwd, get_workspace_size
Behavior
fwd(...) preserves the FlashKDA Python call signature and forwards to mate.kda.chunk_kda(...) with use_qk_l2norm_in_kernel=True.
out and final_state are treated as preallocated output buffers and are written in place, matching the official package surface.
get_workspace_size(...) is preserved for compatibility and always returns 0. The wrapper does not allocate or consume an explicit workspace tensor because MATE manages the kernel internals itself.
cu_seqlens follows the MATE runtime behavior on MUSA and accepts both torch.int32 and torch.int64.
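For readers unfamiliar with cu_seqlens, the sketch below shows how such a tensor is commonly derived from per-sequence lengths. It assumes the usual variable-length convention of cumulative offsets starting at 0; this page does not spell out the layout, so treat that as an assumption and confirm it against mate.kda.chunk_kda. build_cu_seqlens is a hypothetical helper, not part of flash_kda.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative offsets [0, l0, l0 + l1, ...] for packed sequences."""
    if any(n <= 0 for n in seq_lens):
        raise ValueError("sequence lengths must be positive")
    return [0, *accumulate(seq_lens)]


# Three sequences of lengths 3, 5, and 2 packed back to back.
offsets = build_cu_seqlens([3, 5, 2])

# Under the assumed convention, sequence i occupies rows
# offsets[i]:offsets[i + 1] of the packed batch.
spans = list(zip(offsets[:-1], offsets[1:]))
```

The resulting list would then be converted to a tensor on the target device with dtype torch.int32 or torch.int64, both of which the wrapper accepts.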
Notes
This wrapper keeps the official FlashKDA import surface, but execution is provided by MATE on MUSA.
For the authoritative operator behavior, refer to
mate.kda.chunk_kda.