Environment Variables

MATE reads configuration from the current process environment. Set variables before launching the Python process or CLI command that should observe them.

Use `mate env` to print the MATE-related variables visible to the current shell.
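
The "set before launch" rule can be illustrated with a minimal Python sketch that starts a child interpreter with a MATE variable in its environment. The helper name `run_with_env` is hypothetical, and the child only echoes the variable; it does not import MATE.

```python
import os
import subprocess
import sys


def run_with_env(overrides, argv):
    """Run argv in a child process whose environment includes overrides."""
    env = dict(os.environ)  # copy, so the parent environment stays unchanged
    env.update(overrides)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# The child observes MATE_LOGLEVEL because it was set before the process started.
result = run_with_env(
    {"MATE_LOGLEVEL": "3"},
    [sys.executable, "-c", "import os; print(os.environ['MATE_LOGLEVEL'])"],
)
child_loglevel = result.stdout.strip()
print(child_loglevel)
```

Passing an explicit `env` mapping keeps the setting scoped to the child process instead of mutating the parent's environment.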

API Logging and Dumps

The logging and dumping configuration is read when `mate.api_logging` is imported.

Variable	Default	Meaning
`MATE_LOGLEVEL`	`0`	API logging level: `0`, `1`, `3`, `5`, or `10`
`MATE_LOGDEST`	`stdout`	Log destination: `stdout`, `stderr`, or a file path
`MATE_DUMP_DIR`	`mate_dumps`	Root directory for Level 10 tensor dumps
`MATE_DUMP_MAX_SIZE_GB`	`20`	Maximum total dump size in GB per process
`MATE_DUMP_MAX_COUNT`	`1000`	Maximum number of dumped calls per process
`MATE_DUMP_SAFETENSORS`	`0`	Save dumps as `safetensors` instead of `torch.save`
`MATE_DUMP_INCLUDE`	empty	Comma-separated `fnmatch` patterns to include for dumping
`MATE_DUMP_EXCLUDE`	empty	Comma-separated `fnmatch` patterns to exclude from dumping
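
To show how comma-separated `fnmatch` patterns behave, here is a small sketch of an include/exclude filter. It is not MATE's implementation. The helper names, the sample function names, and the rule that an exclude match overrides an include match are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def parse_patterns(raw):
    """Split a comma-separated pattern string, dropping empty entries."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def should_dump(name, include_raw="", exclude_raw=""):
    """Assumed semantics: an empty include list matches everything,
    and any exclude match wins over an include match."""
    include = parse_patterns(include_raw)
    exclude = parse_patterns(exclude_raw)
    if include and not any(fnmatchcase(name, p) for p in include):
        return False
    return not any(fnmatchcase(name, p) for p in exclude)


# Hypothetical API names, used only to exercise the patterns.
print(should_dump("gemm_fp8", include_raw="gemm_*"))
print(should_dump("attention_fwd", include_raw="gemm_*"))
print(should_dump("gemm_fp8", include_raw="gemm_*", exclude_raw="*fp8"))
```

`fnmatchcase` is used so that matching is case-sensitive on every platform; whether MATE matches case-sensitively is not stated here.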

Log-Level Meanings

0: Disabled.
1: Function names only.
3: Function names plus structured inputs and outputs, including tensor VA ranges.
5: Level 3 plus tensor statistics.
10: Level 5 plus on-disk tensor dumping for replay.

Note

`MATE_LOGDEST` supports `%i` in file paths. MATE replaces it with the current process ID.
Dumps produced with `MATE_DUMP_SAFETENSORS=1` do not preserve original stride or non-contiguous layout information. Use the default `torch.save` format when replay must preserve strides.
Level 10 logging writes full API inputs and outputs to disk. Do not enable it for sensitive workloads unless the dump directory is appropriately protected.
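
The `%i` substitution described in the note can be pictured with the following sketch. The function name `expand_logdest` and the example path are hypothetical; the sketch only mirrors the documented behavior of replacing `%i` with a process ID.

```python
import os


def expand_logdest(dest, pid=None):
    """Replace every %i in a log destination with a process ID."""
    if pid is None:
        pid = os.getpid()
    return dest.replace("%i", str(pid))


# A fixed PID is passed here so the result is reproducible.
print(expand_logdest("/tmp/mate_%i.log", pid=1234))  # /tmp/mate_1234.log
# Destinations without %i, such as stdout, pass through unchanged.
print(expand_logdest("stdout", pid=1234))  # stdout
```

A per-process file name of this kind keeps multi-process runs from writing to the same log file.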

JIT, AOT, and Cache

Variable	Default	Meaning
`MATE_MUSA_ARCH_LIST`	auto-detect visible devices	MUSA architecture list used by JIT/AOT workflows; accepts space-separated `major.minor` values such as `3.1` or `3.1 4.0`.
`MATE_WORKSPACE_BASE`	home directory	Base directory for the MATE cache workspace.
`MATE_DISABLE_JIT`	`0`	Disable runtime JIT and require matching AOT modules.
`MATE_JIT_VERBOSE`	`0`	Show verbose ninja output for runtime JIT builds.

Runtime loading prefers a matching AOT library when one exists. `MATE_DISABLE_JIT=1` switches to AOT-only behavior and raises an error if no matching AOT module exists. MATE does not provide an environment variable to bypass AOT and force runtime JIT when a matching AOT module is present.
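
The loading rules above reduce to a small decision table, sketched below. The function name `choose_load_path`, the string return values, and the use of `RuntimeError` are assumptions for illustration; the actual loader and its error type may differ.

```python
def choose_load_path(aot_available, disable_jit):
    """Return which path a module load would take under the documented rules."""
    if aot_available:
        # A matching AOT library is always preferred, regardless of the JIT setting.
        return "aot"
    if disable_jit:
        # AOT-only mode with no matching AOT module is an error.
        raise RuntimeError("no matching AOT module and runtime JIT is disabled")
    return "jit"


print(choose_load_path(aot_available=True, disable_jit=False))   # aot
print(choose_load_path(aot_available=True, disable_jit=True))    # aot
print(choose_load_path(aot_available=False, disable_jit=False))  # jit
```

Note that no input combination yields `"jit"` when an AOT module is available, which matches the statement that AOT cannot be bypassed through the environment.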

Set `MATE_MUSA_ARCH_LIST` explicitly for offline diagnostics when no MUSA device is visible:

MATE_MUSA_ARCH_LIST=3.1 mate module-status
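
For scripts that need to validate the value before invoking MATE, the space-separated `major.minor` format can be checked with a sketch like the one below. The parser `parse_arch_list` is hypothetical and is not MATE's own parsing code.

```python
def parse_arch_list(raw):
    """Parse a space-separated list of major.minor values into integer pairs."""
    archs = []
    for token in raw.split():
        major, sep, minor = token.partition(".")
        if not (sep and major.isdigit() and minor.isdigit()):
            raise ValueError(f"expected major.minor, got {token!r}")
        archs.append((int(major), int(minor)))
    return archs


print(parse_arch_list("3.1"))      # [(3, 1)]
print(parse_arch_list("3.1 4.0"))  # [(3, 1), (4, 0)]
```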

Runtime Wrapper Controls

Variable	Default	Meaning
`MATE_DEEPGEMM_MK_ALIGNMENT`	`128`	Override the M-axis padding alignment returned by DeepGEMM wrapper `get_mk_alignment_for_contiguous_layout()`; supported values are `128` and `256`

Use `MATE_DEEPGEMM_MK_ALIGNMENT` when DeepGEMM-compatible contiguous grouped GEMM call sites need a non-default M-axis alignment. Callers should pad each expert segment and build `m_indices` using the same value returned by `get_mk_alignment_for_contiguous_layout()`. Set this variable before starting the Python process; the helper reads and caches the value on first use.
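
The padding step can be sketched in plain Python as follows. The helper names are hypothetical, the `alignment` argument stands in for the value returned by `get_mk_alignment_for_contiguous_layout()`, and marking padding rows with `-1` is an assumption that should be checked against the kernel being called. The example uses an alignment of `4` only to keep the output short; the supported values are `128` and `256`.

```python
def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def build_m_indices(rows_per_expert, alignment, pad_value=-1):
    """Build a per-row expert index list with each segment padded to alignment.

    pad_value marks padding rows; -1 is an assumption, not a documented value.
    """
    m_indices = []
    for expert, rows in enumerate(rows_per_expert):
        padded = round_up(rows, alignment)
        m_indices.extend([expert] * rows)
        m_indices.extend([pad_value] * (padded - rows))
    return m_indices


# Three experts with 3, 0, and 5 real rows; alignment 4 for readability only.
indices = build_m_indices([3, 0, 5], alignment=4)
print(indices)  # [0, 0, 0, -1, 2, 2, 2, 2, 2, -1, -1, -1]
```

Using one alignment value for both the padding and the index construction keeps every expert segment starting on an aligned row boundary.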

Compiler and Build Controls

Variable	Default	Meaning
`MATE_EXTRA_CFLAGS`	empty	Extra host compiler flags for JIT builds.
`MATE_EXTRA_MUSAFLAGS`	empty	Extra `mcc` flags for JIT builds.
`MATE_EXTRA_LDFLAGS`	empty	Extra linker flags for JIT builds.
`MATE_MCC`	auto-detected	Override the `mcc` compiler path used by JIT builds.

MATE JIT builds also honor common build-tool variables such as `CXX` for the host C++ compiler and `MAX_JOBS` for ninja parallelism.

Guard Allocator Debugging

Variable	Default	Meaning
`MATE_GUARD_ALLOCATOR_AUTO_INSTALL`	unset	Internal bootstrap flag used by `mate guard-run`
`MATE_GUARD_ALLOCATOR_MODE`	unset	Internal guard allocator mode for bootstrap and replay
`MATE_GUARD_ALLOCATOR_SYNC_ON_FREE`	unset	Internal flag controlling device sync before guarded frees
`MATE_GUARD_ALLOCATOR_LOG_ALLOCATIONS`	unset	Internal flag controlling guarded alloc/free logging

The `MATE_GUARD_ALLOCATOR_*` variables are internal bootstrap details and are usually set by MATE itself when preparing guarded child processes. Use `mate guard-run` or the Python `mate.memory_debug.install_guard_allocator()` API for normal guard allocator debugging. The guard allocator is host-only C++ and does not need `MATE_MUSA_ARCH_LIST`; that variable only controls MATE JIT/AOT modules that compile MUSA kernels.

Diagnostic and Test-Only Variables

Variable	Default	Meaning
`MATE_PYTEST_GUARD_ALLOC`	`0`	Enable the guarded allocator by default for MATE repository pytest runs
`MATE_PYTEST_GUARD_MODE`	`tail`	Default MATE repository pytest guard allocator mode: `tail` or `head`
`MATE_PYTEST_GUARD_LOG_ALLOCATIONS`	`0`	Log guarded alloc/free events during MATE repository pytest runs
`MATE_DRY_RUN`	`0`	Internal test/diagnostic mode used by MATE tests; not intended as a normal user runtime setting
`MATE_PYTEST_SHARD_TOTAL`	`1`	Total number of shards the test suite is split across
`MATE_PYTEST_SHARD_INDEX`	`0`	Index of the current shard
`MATE_PYTEST_SHARD_MODE`	`file`	Shard dispatch mode; only `file` is currently supported

When `MATE_PYTEST_SHARD_TOTAL` is greater than `1`, the tests in `test_fmha.py` dominate the last shard.

The `MATE_PYTEST_GUARD_*` variables are consumed by MATE’s in-repository pytest `conftest.py`; they are not installed as a general pytest plugin for external projects. Prefer `mate guard-run` for user-facing guarded allocator debug sessions.