Paper: Voxtral Realtime (2602.11298)
Pre-exported ExecuTorch `.pte` files for Voxtral-Mini-4B-Realtime-2602 with the Metal backend (Apple GPU). They support both offline and streaming transcription with GPU acceleration on Apple Silicon Macs.

For the XNNPACK (CPU) variant, see Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK.
Install ExecuTorch from source with Metal backend support:

```bash
git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
```
Build the runner with Metal:

```bash
cd ~/executorch && make voxtral_realtime-metal
```
Install libomp (required by the AOTInductor kernels):

```bash
brew install libomp
export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
```
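The `export` above applies only to the current shell session. A minimal preflight sketch, assuming bash: `check_dyld_path` is a hypothetical helper, not part of ExecuTorch. It reports whether a colon-separated library path (by default the current `DYLD_LIBRARY_PATH`) contains `/usr/lib` (for `libc++`) and some directory holding `libomp.dylib`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of ExecuTorch).
# Arg 1 (optional): colon-separated path to check; defaults to $DYLD_LIBRARY_PATH.
# Succeeds only if the path has /usr/lib (libc++) and a directory with libomp.dylib.
check_dyld_path() {
  local path="${1-${DYLD_LIBRARY_PATH:-}}"
  local entry has_usr_lib=0 has_omp=0
  local IFS=:   # split entries on ':' inside this function only
  for entry in $path; do
    if [ "$entry" = /usr/lib ]; then has_usr_lib=1; fi
    if [ -f "$entry/libomp.dylib" ]; then has_omp=1; fi
  done
  if [ "$has_usr_lib" -ne 1 ]; then echo "DYLD_LIBRARY_PATH is missing /usr/lib" >&2; fi
  if [ "$has_omp" -ne 1 ]; then echo "no DYLD_LIBRARY_PATH entry contains libomp.dylib" >&2; fi
  [ "$has_usr_lib" -eq 1 ] && [ "$has_omp" -eq 1 ]
}

# Usage: check_dyld_path && echo "DYLD_LIBRARY_PATH looks good"
```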
Download the model files:

```bash
pip install huggingface_hub
huggingface-cli download younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal --local-dir ~/voxtral_metal
```
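A minimal sketch for confirming the download before running anything, assuming bash and the `~/voxtral_metal` directory used above. `check_voxtral_download` is a hypothetical helper; its file list is simply the set of files the run commands in this README reference, and the repository may contain more.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ExecuTorch or huggingface_hub): verify that
# every file referenced by the run commands in this README was downloaded.
# Arg 1: download directory (defaults to ~/voxtral_metal).
check_voxtral_download() {
  local dir="${1:-$HOME/voxtral_metal}" f status=0
  for f in model-metal-fpa4w.pte model-metal-fpa4w-streaming.pte \
           preprocessor.pte preprocessor-streaming.pte \
           tekken.json poem.wav; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_voxtral_download && echo "all files present"
```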
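All run commands below set `DYLD_LIBRARY_PATH` as shown in the installation step; without it, the runner fails to load its libraries. Run them from `~/executorch`, since the runner path is relative to the repository root.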
Offline transcription:

```bash
DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_metal/model-metal-fpa4w.pte \
  --tokenizer_path ~/voxtral_metal/tekken.json \
  --preprocessor_path ~/voxtral_metal/preprocessor.pte \
  --audio_path ~/voxtral_metal/poem.wav
```
Streaming transcription of an audio file:

```bash
DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_metal/model-metal-fpa4w-streaming.pte \
  --tokenizer_path ~/voxtral_metal/tekken.json \
  --preprocessor_path ~/voxtral_metal/preprocessor-streaming.pte \
  --audio_path ~/voxtral_metal/poem.wav \
  --streaming
```
Live streaming transcription from the microphone (ffmpeg captures AVFoundation audio device `:0` and pipes raw 16 kHz mono 32-bit float samples to the runner):

```bash
ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | \
DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
  --model_path ~/voxtral_metal/model-metal-fpa4w-streaming.pte \
  --tokenizer_path ~/voxtral_metal/tekken.json \
  --preprocessor_path ~/voxtral_metal/preprocessor-streaming.pte \
  --mic
```
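The three invocations above differ only in the model/preprocessor file suffix and one flag. A minimal sketch of a bash wrapper that keeps the shared setup in one place; `voxtral_run` and the `VOXTRAL_DIR`, `VOXTRAL_RUNNER`, and `VOXTRAL_OMP_LIB` overrides are hypothetical names, not part of ExecuTorch. It assumes the file layout from the download step and that it is called from `~/executorch`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ExecuTorch) around the three run commands.
# Overrides (all hypothetical names):
#   VOXTRAL_DIR      download directory (default: ~/voxtral_metal)
#   VOXTRAL_RUNNER   runner binary (default: path relative to ~/executorch)
#   VOXTRAL_OMP_LIB  libomp lib directory (default: $(brew --prefix libomp)/lib)
voxtral_run() {
  local mode="${1:-}"
  local dir="${VOXTRAL_DIR:-$HOME/voxtral_metal}"
  local runner="${VOXTRAL_RUNNER:-cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner}"
  local suffix="" flag="" omp_lib
  case "$mode" in
    offline)   ;;
    streaming) suffix="-streaming"; flag="--streaming" ;;
    mic)       suffix="-streaming"; flag="--mic" ;;
    *) echo "usage: voxtral_run offline|streaming|mic [runner args...]" >&2
       return 2 ;;
  esac
  shift
  # brew is only invoked when VOXTRAL_OMP_LIB is unset or empty.
  omp_lib="${VOXTRAL_OMP_LIB:-$(brew --prefix libomp)/lib}"
  # /usr/lib provides libc++; the libomp directory provides libomp.dylib.
  DYLD_LIBRARY_PATH="/usr/lib:${omp_lib}" "$runner" \
    --model_path "$dir/model-metal-fpa4w${suffix}.pte" \
    --tokenizer_path "$dir/tekken.json" \
    --preprocessor_path "$dir/preprocessor${suffix}.pte" \
    ${flag:+"$flag"} "$@"
}

# Usage, from ~/executorch:
#   voxtral_run offline --audio_path ~/voxtral_metal/poem.wav
#   voxtral_run streaming --audio_path ~/voxtral_metal/poem.wav
#   ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | voxtral_run mic
```

Extra arguments such as `--audio_path` are passed through unchanged, so the wrapper does not need to know about every runner option.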
Performance:

| Mode | Time to first token (s) | Generated tokens | Generation rate (tok/s) | Total inference time (s) |
|---|---|---|---|---|
| Offline | 2.121 | 377 | 40.82 | 11.356 |
| Streaming | 0.304 | 260 | 16.18 | 16.374 |
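The totals in the table are consistent with total time ≈ time to first token + generated tokens / generation rate. This is our reading of the numbers, not a formula stated with the benchmark. A small sketch that recomputes them with `awk`; `expected_total` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate total inference time in seconds as
# TTFT + generated_tokens / generation_rate, printed with 3 decimals.
# This decomposition is an assumption inferred from the table, not a documented metric.
expected_total() {
  awk -v ttft="$1" -v tokens="$2" -v rate="$3" \
    'BEGIN { printf "%.3f\n", ttft + tokens / rate }'
}

# Usage with the table's values:
#   expected_total 2.121 377 40.82   # offline:   prints 11.357 (table: 11.356 s)
#   expected_total 0.304 260 16.18   # streaming: prints 16.373 (table: 16.374 s)
```

The remaining 1 ms differences are presumably due to rounding of the reported generation rates.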
These models were exported with:
```bash
# Offline
python examples/models/voxtral_realtime/export_voxtral_rt.py \
  --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
  --backend metal \
  --output-dir ./voxtral_rt_metal_offline \
  --qlinear-encoder fpa4w \
  --qlinear fpa4w

# Streaming
python examples/models/voxtral_realtime/export_voxtral_rt.py \
  --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
  --backend metal \
  --streaming \
  --output-dir ./voxtral_rt_metal_streaming \
  --qlinear-encoder fpa4w \
  --qlinear fpa4w
```
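Both exports share every flag except `--streaming` and the output directory. A minimal sketch of a bash loop that runs both from the ExecuTorch repo root; `export_voxtral_variants` and its `DRY_RUN` switch are hypothetical conveniences, and the checkpoint path defaults to the one used in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ExecuTorch): run the offline and streaming
# Metal exports shown above, one after the other, from the ExecuTorch repo root.
# Arg 1: original checkpoint directory (defaults to the path used above).
# Set DRY_RUN=1 to print each command instead of running it.
export_voxtral_variants() {
  local model_path="${1:-$HOME/models/Voxtral-Mini-4B-Realtime-2602}"
  local variant streaming_flag
  local -a cmd
  for variant in offline streaming; do
    streaming_flag=""
    if [ "$variant" = streaming ]; then streaming_flag="--streaming"; fi
    cmd=(python examples/models/voxtral_realtime/export_voxtral_rt.py
         --model-path "$model_path"
         --backend metal
         ${streaming_flag:+"$streaming_flag"}
         --output-dir "./voxtral_rt_metal_${variant}"
         --qlinear-encoder fpa4w
         --qlinear fpa4w)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return   # stop at the first failed export
    fi
  done
}

# Usage: DRY_RUN=1 export_voxtral_variants   # inspect, then rerun without DRY_RUN
```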
Troubleshooting:

If the runner fails with `Library not loaded: @rpath/libc++.1.dylib`, add `/usr/lib` to `DYLD_LIBRARY_PATH`:

```bash
export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
```

If it fails with `Library not loaded: libomp.dylib`, install OpenMP via Homebrew:

```bash
brew install libomp
```

Audio format: the file passed to `--audio_path` must be a 16 kHz mono WAV. Convert other formats with:

```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```
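A minimal sketch of a pre-check for `--audio_path` inputs, assuming bash and a standard RIFF/WAVE file; `is_16k_mono_wav` is a hypothetical helper. It walks the RIFF chunks until it finds `fmt `, then reads the channel count and sample rate. It assumes a little-endian host, because `od` decodes integers in host byte order (true on Apple Silicon and x86-64). Files it rejects can be converted with the `ffmpeg` command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "$1" is a RIFF/WAVE file whose "fmt "
# chunk declares 1 channel at 16000 Hz. Assumes a little-endian host for od.
is_16k_mono_wav() {
  local f="$1" off=12 id size channels rate
  [ "$(dd if="$f" bs=1 count=4 2>/dev/null)" = RIFF ] || return 1
  [ "$(dd if="$f" bs=1 skip=8 count=4 2>/dev/null)" = WAVE ] || return 1
  while :; do
    # Each chunk: 4-byte id, 4-byte little-endian size, then the payload.
    id=$(dd if="$f" bs=1 skip="$off" count=4 2>/dev/null)
    size=$(od -An -t u4 -j $((off + 4)) -N 4 "$f" | tr -d ' ')
    [ -n "$size" ] || return 1   # reached end of file without a "fmt " chunk
    if [ "$id" = "fmt " ]; then
      # fmt payload: format (2 bytes), channels (2 bytes), sample rate (4 bytes)
      channels=$(od -An -t u2 -j $((off + 10)) -N 2 "$f" | tr -d ' ')
      rate=$(od -An -t u4 -j $((off + 12)) -N 4 "$f" | tr -d ' ')
      [ "$channels" = 1 ] && [ "$rate" = 16000 ]
      return
    fi
    off=$((off + 8 + size + size % 2))   # chunk payloads are padded to even length
  done
}

# Usage: is_16k_mono_wav input.wav || ffmpeg -i input.wav -ar 16000 -ac 1 output.wav
```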
Base model: mistralai/Ministral-3-3B-Base-2512