Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal

Pre-exported ExecuTorch .pte files for Voxtral-Mini-4B-Realtime-2602 with Metal backend (Apple GPU). Supports both offline and streaming transcription with GPU acceleration on macOS Apple Silicon.

For the XNNPACK (CPU) variant, see Voxtral-Mini-4B-Realtime-2602-ExecuTorch-XNNPACK.

Installation

Install ExecuTorch from source with Metal backend support:

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
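After the install script finishes, a quick way to confirm the `executorch` Python package landed in the active environment is a small import probe (a generic check, not part of the ExecuTorch tooling):

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` is importable in the current environment."""
    return importlib.util.find_spec(module_name) is not None

# After install_executorch.sh succeeds, is_installed("executorch") should be True.
```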

Build the runner with Metal:

cd ~/executorch && make voxtral_realtime-metal

Install libomp (required by the AOTInductor kernels):

brew install libomp
export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
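If you launch the runner from a script rather than an interactive shell, the same library path can be composed programmatically. This is a convenience sketch, not part of the release; it assumes `brew` is on `PATH`:

```python
import os
import subprocess

def libomp_lib_dir() -> str:
    # Ask Homebrew where libomp is installed (assumes `brew` is on PATH).
    prefix = subprocess.run(
        ["brew", "--prefix", "libomp"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return os.path.join(prefix, "lib")

def dyld_library_path(libomp_dir: str) -> str:
    # /usr/lib comes first so @rpath/libc++.1.dylib resolves from the system
    # location; the Homebrew libomp directory follows for libomp.dylib.
    return ":".join(["/usr/lib", libomp_dir])
```

Pass the result as `env={**os.environ, "DYLD_LIBRARY_PATH": dyld_library_path(libomp_lib_dir())}` when spawning the runner.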

Download

pip install huggingface_hub
huggingface-cli download younghan-meta/Voxtral-Mini-4B-Realtime-2602-ExecuTorch-Metal --local-dir ~/voxtral_metal

Run

All run commands require DYLD_LIBRARY_PATH to be set as shown in the installation step.

Offline transcription

DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_metal/model-metal-fpa4w.pte \
    --tokenizer_path ~/voxtral_metal/tekken.json \
    --preprocessor_path ~/voxtral_metal/preprocessor.pte \
    --audio_path ~/voxtral_metal/poem.wav

Streaming transcription (from file)

DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_metal/model-metal-fpa4w-streaming.pte \
    --tokenizer_path ~/voxtral_metal/tekken.json \
    --preprocessor_path ~/voxtral_metal/preprocessor-streaming.pte \
    --audio_path ~/voxtral_metal/poem.wav \
    --streaming
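The offline and streaming invocations differ only in the artifact suffix and the --streaming flag, so driving the runner from Python is mostly argv assembly. A hedged sketch; the binary path and file names are the ones used in the commands above, and `runner_command` is a hypothetical helper:

```python
import os

def runner_command(model_dir: str, streaming: bool = False) -> list[str]:
    """Build argv for voxtral_realtime_runner using the file layout from
    the download step. Streaming swaps in the -streaming artifacts and
    appends --streaming."""
    suffix = "-streaming" if streaming else ""
    cmd = [
        "cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner",
        "--model_path", os.path.join(model_dir, f"model-metal-fpa4w{suffix}.pte"),
        "--tokenizer_path", os.path.join(model_dir, "tekken.json"),
        "--preprocessor_path", os.path.join(model_dir, f"preprocessor{suffix}.pte"),
        "--audio_path", os.path.join(model_dir, "poem.wav"),
    ]
    if streaming:
        cmd.append("--streaming")
    return cmd
```

Run it with `subprocess.run(cmd, env=...)`, with DYLD_LIBRARY_PATH set in the environment as described in the installation step.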

Live microphone (macOS)

ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f f32le -nostats -loglevel error pipe:1 | \
  DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
  cmake-out/examples/models/voxtral_realtime/voxtral_realtime_runner \
    --model_path ~/voxtral_metal/model-metal-fpa4w-streaming.pte \
    --tokenizer_path ~/voxtral_metal/tekken.json \
    --preprocessor_path ~/voxtral_metal/preprocessor-streaming.pte \
    --mic
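The ffmpeg flags above fix the stream format: 16 000 samples/s, one channel, 4 bytes per sample (f32le), so every second of audio is a fixed number of raw bytes. That arithmetic is handy if you size reads from the pipe yourself; a minimal sketch (the helper names are illustrative):

```python
SAMPLE_RATE = 16_000   # Hz, matches -ar 16000
CHANNELS = 1           # mono, matches -ac 1
BYTES_PER_SAMPLE = 4   # f32le = 32-bit little-endian float

def bytes_per_second() -> int:
    # Raw byte rate of the ffmpeg pipe: 16000 * 1 * 4 = 64000 B/s.
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

def chunk_bytes(ms: int) -> int:
    # Size of a read covering `ms` milliseconds of audio.
    return bytes_per_second() * ms // 1000
```

To find the capture device index for `-i ":N"`, `ffmpeg -f avfoundation -list_devices true -i ""` lists available devices.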

Performance (Apple Silicon Mac, 20s audio)

| Mode      | TTFT   | Gen Tokens | Gen Rate (tok/s) | Total Inference |
|-----------|--------|------------|------------------|-----------------|
| Offline   | 2.121s | 377        | 40.82            | 11.356s         |
| Streaming | 0.304s | 260        | 16.18            | 16.374s         |
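The generation rates in the table equal generated tokens divided by decode time (total inference minus TTFT), which is a useful sanity check when comparing your own runs. Verified with the table's numbers:

```python
def gen_rate(tokens: int, total_s: float, ttft_s: float) -> float:
    # Tokens per second over the decode phase (total inference minus TTFT).
    return tokens / (total_s - ttft_s)

assert round(gen_rate(377, 11.356, 2.121), 2) == 40.82  # offline
assert round(gen_rate(260, 16.374, 0.304), 2) == 16.18  # streaming
```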

Export Commands

These models were exported with:

# Offline
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend metal \
    --output-dir ./voxtral_rt_metal_offline \
    --qlinear-encoder fpa4w \
    --qlinear fpa4w

# Streaming
python examples/models/voxtral_realtime/export_voxtral_rt.py \
    --model-path ~/models/Voxtral-Mini-4B-Realtime-2602 \
    --backend metal \
    --streaming \
    --output-dir ./voxtral_rt_metal_streaming \
    --qlinear-encoder fpa4w \
    --qlinear fpa4w

Troubleshooting

  • Library not loaded: @rpath/libc++.1.dylib — Add /usr/lib to DYLD_LIBRARY_PATH:

    export DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib
    
  • Library not loaded: libomp.dylib — Install OpenMP via Homebrew:

    brew install libomp
    
  • Audio format — Input must be 16kHz mono WAV. Convert with:

    ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
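To catch a wrong sample rate or channel count before handing a file to the runner, the standard-library wave module can validate it. A small sketch (the `check_wav` helper is illustrative, not part of the runner):

```python
import wave

def check_wav(path: str) -> None:
    """Raise ValueError unless `path` is a 16 kHz mono WAV file."""
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {w.getframerate()}")
        if w.getnchannels() != 1:
            raise ValueError(f"expected mono, got {w.getnchannels()} channels")
```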
    
