# Canary-Qwen-2.5B CoreML FP16
This repository contains a community CoreML conversion of the public NVIDIA Canary-Qwen-2.5B speech recognition model for Apple Silicon workflows.
This is not an official NVIDIA release. The conversion is derived from the public base model and splits it into four CoreML components:
- `encoder.mlpackage`: FP16 speech encoder
- `projection.mlpackage`: FP16 projection from encoder space to Qwen embedding space
- `canary_decoder_stateful.mlpackage`: FP16 stateful autoregressive decoder with KV cache
- `canary_lm_head.mlpackage`: FP16 LM head that maps decoder hidden states to logits
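The data flow between the four components can be illustrated with a small numpy sketch. All dimensions, I/O shapes, and function names below are illustrative assumptions, not the converted models' actual CoreML signatures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the real model sizes):
d_enc, d_llm, vocab = 512, 1024, 32000
T = 100  # encoder output frames

def encoder(mel):                      # stands in for encoder.mlpackage
    # Maps log-mel features to encoder-space frames.
    return rng.standard_normal((mel.shape[0], d_enc)).astype(np.float16)

def projection(enc_out):               # stands in for projection.mlpackage
    # Maps encoder space to Qwen embedding space.
    W = rng.standard_normal((d_enc, d_llm)).astype(np.float16)
    return enc_out @ W

def decoder_step(emb, state):          # stands in for canary_decoder_stateful.mlpackage
    # The real model updates its KV cache via CoreML state; here we
    # just return a hidden vector of the right width.
    return rng.standard_normal((1, d_llm)).astype(np.float16), state

def lm_head(hidden):                   # stands in for canary_lm_head.mlpackage
    # Maps decoder hidden states to vocabulary logits.
    W = rng.standard_normal((d_llm, vocab)).astype(np.float16)
    return hidden @ W

mel = rng.standard_normal((T, 80)).astype(np.float16)  # log-mel features
audio_emb = projection(encoder(mel))   # (T, d_llm): audio frames in LLM space
hidden, state = decoder_step(audio_emb[:1], state={})
logits = lm_head(hidden)               # (1, vocab)
```

In the real pipeline each `def` above is replaced by a `predict` call on the corresponding `.mlpackage`; the split lets the LM head be swapped or fused independently of the decoder.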
## Base Model
- Original model: nvidia/canary-qwen-2.5b
- Original license: CC-BY-4.0
- Original architecture: FastConformer encoder + Qwen decoder with projection and LoRA adaptation
Please review and comply with the original model card and license terms.
## Included Artifacts
| File | Precision | Purpose |
|---|---|---|
| `encoder.mlpackage` | FP16 | Speech encoder |
| `projection.mlpackage` | FP16 | Encoder-to-LLM projection |
| `canary_decoder_stateful.mlpackage` | FP16 | Stateful decoder with KV cache |
| `canary_lm_head.mlpackage` | FP16 | Decoder hidden states to logits |
## Notes
- This repo contains model artifacts only.
- The decoder is separated from the LM head.
- The stateful decoder is intended for macOS 15 / iOS 18 era CoreML state support.
- Long-audio chunking, prompt formatting, and transcript stitching live in the runtime layer and are not included here.
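Since the decoding loop lives in the runtime layer, here is a minimal greedy-decoding sketch over the decoder / LM-head split. The stub components, token ids, and `EOS` value are assumptions for illustration; in a real runtime, `decoder_step` and `lm_head` would be `predict` calls on the stateful decoder and LM-head packages:

```python
import numpy as np

EOS = 2  # assumed end-of-sequence token id

def greedy_decode(decoder_step, lm_head, prompt_ids, max_new=32):
    """Run the stateful decoder one token at a time.

    `state` stands in for the CoreML KV-cache state object that the
    stateful decoder carries between calls.
    """
    state = {}
    ids = list(prompt_ids)
    # Prime the KV cache with the prompt tokens.
    for tok in ids:
        hidden, state = decoder_step(tok, state)
    for _ in range(max_new):
        logits = lm_head(hidden)          # decoder hidden -> vocab logits
        nxt = int(np.argmax(logits))      # greedy pick
        if nxt == EOS:
            break
        ids.append(nxt)
        hidden, state = decoder_step(nxt, state)
    return ids

# Toy stand-ins: the decoder echoes the token and the LM head always
# prefers token+1, so generation counts up until it reaches EOS.
dec = lambda tok, state: (tok, state)
head = lambda h: np.eye(5)[(h + 1) % 5]
print(greedy_decode(dec, head, [0]))  # [0, 1]
```

The same loop structure works whether the LM head is a separate model (as here) or fused into the decoder; separating it keeps the decoder's KV-cache state machinery isolated from the large vocabulary projection.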
## Attribution
This conversion is based on:
- NVIDIA Canary-Qwen-2.5B
- Qwen/Qwen3-1.7B
- NVIDIA NeMo / SALM architecture described in the original model card