Canary-Qwen-2.5B CoreML FP16

This repository contains a community CoreML conversion of the public NVIDIA Canary-Qwen-2.5B speech recognition model for Apple Silicon workflows.

This is not an official NVIDIA release. The conversion is derived from the public base model and splits it into four CoreML components:

  • encoder.mlpackage: FP16 speech encoder
  • projection.mlpackage: FP16 projection from encoder space to Qwen embedding space
  • canary_decoder_stateful.mlpackage: FP16 stateful autoregressive decoder with KV cache
  • canary_lm_head.mlpackage: FP16 LM head that maps decoder hidden states to logits
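At inference time the four packages chain encoder → projection → decoder → LM head. Below is a minimal greedy-decoding sketch of that flow with the CoreML `predict` calls abstracted as plain callables; the token IDs, tensor handling, and prompt formatting are assumptions for illustration and are not part of this repo:

```python
# Hypothetical sketch of chaining the four components. Each callable stands
# in for a coremltools predict() call on the corresponding .mlpackage; real
# input/output names and shapes depend on the converted packages.

def transcribe_greedy(audio_features, encoder, projection, decoder, lm_head,
                      bos_id, eos_id, max_tokens=64):
    """Greedy autoregressive decode over projected encoder features."""
    # 1. Encode the speech features, then project into the LLM embedding space.
    enc = encoder(audio_features)
    ctx = projection(enc)

    # 2. Autoregressive loop: the stateful decoder carries its KV cache
    #    internally, so each step only needs the newest token.
    tokens = [bos_id]
    for _ in range(max_tokens):
        hidden = decoder(ctx, tokens[-1])   # one step; cache held in state
        logits = lm_head(hidden)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]                        # drop the BOS token
```

A real runtime would wrap each callable around `MLModel.predict` for the matching package and detokenize the returned IDs.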

Base Model

  • Original model: nvidia/canary-qwen-2.5b
  • Original license: CC-BY-4.0
  • Original architecture: FastConformer encoder + Qwen decoder with projection and LoRA adaptation

Please review and comply with the license terms of the original model card (nvidia/canary-qwen-2.5b).

Included Artifacts

File                               Precision  Purpose
encoder.mlpackage                  FP16       Speech encoder
projection.mlpackage               FP16       Encoder-to-LLM projection
canary_decoder_stateful.mlpackage  FP16       Stateful decoder with KV cache
canary_lm_head.mlpackage           FP16       Decoder hidden states to logits

Notes

  • This repo contains model artifacts only.
  • The decoder is separated from the LM head.
  • The stateful decoder relies on the CoreML state support introduced in the macOS 15 / iOS 18 era.
  • Long-audio chunking, prompt formatting, and transcript stitching live in the runtime layer and are not included here.

Attribution

This conversion is based on nvidia/canary-qwen-2.5b.
