# Canary-Qwen-2.5B CoreML FP16
This repository contains a community CoreML conversion of the public NVIDIA Canary-Qwen-2.5B speech recognition model for Apple Silicon workflows.
This is not an official NVIDIA release. The conversion is derived from the public base model and splits it into four CoreML components:
- `encoder.mlpackage`: FP16 speech encoder
- `projection.mlpackage`: FP16 projection from encoder space to Qwen embedding space
- `canary_decoder_stateful.mlpackage`: FP16 stateful autoregressive decoder with KV cache
- `canary_lm_head.mlpackage`: FP16 LM head that maps decoder hidden states to logits
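The data flow between the four components can be illustrated with a small numpy sketch. All dimensions, I/O shapes, and function names below are illustrative assumptions, not the converted models' actual CoreML signatures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the real model sizes):
d_enc, d_llm, vocab = 512, 1024, 32000
T = 100  # encoder output frames

def encoder(mel):                      # stands in for encoder.mlpackage
    # Maps log-mel features to encoder-space frames.
    return rng.standard_normal((mel.shape[0], d_enc)).astype(np.float16)

def projection(enc_out):               # stands in for projection.mlpackage
    # Maps encoder space to Qwen embedding space.
    W = rng.standard_normal((d_enc, d_llm)).astype(np.float16)
    return enc_out @ W

def decoder_step(emb, state):          # stands in for canary_decoder_stateful.mlpackage
    # The real model updates its KV cache via CoreML state; here we
    # just return a hidden vector of the right width.
    return rng.standard_normal((1, d_llm)).astype(np.float16), state

def lm_head(hidden):                   # stands in for canary_lm_head.mlpackage
    # Maps decoder hidden states to vocabulary logits.
    W = rng.standard_normal((d_llm, vocab)).astype(np.float16)
    return hidden @ W

mel = rng.standard_normal((T, 80)).astype(np.float16)  # log-mel features
audio_emb = projection(encoder(mel))   # (T, d_llm): audio frames in LLM space
hidden, state = decoder_step(audio_emb[:1], state={})
logits = lm_head(hidden)               # (1, vocab)
```

In the real pipeline each `def` above is replaced by a `predict` call on the corresponding `.mlpackage`; the split lets the LM head be swapped or fused independently of the decoder.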
## Base Model
- Original model: nvidia/canary-qwen-2.5b
- Original license: CC-BY-4.0
- Original architecture: FastConformer encoder + Qwen decoder with projection and LoRA adaptation
Please review and comply with the original model card and license terms.
## Included Artifacts
| File | Precision | Purpose |
|---|---|---|
| `encoder.mlpackage` | FP16 | Speech encoder |
| `projection.mlpackage` | FP16 | Encoder-to-LLM projection |
| `canary_decoder_stateful.mlpackage` | FP16 | Stateful decoder with KV cache |
| `canary_lm_head.mlpackage` | FP16 | Decoder hidden states to logits |
## Notes
- This repo contains model artifacts only.
- The decoder is separated from the LM head.
- The stateful decoder is intended for macOS 15 / iOS 18 era CoreML state support.
- Long-audio chunking, prompt formatting, and transcript stitching live in the runtime layer and are not included here.
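Since the decoding loop lives in the runtime layer, here is a minimal greedy-decoding sketch over the decoder / LM-head split. The stub components, token ids, and `EOS` value are assumptions for illustration; in a real runtime, `decoder_step` and `lm_head` would be `predict` calls on the stateful decoder and LM-head packages:

```python
import numpy as np

EOS = 2  # assumed end-of-sequence token id

def greedy_decode(decoder_step, lm_head, prompt_ids, max_new=32):
    """Run the stateful decoder one token at a time.

    `state` stands in for the CoreML KV-cache state object that the
    stateful decoder carries between calls.
    """
    state = {}
    ids = list(prompt_ids)
    # Prime the KV cache with the prompt tokens.
    for tok in ids:
        hidden, state = decoder_step(tok, state)
    for _ in range(max_new):
        logits = lm_head(hidden)          # decoder hidden -> vocab logits
        nxt = int(np.argmax(logits))      # greedy pick
        if nxt == EOS:
            break
        ids.append(nxt)
        hidden, state = decoder_step(nxt, state)
    return ids

# Toy stand-ins: the decoder echoes the token and the LM head always
# prefers token+1, so generation counts up until it reaches EOS.
dec = lambda tok, state: (tok, state)
head = lambda h: np.eye(5)[(h + 1) % 5]
print(greedy_decode(dec, head, [0]))  # [0, 1]
```

The same loop structure works whether the LM head is a separate model (as here) or fused into the decoder; separating it keeps the decoder's KV-cache state machinery isolated from the large vocabulary projection.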
## Attribution
This conversion is based on:
- NVIDIA Canary-Qwen-2.5B
- Qwen/Qwen3-1.7B
- NVIDIA NeMo / SALM architecture described in the original model card