Typhoon-OCR-7B — MLX q8

MLX-quantized port of typhoon-ai/typhoon-ocr-7b for native Apple Silicon inference. Highest-fidelity local option in the MegawizCo Typhoon-OCR collection. Pick this when you can afford the RAM (~9 GB peak) and latency (4.66 s median per image on the benchmark below).

Quantization: 8-bit affine, group size 64. Effective rate 9.112 bits/weight. Size on disk: 8.8 GB.
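
As a rough sanity check on the quantization figures above, the nominal storage cost of affine quantization can be worked out by hand. This is a minimal sketch, assuming each group of weights stores one 16-bit scale and one 16-bit bias alongside the quantized values; the function name and the 16-bit defaults are illustrative assumptions, not something this card states.

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Nominal bits per weight for group-wise affine quantization.

    Assumes one scale and one bias per group (16 bits each by default).
    """
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


nominal_q8 = affine_bits_per_weight(bits=8, group_size=64)
nominal_q4 = affine_bits_per_weight(bits=4, group_size=64)
print(nominal_q8)  # 8.5
print(nominal_q4)  # 4.5
```

Under these assumptions the nominal rate is 8.5 bits/weight. The reported effective rate of 9.112 is higher, which would be consistent with some tensors being kept at higher precision, though the card does not say which.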

⚠️ Note: config.json patched

The upstream typhoon-ai/typhoon-ocr-7b config.json is missing the vision_config fields that mlx-vlm's Qwen2.5-VL loader requires. We patched the config by copying those fields verbatim from upstream Qwen/Qwen2.5-VL-7B-Instruct. See the q4 variant for the patch script.
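
The idea behind the patch can be sketched as follows. This is not the actual patch script (that lives with the q4 variant); it is a hypothetical helper, `patch_vision_config`, that fills in whatever `vision_config` keys the target config lacks from a reference config. The field names in the example are placeholders chosen for illustration.

```python
import copy


def patch_vision_config(target: dict, reference: dict) -> dict:
    """Return a copy of `target` with missing vision_config keys filled in.

    Keys already present in the target are left untouched; only keys that
    exist in the reference's vision_config and are absent here are copied.
    """
    patched = copy.deepcopy(target)
    ref_vision = reference.get("vision_config", {})
    vision = patched.setdefault("vision_config", {})
    for key, value in ref_vision.items():
        vision.setdefault(key, copy.deepcopy(value))
    return patched


# Placeholder configs; in practice both would come from json.load() on the
# respective config.json files.
target_cfg = {"model_type": "example", "vision_config": {"field_a": 1}}
reference_cfg = {"vision_config": {"field_a": 99, "field_b": 2, "field_c": [3, 4]}}

patched_cfg = patch_vision_config(target_cfg, reference_cfg)
print(patched_cfg["vision_config"])
```

The helper returns a new dict and leaves both inputs unmodified, so the original config can be kept around for comparison.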

Benchmark

Same 7-image internal smoke set as the q4 variant, Mac mini Apple Silicon, 2026-05-13:

| Backend | HW CER median | HW CER max | Wall median | Generation TPS | Peak RAM |
|---|---|---|---|---|---|
| 3b MLX q4 | 0.009 | 0.081 | 1.95 s | ~107 | ~3.5 GB |
| 3b MLX q8 | 0.000 | 0.081 | 2.34 s | ~65 | ~5 GB |
| 7b MLX q4 | 0.012 | 0.037 | 3.55 s | ~56 | ~6 GB |
| 7b MLX q8 (this) | 0.012 | 0.028 | 4.66 s | ~32 | ~9 GB |

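Read: this variant has the lowest worst-case CER on this set (0.028 max). Compared with 3b q8, it costs about 2× the latency (4.66 s vs 2.34 s) for a roughly 3× lower worst-case CER (0.028 vs 0.081). Its median CER (0.012) is no better than 3b q8's (0.000), so the gain is in the tail. Right call when accuracy on tricky cases matters more than throughput, e.g. legal documents, insurance claim escalation, or Curator-marked high-stakes cases.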
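
The trade-off ratios quoted here follow directly from the benchmark rows for 3b q8 and 7b q8. A small sketch of the arithmetic, with the values copied from the benchmark figures (the dictionary layout and variable names are just for illustration):

```python
# Values copied from the benchmark rows for the two q8 variants.
rows = {
    "3b q8": {"cer_max": 0.081, "wall_s": 2.34, "tps": 65, "ram_gb": 5.0},
    "7b q8": {"cer_max": 0.028, "wall_s": 4.66, "tps": 32, "ram_gb": 9.0},
}

small, large = rows["3b q8"], rows["7b q8"]

latency_ratio = large["wall_s"] / small["wall_s"]   # how much slower 7b q8 is
cer_ratio = small["cer_max"] / large["cer_max"]     # how much lower its worst case is
ram_ratio = large["ram_gb"] / small["ram_gb"]       # how much more peak RAM it needs

print(f"latency: {latency_ratio:.2f}x")        # latency: 1.99x
print(f"worst-case CER: {cer_ratio:.2f}x lower")  # worst-case CER: 2.89x lower
print(f"peak RAM: {ram_ratio:.2f}x")           # peak RAM: 1.80x
```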

The test set is synthetic — real photographed handwriting will be harder.
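
For readers who want to run a comparable check on their own images, here is a minimal sketch of a character error rate calculation. It assumes the common definition (Levenshtein edit distance divided by reference length); the card does not state exactly how its CER figures were computed or normalized, and the sample strings below are made up.

```python
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up (reference, OCR output) pairs standing in for a small test set.
pairs = [
    ("hello world", "hello world"),
    ("hello world", "hallo world"),
    ("abc", "abd"),
]
scores = [cer(ref, hyp) for ref, hyp in pairs]
print(f"median={median(scores):.3f} max={max(scores):.3f}")
```

Reporting both the median and the maximum, as the table above does, separates typical behaviour from the worst case on the set.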

Usage

uv pip install mlx-vlm

from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("MegawizCo/typhoon-ocr-7b-mlx-q8")
config = load_config("MegawizCo/typhoon-ocr-7b-mlx-q8")

prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text)  # 7b wraps in {"natural_text": "..."}

⚠️ Output uses {"natural_text": "..."} (3b uses {"text": "..."}).
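
Because the 7b and 3b variants wrap their output under different keys, downstream code that may receive either can normalize it in one place. A minimal sketch, assuming the model returns a single JSON object as described above; `extract_text` is a hypothetical helper name, and the fallback to the raw string is a design choice rather than documented behaviour.

```python
import json


def extract_text(raw: str) -> str:
    """Pull the OCR text out of a Typhoon-OCR style JSON wrapper.

    Tries "natural_text" (7b) first, then "text" (3b). If the input is not
    a JSON object with one of those keys, the raw string is returned as-is.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if isinstance(payload, dict):
        for key in ("natural_text", "text"):
            value = payload.get(key)
            if isinstance(value, str):
                return value
    return raw


print(extract_text('{"natural_text": "Take one tablet daily."}'))
print(extract_text('{"text": "Take one tablet daily."}'))
```

With the usage snippet above, this would be called as `extract_text(out.text)`.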

Reproduce

Same config-patch step as q4, then convert with --q-bits 8. See the q4 README for the full command.

License & attribution

Related
