Typhoon-OCR-7B — MLX q8

MLX-quantized port of typhoon-ai/typhoon-ocr-7b for native Apple Silicon inference. Highest-fidelity local option in the MegawizCo Typhoon-OCR collection: pick this when you can afford the RAM (~9 GB peak) and the latency (4.66 s median wall time in the benchmark below).

Quantization: 8-bit affine, group size 64. Effective rate 9.112 bits/weight. Size on disk: 8.8 GB.
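
The effective rate is higher than the nominal 8 bits because affine quantization stores extra parameters per group. A minimal sketch of that arithmetic, assuming one 16-bit scale and one 16-bit bias per group of 64 weights (the function name and defaults are illustrative, not part of mlx-vlm):

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight for a group-wise affine-quantized tensor.

    Assumption: each group of `group_size` weights carries one scale and
    one bias, each stored in 16 bits.
    """
    return bits + (scale_bits + bias_bits) / group_size


if __name__ == "__main__":
    print(affine_bits_per_weight(8, 64))  # quantized layers only
```

Under these assumptions the quantized layers alone come to 8.5 bits/weight. The gap up to the reported 9.112 would then be explained by tensors that stay in 16-bit precision, which is an inference rather than something this card states.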

⚠️ Note: config.json patched

The upstream typhoon-ai/typhoon-ocr-7b config.json is missing the vision_config fields that mlx-vlm's Qwen2.5-VL loader requires. We patched the config by copying those fields verbatim from upstream Qwen/Qwen2.5-VL-7B-Instruct. See the q4 variant for the patch script.
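
For orientation, the patch amounts to merging the reference model's vision_config into the target config. The sketch below is not the actual patch script (that lives with the q4 variant); the helper name and the example keys are hypothetical, and it only fills in fields that are missing rather than overwriting existing ones.

```python
import copy


def patch_vision_config(target: dict, reference: dict) -> dict:
    """Return a copy of `target` with missing vision_config fields
    filled in from `reference`. Existing fields are left untouched."""
    patched = copy.deepcopy(target)
    vision = patched.setdefault("vision_config", {})
    for key, value in reference.get("vision_config", {}).items():
        # Copy the reference value verbatim only when the field is absent.
        vision.setdefault(key, copy.deepcopy(value))
    return patched


if __name__ == "__main__":
    # Hypothetical, heavily trimmed configs for illustration only.
    target_cfg = {"model_type": "example", "vision_config": {"field_a": 1}}
    reference_cfg = {"vision_config": {"field_a": 99, "field_b": 2}}
    print(patch_vision_config(target_cfg, reference_cfg))
```

In practice the two dictionaries would be loaded from the respective config.json files with json.load and the result written back with json.dump.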

Benchmark

Same 7-image internal smoke set as the q4 variant, Mac mini Apple Silicon, 2026-05-13:

Backend	HW CER median	HW CER max	Wall median	Generation TPS	Peak RAM
3b MLX q4	0.009	0.081	1.95 s	~107	~3.5 GB
3b MLX q8	0.000	0.081	2.34 s	~65	~5 GB
7b MLX q4	0.012	0.037	3.55 s	~56	~6 GB
7b MLX q8 (this)	0.012	0.028	4.66 s	~32	~9 GB

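Read: this variant has the lowest worst-case CER on this set (0.028 max). Compared with 3b q8 it costs about 2× the latency (4.66 s vs 2.34 s) for a roughly 3× lower worst-case CER (0.028 vs 0.081); its median CER is not better (0.012 vs 0.000). It is the right call when accuracy on tricky cases matters more than throughput, e.g. legal documents, insurance claim escalations, and Curator-marked high-stakes items.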

The test set is synthetic and small (7 images); expect higher error rates on real photographed handwriting.
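
To check these numbers against your own images, CER can be computed as character-level edit distance divided by reference length. The scoring script behind the table is not included here, so the sketch below assumes that standard definition; any text normalization the original benchmark applied is unknown.

```python
from statistics import median


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def summarize(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Median and max CER over (reference, hypothesis) pairs."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return median(scores), max(scores)
```

Reporting both the median and the max, as the table does, separates typical accuracy from the worst case on a small set.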

Usage

uv pip install mlx-vlm

from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("MegawizCo/typhoon-ocr-7b-mlx-q8")
config = load_config("MegawizCo/typhoon-ocr-7b-mlx-q8")

prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text)  # 7b wraps in {"natural_text": "..."}

⚠️ Output uses {"natural_text": "..."} (3b uses {"text": "..."}).
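
Because the wrapper key differs between the 7b and 3b variants, a small helper can unwrap either one. This is a sketch assuming the model returns a valid JSON object as a plain string; the helper name is hypothetical, and it falls back to the raw output when parsing fails.

```python
import json


def extract_ocr_text(raw: str) -> str:
    """Return the OCR text from a Typhoon-OCR response string.

    Accepts the 7b form {"natural_text": "..."} and the 3b form
    {"text": "..."}; anything else is returned unchanged.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # not JSON: hand back the raw model output
    if isinstance(payload, dict):
        for key in ("natural_text", "text"):
            value = payload.get(key)
            if isinstance(value, str):
                return value
    return raw


if __name__ == "__main__":
    print(extract_ocr_text('{"natural_text": "Take one tablet daily."}'))
```

With the usage snippet above, this would be called as extract_ocr_text(out.text).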

Reproduce

Same config-patch step as q4, then convert with --q-bits 8. See the q4 README for the full command.

License & attribution

License: Apache 2.0 — inherited from upstream.
Base model: SCB 10X / Typhoon AI.
Config patch + quantization: MegawizCo (2026-05-13).

Related models

MegawizCo/typhoon-ocr-7b-mlx-q4: faster 7b
MegawizCo/typhoon-ocr-3b-mlx-q4: fastest, throughput-default
MegawizCo/typhoon-ocr-3b-mlx-q8: 3b high-fidelity
