Typhoon-OCR-7B — MLX q8
MLX-quantized port of typhoon-ai/typhoon-ocr-7b for native Apple Silicon inference. Highest-fidelity local option in the MegawizCo Typhoon-OCR collection — pick this when you can afford the RAM and latency.
Quantization: 8-bit affine, group size 64. Effective rate 9.112 bits/weight. Size on disk: 8.8 GB.
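One way to read the gap between "8-bit" and the 9.112 bits/weight figure: group-wise affine quantization stores a scale and an offset for every group next to the 8-bit codes. The sketch below assumes both are 16-bit values (an assumption; this card does not state the dtype), and the function name is hypothetical. Under that assumption the quantized layers cost 8.5 bits per weight, and the rest of the gap would come from tensors that are not quantized, which this card does not break down.

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, offset_bits: int = 16) -> float:
    """Storage cost per weight for group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one offset.
    The 16-bit defaults for those two values are an assumption.
    """
    return bits + (scale_bits + offset_bits) / group_size


# 8-bit codes with one scale and one offset per group of 64 weights.
quantized_layer_rate = affine_bits_per_weight(bits=8, group_size=64)
print(f"{quantized_layer_rate} bits/weight in quantized layers")
```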
⚠️ Note: config.json patched
The upstream typhoon-ai/typhoon-ocr-7b config.json is missing the vision_config fields that mlx-vlm's Qwen2.5-VL loader requires. We patched the config by copying those fields verbatim from upstream Qwen/Qwen2.5-VL-7B-Instruct. See the q4 variant for the patch script.
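The actual patch script lives in the q4 README. As a rough illustration of the idea only, the sketch below fills missing `vision_config` keys in one config dictionary from a reference dictionary. The function name, the fill-only-missing-keys policy, and the toy values are assumptions, not the authors' script.

```python
import copy


def patch_vision_config(target: dict, reference: dict) -> dict:
    """Return a copy of `target` with missing vision_config keys filled from `reference`."""
    patched = copy.deepcopy(target)
    vision = patched.setdefault("vision_config", {})
    for key, value in reference.get("vision_config", {}).items():
        # Only fill gaps; never overwrite a value the target already defines.
        vision.setdefault(key, copy.deepcopy(value))
    return patched


# Toy dictionaries standing in for the two config.json files (values are made up).
typhoon_cfg = {"model_type": "qwen2_5_vl", "vision_config": {"hidden_size": 1280}}
qwen_cfg = {"vision_config": {"hidden_size": 9999, "depth": 32, "patch_size": 14}}

patched = patch_vision_config(typhoon_cfg, qwen_cfg)
print(patched["vision_config"])
```

With real files, the two dictionaries would come from `json.load` on each `config.json`, and the result would be written back with `json.dump`.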
Benchmark
Same 7-image internal smoke set as the q4 variant, Mac mini Apple Silicon, 2026-05-13:
| Backend | HW CER median | HW CER max | Wall median | Generation TPS | Peak RAM |
|---|---|---|---|---|---|
| 3b MLX q4 | 0.009 | 0.081 | 1.95 s | ~107 | ~3.5 GB |
| 3b MLX q8 | 0.000 | 0.081 | 2.34 s | ~65 | ~5 GB |
| 7b MLX q4 | 0.012 | 0.037 | 3.55 s | ~56 | ~6 GB |
| 7b MLX q8 (this) | 0.012 | 0.028 | 4.66 s | ~32 | ~9 GB |
Read: lowest worst-case CER on this set (0.028 max). Compared with 3b q8, you pay about 2× the latency (4.66 s vs 2.34 s) for a roughly 3× lower worst-case CER (0.028 vs 0.081); median CER is not better (0.012 vs 0.000). Right call when accuracy on tricky cases matters more than throughput, e.g. legal documents, insurance claim escalation, or Curator-marked high-stakes items.
The test set is synthetic — real photographed handwriting will be harder.
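For readers who want to score their own images the same way, here is a minimal sketch of character error rate (CER) as edit distance divided by reference length, summarized by median and max as in the table. The card does not describe its exact scoring or text normalization, so treat this as one common definition, not the authors' harness; the sample pairs are made up.

```python
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up (ground truth, OCR output) pairs.
pairs = [("hello", "hello"), ("world", "w0rld"), ("abcd", "abc")]
scores = [cer(ref, hyp) for ref, hyp in pairs]
print("median:", median(scores), "max:", max(scores))
```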
Usage
uv pip install mlx-vlm
from mlx_vlm import generate, load
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
model, processor = load("MegawizCo/typhoon-ocr-7b-mlx-q8")
config = load_config("MegawizCo/typhoon-ocr-7b-mlx-q8")
prompt = apply_chat_template(processor, config, "Extract all text from this image.", num_images=1)
out = generate(model, processor, prompt, image=["prescription.png"], max_tokens=512)
print(out.text) # 7b wraps in {"natural_text": "..."}
⚠️ Output uses {"natural_text": "..."} (3b uses {"text": "..."}).
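Because the wrapper key differs between the 7b and 3b models, downstream code may want to unwrap either form. A minimal sketch, assuming the model returns a JSON object as a string; the helper name and the fall-back-to-raw-text behavior are assumptions, not part of mlx-vlm.

```python
import json


def extract_ocr_text(raw: str) -> str:
    """Unwrap {"natural_text": ...} (7b) or {"text": ...} (3b); otherwise return raw."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not valid JSON: hand back the raw string unchanged.
        return raw
    if isinstance(payload, dict):
        for key in ("natural_text", "text"):
            value = payload.get(key)
            if isinstance(value, str):
                return value
    return raw


print(extract_ocr_text('{"natural_text": "Take 1 tablet daily"}'))
```

In the Usage snippet this would be applied to `out.text`.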
Reproduce
Same config-patch step as q4, then convert with --q-bits 8. See the q4 README for the full command.
License & attribution
- License: Apache 2.0 — inherited from upstream.
- Base model: SCB 10X / Typhoon AI.
- Config patch + quantization: MegawizCo (2026-05-13).
Related
- MegawizCo/typhoon-ocr-7b-mlx-q4: faster 7b
- MegawizCo/typhoon-ocr-3b-mlx-q4: fastest, throughput-default
- MegawizCo/typhoon-ocr-3b-mlx-q8: 3b high-fidelity