# Granite-Docling-258M-GGUF
GGUF conversion of [ibm-granite/granite-docling-258M](https://huggingface.co/ibm-granite/granite-docling-258M), a multimodal document OCR model that converts document page images to Docling format. Converted with llama.cpp.
## Files
| File | Quant | Size | Note |
|---|---|---|---|
| granite-docling-258M-f16.gguf | F16 | 317 MB | Full precision |
| granite-docling-258M-q8_0.gguf | Q8_0 | 170 MB | Recommended |
| mmproj-granite-docling-258M-f16.gguf | F16 | 182 MB | Vision encoder (required) |
## Usage

### CLI
```shell
llama-mtmd-cli \
--model granite-docling-258M-q8_0.gguf \
--mmproj mmproj-granite-docling-258M-f16.gguf \
--image document.png \
--n-predict 4096 --ctx-size 8192 --temp 0.0 \
  -p "Convert this page to docling."
```
### Server (OpenAI-compatible API)
```shell
llama-server \
-m granite-docling-258M-q8_0.gguf \
--mmproj mmproj-granite-docling-258M-f16.gguf \
--ctx-size 8192 --special --jinja \
  --host 0.0.0.0 --port 8080
```
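With the server running, requests go to the standard OpenAI-style chat endpoint (`/v1/chat/completions`), with the page image embedded as a base64 data URL. A minimal sketch of the payload construction, assuming the server above is listening on `localhost:8080`; the `model` field value and the helper name are illustrative, not fixed by this repo:

```python
import base64

def build_docling_request(image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat payload carrying one page image.

    Hypothetical helper: only constructs the request body; POST it to
    http://localhost:8080/v1/chat/completions yourself (e.g. with requests).
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "granite-docling-258M",  # assumed name; llama-server accepts any
        "temperature": 0.0,
        "max_tokens": 4096,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to docling."},
                # Image travels inline as a data URL, per the OpenAI vision schema.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

Send the dict as JSON with any HTTP client; the model's Docling output comes back in `choices[0].message.content`.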
## Benchmark (CPU only, Q8_0)
| CPU | Config | Long text (4096 tok) | Short text (50 tok) |
|---|---|---|---|
| EPYC 9654 (96C) | 192 instances × 1 thread | 1.73 img/s | 29.4 img/s |
| EPYC 9654 (16C) | 16 instances × 1 thread | 0.67 img/s | 8.68 img/s |
For this small model, running one thread per instance, with as many instances as there are cores, gives the best throughput.
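That one-thread-per-instance layout can be sketched as a simple launcher script. This is a hedged example, not part of the benchmark harness: the `pages/` input directory, output naming, and CPU pinning via `taskset` are assumptions.

```shell
#!/bin/sh
# Run one single-threaded llama-mtmd-cli per core, each pinned to its own
# CPU, distributing the input images round-robin across instances.
NCORES=$(nproc)
i=0
for img in pages/*.png; do
  cpu=$(( i % NCORES ))
  taskset -c "$cpu" llama-mtmd-cli \
    --model granite-docling-258M-q8_0.gguf \
    --mmproj mmproj-granite-docling-258M-f16.gguf \
    --image "$img" --threads 1 \
    --n-predict 4096 --ctx-size 8192 --temp 0.0 \
    -p "Convert this page to docling." > "${img%.png}.docling" &
  i=$(( i + 1 ))
  # Keep at most NCORES jobs in flight: wait after each full batch.
  [ $(( i % NCORES )) -eq 0 ] && wait
done
wait
```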