Twkeed Vision (ุชูˆูƒูŠุฏ ู„ู„ุฑุคูŠุฉ)

Arabic Vision-Language Model for OCR and Document Understanding, based on Qwen3-VL-4B.

Model Details

  • Base Model: mlx-community/Qwen3-VL-4B-Instruct-4bit
  • Fine-tuned for: Arabic OCR, text understanding, and document understanding
  • Framework: MLX (Apple Silicon optimized)
  • Type: LoRA Adapters
  • Parameters: 4B base + LoRA adapters

Identity

When asked "ู…ู† ุฃู†ุชุŸ" (Who are you?), the model responds:

ุฃู†ุง ุชูˆูƒูŠุฏ ู„ู„ุฑุคูŠุฉุŒ ู…ุณุงุนุฏ ุฐูƒูŠ ู…ุชุฎุตุต ููŠ ู‚ุฑุงุกุฉ ุงู„ู†ุตูˆุต ุงู„ุนุฑุจูŠุฉ ู…ู† ุงู„ุตูˆุฑ ูˆุงู„ู…ุณุชู†ุฏุงุช

("I am Twkeed Vision, a smart assistant specialized in reading Arabic text from images and documents.")

Capabilities

  • Arabic OCR Understanding: Read and process Arabic text extracted from images
  • Document Understanding: Extract information from Arabic documents
  • Receipt/Invoice Processing: Parse Arabic receipts and invoices (see the sketch after this list)
  • ID Recognition: Read Saudi IDs and official documents
  • Text Recognition: Handle various Arabic fonts and text styles
  • 32-Language OCR: Built-in support for 32 languages including Arabic
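
For example, the receipt/invoice capability comes down to sending an image together with an Arabic extraction prompt. A minimal sketch (receipt.jpg is a placeholder file name, and generate's image-argument handling can vary across mlx-vlm versions; adapter loading is shown in the Usage section below):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen3-VL-4B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)
# (apply the LoRA structure and load the adapters as in the Usage section)

# "Extract the store name, date, and total amount from this receipt"
prompt = "ุงุณุชุฎุฑุฌ ุงุณู… ุงู„ู…ุชุฌุฑ ูˆุงู„ุชุงุฑูŠุฎ ูˆุงู„ู…ุจู„ุบ ุงู„ุฅุฌู…ุงู„ูŠ ู…ู† ู‡ุฐุง ุงู„ุฅูŠุตุงู„"
formatted = apply_chat_template(processor, config, prompt, num_images=1)
result = generate(model, processor, formatted, ["receipt.jpg"], max_tokens=512)
print(result.text)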

Usage

import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.trainer import get_peft_model

# Load base model
model, processor = load("mlx-community/Qwen3-VL-4B-Instruct-4bit")

# Apply LoRA structure
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]
model = get_peft_model(
    model,
    linear_layers=target_modules,
    rank=16,
    alpha=2.0,  # NOTE: training used alpha 32 with rank 16 (32 / 16 = 2.0)
    dropout=0.05,
    freeze=True,  # freeze base weights so only the adapters are trainable
)

# Load adapters (download from this repo)
adapter_weights = mx.load("path/to/adapters.safetensors")
# Adapter keys were saved with a "language_model." prefix; strip it so they
# match the parameter names of the language_model submodule
stripped_weights = {k.replace("language_model.", ""): v for k, v in adapter_weights.items()}
model.language_model.load_weights(list(stripped_weights.items()), strict=False)

# Generate with Arabic prompt
prompt = "<|im_start|>user\nู…ู† ุฃู†ุชุŸ<|im_end|>\n<|im_start|>assistant\n"
result = generate(model, processor, prompt, max_tokens=256)
print(result.text)
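
Hand-writing the <|im_start|> tokens is error-prone. If the bundled processor exposes the standard transformers apply_chat_template method (Qwen processors do), the same prompt can be built from a messages list; a sketch, continuing the session above:

# Build the identity prompt from a messages list via the chat template
messages = [{"role": "user", "content": "ู…ู† ุฃู†ุชุŸ"}]  # "Who are you?"
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
result = generate(model, processor, prompt, max_tokens=256)
print(result.text)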

Training

Fine-tuned using:

  • Hardware: Mac Studio M3 Ultra 96GB
  • Framework: mlx-vlm
  • Method: LoRA (Low-Rank Adaptation)
  • Target Modules: q_proj, k_proj, v_proj, o_proj
  • Rank: 16
  • Alpha: 32
  • Data: Arabic OCR datasets, document understanding examples
  • Epochs: 3
  • Steps: 2000+
  • Final Loss: ~0.09
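
Note that the alpha=2.0 passed to get_peft_model in the Usage snippet is consistent with these settings if that argument denotes the effective LoRA scale, alpha/rank = 32/16 = 2.0. For reference, a minimal, illustrative MLX sketch of the update such an adapter applies to each targeted projection (not the mlx-vlm implementation; names and initialization are assumptions):

import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha/rank) * (x A) B."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        out_dims, in_dims = base.weight.shape
        self.base = base               # frozen q/k/v/o projection
        self.scale = alpha / rank      # 32 / 16 = 2.0
        # Only these two low-rank matrices are trained; B starts at zero so
        # the wrapped layer initially behaves exactly like the base layer
        self.lora_a = mx.random.normal((in_dims, rank)) / rank**0.5
        self.lora_b = mx.zeros((rank, out_dims))

    def __call__(self, x):
        return self.base(x) + self.scale * ((x @ self.lora_a) @ self.lora_b)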

Files

  • adapters.safetensors - LoRA adapter weights (47MB)
  • adapter_config.json - LoRA configuration
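
A quick way to sanity-check a download of these two files (hf_hub_download caches them locally; repo id taken from this page):

from huggingface_hub import hf_hub_download
import json
import mlx.core as mx

adapters_path = hf_hub_download("twkeed-sa/twkeed-vision", "adapters.safetensors")
config_path = hf_hub_download("twkeed-sa/twkeed-vision", "adapter_config.json")

# Safetensors load as a flat {parameter name: array} dict
weights = mx.load(adapters_path)
print(f"{len(weights)} adapter tensors")
for name in sorted(weights)[:4]:
    print(name, weights[name].shape, weights[name].dtype)

with open(config_path) as f:
    print(json.dumps(json.load(f), indent=2))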

Qwen3-VL-4B Features

  • DeepStack ViT: Enhanced vision encoder
  • 32-Language OCR: Built-in multilingual OCR support
  • Improved Arabic: Better Arabic text handling than Qwen2.5-VL

License

Apache 2.0

Acknowledgments

  • Base model: Qwen Team (Alibaba)
  • MLX framework: Apple
  • Training framework: mlx-vlm