Macedonian OCR V7

Fine-tuned PaddleOCR PP-OCRv5 recognition model for Macedonian Cyrillic text in scanned books.

Results

Engine            CER     WER    MK-Acc
V7 (this model)   0.45%   ~2%    99.6%
Tesseract (mkd)   2.15%   6.3%   96.3%
V5 (previous)     2.21%   3.9%   99.2%
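
For readers unfamiliar with the metrics, the sketch below shows how CER (character error rate) and WER (word error rate) are conventionally computed from Levenshtein edit distance. It is an illustration only: the evaluation script, test set, and text normalization behind the numbers above are not given here, and the function names and sample strings are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Illustrative pair: the OCR output confuses Cyrillic "ј" (U+0458)
    # with the visually identical Latin "j" (U+006A).
    reference = "Скоп\u0458е е главен град"
    hypothesis = "Скоп\u006aе е главен град"
    print(f"CER = {cer(reference, hypothesis):.2%}")
    print(f"WER = {wer(reference, hypothesis):.2%}")
```

A single confused character costs one character edit but invalidates a whole word, which is why WER is normally several times larger than CER on the same output.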

Usage

Training

  • Base model: PP-OCRv5 SVTR_LCNet with MultiHead (CTC + NRTR)
  • Dictionary: 170 characters (full Macedonian Cyrillic, Latin, digits, punctuation)
  • Training data: 583k synthetic line images rendered from real Macedonian book text
  • Detection tuning: thresh=0.25, box_thresh=0.30, unclip=2.2
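
The detection values above can be collected into keyword arguments for the text detector. The sketch below assumes the PaddleOCR 3.x constructor names `text_det_thresh`, `text_det_box_thresh`, and `text_det_unclip_ratio`; this mapping is an assumption and should be checked against the installed PaddleOCR version. The helper name and the model path in the trailing comment are hypothetical placeholders.

```python
# Detection values quoted from the list above.
DETECTION_TUNING = {"thresh": 0.25, "box_thresh": 0.30, "unclip": 2.2}

# Assumed mapping onto PaddleOCR 3.x constructor keyword names.
# Verify these against the PaddleOCR version you have installed.
PADDLEOCR_3X_NAMES = {
    "thresh": "text_det_thresh",
    "box_thresh": "text_det_box_thresh",
    "unclip": "text_det_unclip_ratio",
}


def build_detection_kwargs(tuning, name_map):
    """Rename the short tuning keys to the keyword names in name_map."""
    missing = set(tuning) - set(name_map)
    if missing:
        raise KeyError(f"no keyword mapping for: {sorted(missing)}")
    return {name_map[key]: value for key, value in tuning.items()}


if __name__ == "__main__":
    kwargs = build_detection_kwargs(DETECTION_TUNING, PADDLEOCR_3X_NAMES)
    print(kwargs)
    # Hypothetical use, not executed here (needs paddleocr and the model files;
    # "path/to/v7_model" is a placeholder, not a real location):
    #   from paddleocr import PaddleOCR
    #   ocr = PaddleOCR(text_recognition_model_dir="path/to/v7_model", **kwargs)
```

Keeping the values in one dictionary makes it easy to apply the same settings in batch scripts or to override a single threshold for unusually faint scans.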

License

Model weights: CC BY 4.0 (free to use, modify, and redistribute with attribution).
Code: MIT

Links
