Macedonian OCR V7
Fine-tuned PaddleOCR PP-OCRv5 recognition model for Macedonian Cyrillic text in scanned books.
Results
| Engine | CER | WER | MK-Acc |
|---|---|---|---|
| V7 (this model) | 0.45% | ~2% | 99.6% |
| Tesseract (mkd) | 2.15% | 6.3% | 96.3% |
| V5 (previous) | 2.21% | 3.9% | 99.2% |
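
The card does not say how CER and WER were computed. As a point of reference, the sketch below uses the conventional definition: Levenshtein edit distance between reference and prediction, divided by the reference length in characters (CER) or whitespace-separated words (WER). The function names are illustrative and not part of this repository, and MK-Acc is left out because the card does not define it.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate relative to the reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    print(f"CER: {cer('книга', 'кнга'):.2%}")
    print(f"WER: {wer('добар ден', 'добар дан'):.2%}")
```

Corpus-level figures are usually obtained by summing edit distances and reference lengths over all lines before dividing, rather than averaging per-line rates; which of the two this card uses is not stated.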
Usage
Training
- Base model: PP-OCRv5 SVTR_LCNet with MultiHead (CTC + NRTR)
- Dictionary: 170 characters (full Macedonian Cyrillic, Latin, digits, punctuation)
- Training data: 583k synthetic line images rendered from real Macedonian book text
- Detection tuning: thresh=0.25, box_thresh=0.30, unclip=2.2
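
The detection values above can be kept in one place and passed to the detector. The sketch below assumes they correspond to PaddleOCR's DB text-detection post-processing parameters, whose keyword names differ between PaddleOCR 2.x and 3.x. That mapping is an assumption, not something the card states, and `detection_kwargs` is a hypothetical helper.

```python
# Detection settings quoted in the list above.
DET_THRESH = 0.25        # pixel-level binarization threshold
DET_BOX_THRESH = 0.30    # minimum score for a detected box to be kept
DET_UNCLIP_RATIO = 2.2   # how far detected boxes are expanded


def detection_kwargs(api: str = "3.x") -> dict:
    """Return the settings keyed by PaddleOCR argument names.

    ASSUMPTION: the card's thresh / box_thresh / unclip map onto the DB
    detector arguments of the PaddleOCR pipeline for the given API version.
    """
    if api == "3.x":
        return {
            "text_det_thresh": DET_THRESH,
            "text_det_box_thresh": DET_BOX_THRESH,
            "text_det_unclip_ratio": DET_UNCLIP_RATIO,
        }
    if api == "2.x":
        return {
            "det_db_thresh": DET_THRESH,
            "det_db_box_thresh": DET_BOX_THRESH,
            "det_db_unclip_ratio": DET_UNCLIP_RATIO,
        }
    raise ValueError(f"unknown PaddleOCR API version: {api!r}")


if __name__ == "__main__":
    print(detection_kwargs())
    # Possible use, assuming paddleocr is installed (not executed here):
    #   from paddleocr import PaddleOCR
    #   ocr = PaddleOCR(**detection_kwargs("3.x"))
```

Keeping the values in named constants makes it easier to compare runs if the thresholds are adjusted for a different scan quality.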
License
Model weights: CC BY 4.0 (free to use, modify, and redistribute with attribution).
Code: MIT
Links