Macedonian OCR V7

Fine-tuned PaddleOCR PP-OCRv5 recognition model for Macedonian Cyrillic text in scanned books.

Results

Engine	CER	WER	MK-Acc
V7 (this model)	0.45%	~2%	99.6%
Tesseract (mkd)	2.15%	6.3%	96.3%
V5 (previous)	2.21%	3.9%	99.2%
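The card does not say exactly how CER and WER were computed. A common definition, assumed here, is the Levenshtein edit distance between the recognized text and the ground truth, divided by the length of the ground truth. CER counts characters and WER counts whitespace-separated words. The function names below are illustrative and not part of this model's code. A minimal sketch:

```python
def levenshtein(ref, hyp) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / max(len(ref_words), 1)


# Example: the recognizer drops the acute on Ќ and reads it as К.
ref, hyp = "Ќерка на Љубица", "Керка на Љубица"
print(f"CER={cer(ref, hyp):.4f} WER={wer(ref, hyp):.4f}")  # CER=0.0667 WER=0.3333
```

A single wrong character costs 1/15 at character level but 1/3 at word level, which is why WER is always the larger figure on short lines. When scoring a whole test set, summing edits and reference lengths across all lines is one common way to aggregate; averaging per-line rates is another, and the card does not state which was used.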

Base model: PP-OCRv5 SVTR_LCNet with MultiHead (CTC + NRTR)
Dictionary: 170 characters (full Macedonian Cyrillic, Latin, digits, punctuation)
Training data: 583k synthetic line images rendered from real Macedonian book text
Detection tuning: thresh=0.25, box_thresh=0.30, unclip=2.2
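At inference time, PaddleOCR's MultiHead returns only the CTC branch (the NRTR branch adds supervision during training), so turning the recognizer's output into text reduces to CTC decoding. For each horizontal time step, the CTC head emits scores over a blank symbol plus every dictionary entry. PaddleOCR dictionary files list one character per line, and that order defines the class indices. Greedy decoding takes the argmax at each step, collapses consecutive repeats, and drops blanks. The sketch below assumes the blank sits at class 0, which is where PaddleOCR's `CTCLabelDecode` prepends it. It uses a hypothetical four-character toy dictionary instead of the real 170-character file.

```python
from typing import List, Sequence

BLANK = 0  # assumed blank class index, as in PaddleOCR's CTCLabelDecode


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: List[str]) -> str:
    """Greedy CTC decoding.

    probs:   one row per time step, one column per class (blank + charset).
    charset: dictionary characters in file order; class i maps to charset[i - 1].
    """
    classes = ["<blank>"] + charset
    chars = []
    prev = None
    for row in probs:
        best = max(range(len(row)), key=lambda k: row[k])  # argmax at this step
        if best != BLANK and best != prev:  # drop blanks, collapse repeats
            chars.append(classes[best])
        prev = best  # a blank between two equal labels keeps both
    return "".join(chars)


def one_hot(k: int, n: int) -> List[float]:
    """Helper for the example: a score row that is 1.0 at class k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


toy = ["Љ", "у", "б", "а"]  # hypothetical dictionary; classes 1..4
frames = [one_hot(k, len(toy) + 1) for k in [1, 1, 0, 2, 3, 3, 0, 4]]
print(ctc_greedy_decode(frames, toy))  # Љуба
```

Because repeats are collapsed, a doubled letter such as "аа" can only be produced when the model emits a blank between the two "а" frames. This is also why the dictionary order must match the one used during training: a shifted file silently maps every class to the wrong character.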
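These three values appear to be the post-processing settings of PaddleOCR's DB text detector. That mapping is an assumption: the options are called `det_db_thresh`, `det_db_box_thresh` and `det_db_unclip_ratio` in PaddleOCR 2.x, and `text_det_thresh`, `text_det_box_thresh` and `text_det_unclip_ratio` in 3.x. They act in three stages:

- `thresh` binarizes the per-pixel text probability map.
- `box_thresh` discards candidate boxes whose mean probability falls below it.
- The unclip ratio r grows each surviving (shrunk) text region outward by D = A * r / L, where A is the region's area and L its perimeter.

PaddleOCR works on contour polygons and expands them with pyclipper. The sketch below only illustrates the arithmetic on axis-aligned rectangles, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def binarize(prob_map: List[List[float]], thresh: float = 0.25) -> List[List[int]]:
    """Text mask: 1 where the pixel probability exceeds thresh."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def box_score(prob_map: List[List[float]], box: Box) -> float:
    """Mean probability inside a non-empty axis-aligned box."""
    x0, y0, x1, y1 = box
    vals = [prob_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)


def filter_boxes(prob_map: List[List[float]], boxes: List[Box],
                 box_thresh: float = 0.30) -> List[Box]:
    """Keep candidates whose mean probability reaches box_thresh."""
    return [b for b in boxes if box_score(prob_map, b) >= box_thresh]


def unclip_distance(width: float, height: float, ratio: float = 2.2) -> float:
    """Offset D = A * r / L for a rectangle (A = area, L = perimeter)."""
    return width * height * ratio / (2 * (width + height))


def expand_box(box: Box, ratio: float = 2.2) -> Tuple[float, float, float, float]:
    """Grow a rectangle outward by the unclip distance on every side."""
    x0, y0, x1, y1 = box
    d = unclip_distance(x1 - x0, y1 - y0, ratio)
    return (x0 - d, y0 - d, x1 + d, y1 + d)


# A 100 x 20 px kernel: D = 2000 * 2.2 / 240, so the height grows to about 56.7 px.
ex0, ey0, ex1, ey1 = expand_box((0, 0, 100, 20))
print(f"{ey1 - ey0:.1f}")  # 56.7
```

Lower `thresh` and `box_thresh` values keep fainter text regions, and a larger unclip ratio produces looser boxes around each line. Which trade-off suits a given scan quality is an empirical question; the values here are simply the ones the card reports.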

Model weights: CC BY 4.0 (free to use, modify, and redistribute with attribution)
Code: MIT