saken-tukenov/sozkz-corpus-clean-enkk-fineweb-edu-v1 • Updated about 1 hour ago • 15.3M • 38
Soz: Kazakh Language Models from Scratch (Collection) • Building foundational language models for Kazakh: models, tokenizers, and training corpora. • 20 items • Updated 4 days ago
Kazakh GEC: Grammar Error Correction (Collection) • Kazakh grammatical error correction: 13 progressive training runs on mT5-small and mT5-base. • 19 items • Updated 4 days ago