# Zehnova STT: Speech-to-Text Model for Uzbek
An automatic speech recognition model based on Whisper Medium, fine-tuned for the Uzbek language.
## About the model

- Model type: Automatic Speech Recognition (ASR)
- Base model: Kotib/uzbek_stt_v1 (Whisper Medium)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Language: Uzbek 🇺🇿
- Author: Jonibek21
## Usage
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline
import torch

model_id = "Jonibek21/Zehnova-stt-uzbek"

# Load the fine-tuned model in fp16 on the GPU
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,   # split long audio into 30-second windows
    stride_length_s=5,   # 5-second overlap on each side of a window
    batch_size=4,
    device=0,
)

result = pipe(
    "audio.wav",
    generate_kwargs={
        "language": "uz",
        "task": "transcribe",
        "no_repeat_ngram_size": 3,  # suppress repeated trigrams in the output
    },
)
print(result["text"])
```
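Besides a file path, the ASR pipeline also accepts in-memory audio as a dict with `raw` samples and a `sampling_rate`. Whisper's feature extractor expects 16 kHz mono float audio. A minimal sketch of that input shape; the 440 Hz test tone is only a stand-in for real speech:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the rate Whisper's feature extractor expects

def make_tone(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> np.ndarray:
    """Mono float32 sine wave in [-1, 1], standing in for real speech."""
    t = np.arange(int(seconds * rate)) / rate
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32)

# The pipeline accepts this dict form directly in place of "audio.wav":
audio = {"raw": make_tone(440.0, 1.0), "sampling_rate": SAMPLE_RATE}
print(audio["raw"].shape)  # (16000,)
# result = pipe(audio, generate_kwargs={"language": "uz", "task": "transcribe"})
```

If your source audio is not 16 kHz mono, resample it first (e.g. with `librosa.load(path, sr=16000)`), since the feature extractor does not resample for you.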
## Training data

- Dataset: custom Uzbek-language audio dataset
- Train samples: 9,214
- Test samples: 1,024
- Dataset duration: 16 hours
- Training hardware: NVIDIA RTX 3090 (24GB)
- Training framework: Hugging Face Transformers + PEFT
- Precision: fp16
- LoRA rank: 32
- LoRA alpha: 64
- LoRA target modules: q_proj, v_proj
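For intuition on what the LoRA settings above mean, here is a small NumPy sketch (illustrative only, not the PEFT implementation): a frozen weight `W` gets a rank-32 update scaled by `alpha / rank = 64 / 32 = 2`, and only the small `A` and `B` matrices are trained. The hidden size of 1024 matches Whisper Medium.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, rank, alpha = 1024, 32, 64  # rank and alpha match the card above
scaling = alpha / rank               # = 2.0

W = rng.standard_normal((d_model, d_model))      # frozen base weight (e.g. q_proj)
A = rng.standard_normal((rank, d_model)) * 0.01  # trainable, small random init
B = np.zeros((d_model, rank))                    # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha/rank) * B (A x): base path plus low-rank update."""
    return W @ x + scaling * (B @ (A @ x))

x = rng.standard_normal(d_model)
# With B initialised to zero, LoRA starts as an exact no-op:
print(np.allclose(lora_forward(x), W @ x))  # True
```

The trainable parameters per adapted matrix are `2 * rank * d_model` instead of `d_model**2`, which is why LoRA fine-tuning of Whisper Medium fits comfortably on a single RTX 3090.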
## Model Evaluation (WER)
| Category | WER |
|---|---|
| Overall | ~11-13% |
| Clean Speech | ~6-11% |
| Noisy/Augmented | ~9-16% |
| News / Formal | ~11-12% |
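The WER figures above are word-level edit distance divided by the number of reference words. A self-contained sketch of the standard computation (substitutions, insertions, and deletions via dynamic programming); in practice a library such as `jiwer` is normally used:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word ("salom" -> "salon") out of three reference words:
print(round(wer("salom dunyo bugun", "salon dunyo bugun"), 3))  # 0.333
```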
Base model (Kotib/uzbek_stt_v1) overall WER: 16.7%. The Zehnova model scores roughly 5 percentage points better than the base model.
## Limitations

- Works only for Uzbek
- Quality may degrade on noisy audio
- Audio longer than 30 seconds is split into chunks
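The 30-second chunking is what `chunk_length_s=30` / `stride_length_s=5` control in the usage example: long audio is cut into overlapping windows, and the overlapping edges are merged after decoding. A sketch of where those window boundaries fall, using the same parameters (pure arithmetic; the pipeline's internal merging logic is more involved):

```python
def chunk_windows(total_s: float, chunk_s: float = 30.0, stride_s: float = 5.0):
    """Start/end times (seconds) of overlapping transcription windows.

    Each window covers chunk_s seconds; consecutive windows advance by
    chunk_s - 2 * stride_s, so stride_s seconds overlap on each side.
    """
    step = chunk_s - 2 * stride_s  # 30 - 2 * 5 = 20 s of new audio per window
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            return windows
        start += step

print(chunk_windows(70.0))  # [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

So a 70-second recording is transcribed as three overlapping windows rather than one pass, which is why quality near chunk boundaries can differ slightly from short-clip quality.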
## Date

- 01/05/2026