Whisper Small Model: Indonesian ASR
Model Description
This model is a fine-tuned version of openai/whisper-small for Automatic Speech Recognition (ASR) in Indonesian (id).
It transcribes Indonesian speech to text across a range of audio conditions. Accuracy and resource usage reflect the small Whisper variant this checkpoint is built on.
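A minimal transcription sketch, assuming the Hugging Face `transformers` library is installed and the checkpoint is published under the repository id shown on this card. The helper names `build_generate_kwargs` and `transcribe` are hypothetical and not part of the model itself; the language and task settings are an assumption about how one would pin decoding to Indonesian.

```python
from typing import Any, Dict

# Repository id as listed on this model card.
MODEL_ID = "Sparkplugx1904/whisper-small-id"


def build_generate_kwargs(language: str = "indonesian", task: str = "transcribe") -> Dict[str, Any]:
    """Decoding options that pin Whisper to Indonesian transcription (assumed defaults)."""
    return {"language": language, "task": task}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so build_generate_kwargs works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical file path) should return the transcript as a string. Decoding from a file path typically also requires `ffmpeg` to be available on the system.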
Intended Use
- Indonesian speech-to-text transcription
- Research and experimentation
- Educational and academic purposes
- Application development and benchmarking
Whisper is available in several sizes (tiny, base, small, medium, large) that differ in accuracy, speed, and hardware requirements. This checkpoint is the small variant; users should select the size that best matches their constraints and objectives.
Limitations
- Transcription quality depends on audio clarity, speaker accent, and background noise
- Smaller variants may produce higher error rates on long or complex audio
- Larger variants require significantly more compute and memory
- Outputs should be reviewed before use in critical or high-risk applications
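Whisper processes audio in 30-second windows at a 16 kHz sample rate, so long recordings are usually split before transcription. The sketch below shows one way to compute overlapping chunk boundaries. The function name `chunk_bounds` and the 5-second overlap are illustrative assumptions, not settings used by this model's author.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16000,
    chunk_s: float = 30.0,
    overlap_s: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_s")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds
```

Each chunk can then be transcribed separately and the overlapping text merged. The `transformers` ASR pipeline offers a similar mechanism through its `chunk_length_s` argument.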
Training Data
This model was fine-tuned using Mozilla Common Voice v23.0 (Indonesian).
Common Voice is a publicly available, community-driven speech dataset released by Mozilla under the CC0 public-domain dedication.
Dataset characteristics such as speaker diversity, recording quality, and utterance length may influence model behavior.
Evaluation
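ASR models such as this one are typically evaluated using Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. Lower is better.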
Evaluation results may vary depending on dataset, domain, audio conditions, and model size.
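As an illustration of the metric, the following sketch computes WER with a word-level edit distance. It is a plain-Python example rather than the evaluation script used for this model. Lowercasing and whitespace tokenization are assumptions; real evaluations often apply further text normalization such as punctuation removal.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("saya pergi ke pasar", "saya pergi pasar")` gives 0.25, since one of the four reference words is missing from the hypothesis.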
Training Results
| Step | Training Loss |
|---|---|
| 100 | 0.897100 |
| 200 | 0.509400 |
| 300 | 0.234200 |
| 400 | 0.153100 |
| 500 | 0.068000 |
| 600 | 0.074100 |
| 700 | 0.029100 |
| 800 | 0.017800 |
| 900 | 0.013600 |
| 1000 | 0.007200 |
| 1100 | 0.004900 |
| 1200 | 0.003700 |
| 1300 | 0.001800 |
| 1400 | 0.001700 |
| 1500 | 0.001100 |
| 1550 | 0.001100 |
Model tree for Sparkplugx1904/whisper-small-id
Base model
openai/whisper-small