Bhav-TTS is a custom fine-tuned text-to-speech model built on top of F5-TTS (Flow Matching architecture), specialized for emotional Indian speech synthesis with Bollywood voice characteristics.
- ✨ Emotion-Aware Synthesis: inline emotion tags (`<happy>`, `<sad>`, `<angry>`, etc.)
- 🎭 Celebrity Voice Profiles
- 🗣️ Multilingual Support
- ⚡ Production Ready
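
The emotion tags above are written inline in the input text, so a typo in a tag is easy to miss. The following is a minimal sketch of checking tags before synthesis, not part of Bhav-TTS itself. The helper names `split_emotion_tags` and `unknown_tags` are hypothetical, and `KNOWN_TAGS` contains only the three tags named in this card; the full supported set is an assumption you would need to fill in.

```python
import re

# Only the tags named in this model card; the full supported set is not
# documented here, so extend this set as needed (assumption).
KNOWN_TAGS = {"happy", "sad", "angry"}
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def split_emotion_tags(text):
    """Return (plain_text, tags): the text without tags, and the tag names found."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    plain = " ".join(plain.split())  # collapse whitespace left by removed tags
    return plain, tags


def unknown_tags(text):
    """List tags in the text that are not in KNOWN_TAGS, to catch typos early."""
    _, tags = split_emotion_tags(text)
    return [tag for tag in tags if tag not in KNOWN_TAGS]


plain, tags = split_emotion_tags("नमस्ते! मैं बहुत खुश हूँ। <happy>")
print(plain, tags)
print(unknown_tags("Hello <hapy>"))
```

Running the check on user input before calling the model lets an application reject or correct a misspelled tag instead of synthesizing it as literal text.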
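```python
from transformers import AutoModel
import soundfile as sf

# Load the model from the Hugging Face Hub
model = AutoModel.from_pretrained("bhriguverma/bhav-tts")

# Generate speech; the text ends with an emotion tag
# (Hindi: "Hello! I am very happy.")
text = "नमस्ते! मैं बहुत खुश हूँ। <happy>"
audio = model(text, voice="hi_male")

# Save the waveform as a 24 kHz WAV file
sf.write("output.wav", audio, 24000)
```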
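```python
# Zero-shot voice cloning (reuses the `model` object loaded above)
ref_audio = "speaker_reference.wav"  # short clip of the target speaker
ref_text = "कुछ संदर्भ टेक्स्ट"  # transcript of the clip (Hindi: "some reference text")
text = "आपका जवाब यहाँ"  # text to synthesize (Hindi: "your answer here")
audio = model(text, ref_audio=ref_audio, ref_text=ref_text)
```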
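
Voice cloning depends on the reference clip and its transcript, so it can help to sanity-check both before calling the model. The sketch below uses only Python's standard `wave` module and is not part of Bhav-TTS. The helper names are hypothetical, and the 12-second limit is an assumed value that this card does not state; adjust it to whatever your setup expects.

```python
import wave

# Assumed upper bound on reference clip length; not stated in this model card.
MAX_REF_SECONDS = 12.0


def describe_wav(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return frames / rate, rate, wav.getnchannels()


def reference_warnings(path, ref_text, max_seconds=MAX_REF_SECONDS):
    """Collect human-readable warnings about a reference clip and its transcript."""
    duration, _rate, channels = describe_wav(path)
    warnings = []
    if duration > max_seconds:
        warnings.append(f"clip is {duration:.1f}s, longer than {max_seconds:.0f}s")
    if channels != 1:
        warnings.append(f"clip has {channels} channels, expected mono")
    if not ref_text.strip():
        warnings.append("ref_text is empty")
    return warnings
```

For example, `reference_warnings("speaker_reference.wav", ref_text)` returns an empty list when nothing looks wrong, and a list of messages otherwise.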
| Parameter | Value |
|---|---|
| Base Architecture | F5-TTS (ConvNeXt V2 + Transformer) |
| Dataset | Custom Bollywood emotional audio |
| Sample Count | 1,000+ utterances |
| Training Steps | 1,200,000+ |
| Checkpoints | Multi-stage saving enabled |
- `model_last.pt` (5.1 GB) - Final trained model weights
- `pretrained_model_1200000.pt` (1.3 GB) - Intermediate checkpoint (1.2M steps)
- `vocab.txt` (13.8 KB) - Custom vocabulary/phoneme mapping

If you use Bhav-TTS in your research, please cite:
```bibtex
@article{chen2024f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Chen, Yushen and others},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024}
}
```
MIT License
Status: ✅ Production Ready | Date: June 30, 2026