metadata
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
Multilingual Audio Transcription with SenseVoice
This application transcribes audio using the SenseVoice Small model, with multilingual support for Chinese, English, Japanese, Korean, and Cantonese.
Features
- Multilingual Support: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- Multiple Audio Sources:
  - Uploaded audio files
  - Direct URLs to audio files (no YouTube support due to cookie requirements)
- Model Options:
  - Local SenseVoice model
  - Hugging Face model: FunAudioLLM/SenseVoiceSmall
- Advanced Features:
  - Audio trimming with start/end time
  - Proxy support for downloads
  - Verbose logging output
  - Automatic inverse text normalization (ITN)
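The audio trimming feature can be pictured as converting start/end times in seconds into a sample range. The following is a minimal sketch under that assumption; `trim_bounds` and `trim_audio` are hypothetical helper names, not functions taken from app.py, and the app may trim differently.

```python
from typing import Optional, Sequence, Tuple


def trim_bounds(
    num_samples: int,
    sample_rate: int,
    start_s: float = 0.0,
    end_s: Optional[float] = None,
) -> Tuple[int, int]:
    """Convert start/end times (seconds) to a clamped [start, end) sample range."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    start = max(0, int(round(start_s * sample_rate)))
    # No end time means "keep everything up to the end of the audio".
    end = num_samples if end_s is None else int(round(end_s * sample_rate))
    end = min(num_samples, end)
    if start >= end:
        raise ValueError("start time must be before end time")
    return start, end


def trim_audio(
    samples: Sequence[float],
    sample_rate: int,
    start_s: float = 0.0,
    end_s: Optional[float] = None,
) -> Sequence[float]:
    """Return the slice of samples between start_s and end_s."""
    start, end = trim_bounds(len(samples), sample_rate, start_s, end_s)
    return samples[start:end]
```

For example, `trim_bounds(160000, 16000, 2.0, 5.0)` selects seconds 2 to 5 of a 10-second clip sampled at 16 kHz.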
Model Setup
For Hugging Face Spaces Deployment
The app is configured to work with:
- Local Model: "SenseVoiceSmall" - Model files in the same directory
- HF Model: "FunAudioLLM/SenseVoiceSmall" - Auto-downloaded from Hugging Face
For Local Development
- Update MODEL_PATH_LIST in app.py to use your custom models
- Supports local paths and Hugging Face repository names
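As a rough illustration of mixing local paths and Hugging Face repository names, the sketch below assumes MODEL_PATH_LIST is a plain list of strings and that an entry counts as local when a directory of that name exists. Both the list shape and the `describe_model_source` helper are assumptions for illustration; the actual logic in app.py may differ.

```python
from pathlib import Path

# Assumed shape: one local directory name and one Hugging Face repository name.
MODEL_PATH_LIST = [
    "SenseVoiceSmall",              # local model directory next to app.py
    "FunAudioLLM/SenseVoiceSmall",  # Hugging Face repository name
]


def describe_model_source(entry: str, base_dir: str = ".") -> str:
    """Return "local" if entry is an existing directory under base_dir, else "hub"."""
    if (Path(base_dir) / entry).is_dir():
        return "local"
    return "hub"
```

To add a custom model under these assumptions, append its directory or repository name to the list.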
How to Use
- Upload Audio: Click "Upload or Record Audio" to select your audio file
- Select Model: Choose from available models in the dropdown
- Configure Options:
  - Set start/end time for audio trimming
  - Enable verbose output for debugging
- Transcribe: Click "Transcribe" to start the process
Git LFS Setup for Large Models
Since this project uses large model files, Git LFS is recommended:
# Initialize Git LFS
git lfs install
# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"
# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"
Deployment Notes
Hugging Face Spaces
- Use git push huggingface main to deploy
- Models are automatically cached during runtime
- First load may be slower due to model download
Model Repository Structure
your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/ # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...
Output
The application provides:
- Transcription Text: Full processed transcription with ITN
- Metrics: Processing time and file information
- Download: Text file with transcription results
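The three outputs could be assembled roughly as follows. This is a sketch only: `build_metrics` and `save_transcript` are hypothetical names, and the exact metrics format used by the app is not documented here.

```python
from pathlib import Path


def build_metrics(audio_path: str, elapsed_s: float) -> str:
    """Format file information and processing time as a short text report."""
    path = Path(audio_path)
    size_kb = path.stat().st_size / 1024 if path.exists() else 0.0
    return (
        f"File: {path.name}\n"
        f"Size: {size_kb:.1f} KB\n"
        f"Processing time: {elapsed_s:.2f} s"
    )


def save_transcript(text: str, out_path: str) -> str:
    """Write the transcription to a UTF-8 text file and return its path."""
    Path(out_path).write_text(text, encoding="utf-8")
    return out_path
```

Writing the file as UTF-8 matters here because transcriptions may contain Chinese, Japanese, or Korean text.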
Supported Languages
- 🇨🇳 Chinese (Mandarin)
- 🇺🇸 English
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇭🇰 Cantonese
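If language codes need to be validated before transcription, a lookup like the one below could be used. The codes come from the feature list above; the `SUPPORTED_LANGUAGES` mapping and `normalize_language` helper are illustrative assumptions, not part of app.py.

```python
# Language codes and names as listed in this README.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese (Mandarin)",
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
    "yue": "Cantonese",
}


def normalize_language(code: str) -> str:
    """Return the lowercase language code, or raise if it is not supported."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return key
```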
Feedback and Contributions
Feedback and contributions to improve this multilingual transcription tool are welcome.