---
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---
# Multilingual Audio Transcription with SenseVoice
This application transcribes audio with the SenseVoice Small model. It supports Chinese, English, Japanese, Korean, and Cantonese.
## Features
- **Multilingual Support**: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- **Multiple Audio Sources**:
  - Uploaded audio files
  - Direct URLs to audio files (no YouTube support due to cookie requirements)
- **Model Options**:
  - Local SenseVoice model
  - Hugging Face model: `FunAudioLLM/SenseVoiceSmall`
- **Advanced Features**:
  - Audio trimming with start/end time
  - Proxy support for downloads
  - Verbose logging output
  - Automatic inverse text normalization (ITN)
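The start/end trimming listed above can be sketched as follows. This is an illustration only, assuming the audio has already been decoded into a sequence of samples at a known sample rate; `trim_samples` and its parameters are hypothetical names, not taken from `app.py`.

```python
def trim_samples(samples, sample_rate, start_s=0.0, end_s=None):
    """Return the part of `samples` between start_s and end_s (in seconds).

    Hypothetical helper: works on any sliceable sequence (list, NumPy array).
    """
    total = len(samples)
    # Convert seconds to sample indices and clamp them to the valid range.
    start = max(0, int(round(start_s * sample_rate)))
    end = total if end_s is None else min(total, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("start time must be earlier than end time")
    return samples[start:end]


# Usage: keep only the second between 1.0 s and 2.0 s of a 3-second clip.
clip = list(range(16000 * 3))  # stand-in for 3 s of 16 kHz audio
middle_second = trim_samples(clip, 16000, start_s=1.0, end_s=2.0)
```

Clamping the end index to the clip length means an end time past the end of the audio simply keeps everything up to the last sample.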
## Model Setup
### For Hugging Face Spaces Deployment
The app is configured to work with:
1. **Local Model**: `"SenseVoiceSmall"` - Model files in the same directory
2. **HF Model**: `"FunAudioLLM/SenseVoiceSmall"` - Auto-downloaded from Hugging Face
### For Local Development
- Update `MODEL_PATH_LIST` in `app.py` to use your own models
- Entries can be local paths or Hugging Face repository names
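As a rough sketch of how an entry in `MODEL_PATH_LIST` could be told apart as a local directory or a Hugging Face repository name: the list contents below mirror the two options named in this README, while `classify_model_entry` is a hypothetical helper, and the real logic in `app.py` may differ.

```python
from pathlib import Path

# Assumed contents, based on the two options described in this README.
MODEL_PATH_LIST = ["SenseVoiceSmall", "FunAudioLLM/SenseVoiceSmall"]


def classify_model_entry(entry, base_dir="."):
    """Classify an entry as a local model directory or a hub repository name.

    Returns ("local", absolute_path) when `entry` is an existing directory
    under `base_dir`, otherwise ("hub", entry).
    """
    candidate = Path(base_dir) / entry
    if candidate.is_dir():
        return "local", str(candidate.resolve())
    return "hub", entry


for entry in MODEL_PATH_LIST:
    kind, location = classify_model_entry(entry)
    print(f"{entry}: {kind} -> {location}")
```

Checking the filesystem first lets a local copy take precedence when a directory with the same name as the entry exists.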
## How to Use
1. **Upload Audio**: Click "Upload or Record Audio" to select your audio file
2. **Select Model**: Choose from available models in the dropdown
3. **Configure Options**:
   - Set start/end time for audio trimming
   - Enable verbose output for debugging
4. **Transcribe**: Click "Transcribe" to start the process
## Git LFS Setup for Large Models
Since this project uses large model files, Git LFS is recommended:
```bash
# Initialize Git LFS
git lfs install
# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"
# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"
```
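To confirm that the tracking rules took effect, the `.gitattributes` file can be inspected. The sketch below assumes the standard line format that `git lfs track` writes (`<pattern> filter=lfs diff=lfs merge=lfs -text`). It uses `fnmatch` as an approximation of Git's pattern matching, which is adequate for simple `*.ext` patterns only. Both function names are hypothetical.

```python
from fnmatch import fnmatch


def lfs_tracked_patterns(gitattributes_text):
    """Extract the patterns marked with filter=lfs from .gitattributes text."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(filename, patterns):
    """Approximate check: does any tracked pattern match the file name?"""
    return any(fnmatch(filename, pattern) for pattern in patterns)


example = "*.bin filter=lfs diff=lfs merge=lfs -text\n"
print(is_lfs_tracked("model.bin", lfs_tracked_patterns(example)))
```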
## Deployment Notes
### Hugging Face Spaces
- Use `git push huggingface main` to deploy (this assumes a Git remote named `huggingface` that points at your Space)
- Models are cached automatically at runtime
- The first load may be slower because the model has to be downloaded
### Model Repository Structure
```
your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/          # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...
```
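Before pushing, the layout above can be checked with a short script. This is a sketch that only looks for the files named in the tree; `REQUIRED_FILES` and `missing_files` are hypothetical names, and a real model directory may need additional files.

```python
from pathlib import Path

# Files named explicitly in the tree above; other model files are not checked.
REQUIRED_FILES = (
    "app.py",
    "README.md",
    "requirements.txt",
    "SenseVoiceSmall/config.json",
    "SenseVoiceSmall/model.bin",
)


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Repository layout looks complete.")
```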
## Output
The application provides:
- **Transcription Text**: Full processed transcription with ITN
- **Metrics**: Processing time and file information
- **Download**: Text file with transcription results
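The three outputs can be produced together, roughly as sketched here. The function name, the metric keys, and the choice to name the text file after the audio file are all assumptions made for illustration, not the behavior of `app.py`.

```python
import time
from pathlib import Path


def save_transcription(text, audio_path, started_at, out_dir="."):
    """Write the transcription to a .txt file and return (path, metrics).

    `started_at` is a time.perf_counter() value taken before transcription.
    """
    elapsed = time.perf_counter() - started_at
    # Name the download after the audio file, e.g. clip.wav -> clip.txt.
    out_path = Path(out_dir) / (Path(audio_path).stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    metrics = {
        "source_file": Path(audio_path).name,
        "processing_seconds": round(elapsed, 3),
        "characters": len(text),
    }
    return out_path, metrics
```

Writing with an explicit UTF-8 encoding matters here because the supported languages include Chinese, Japanese, Korean, and Cantonese text.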
## Supported Languages
- 🇨🇳 Chinese (Mandarin)
- 🇺🇸 English
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇭🇰 Cantonese
## Feedback and Contributions
Feedback and contributions to improve this multilingual transcription tool are welcome.