| title: SenseVoice Audio Transcription | |
| emoji: 🎙️ | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: 6.0.2 | |
| app_file: app.py | |
| pinned: false | |
| # Multilingual Audio Transcription with SenseVoice | |
| This application transcribes audio using the SenseVoice Small model, with multilingual support for Chinese, English, Japanese, Korean, and Cantonese. | |
| ## Features | |
| - **Multilingual Support**: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue) | |
| - **Multiple Audio Sources**: | |
|   - Uploaded audio files | |
|   - Direct URLs to audio files (YouTube links are not supported because downloads require cookies) | |
| - **Model Options**: | |
|   - Local SenseVoice model | |
|   - Hugging Face model: `FunAudioLLM/SenseVoiceSmall` | |
| - **Advanced Features**: | |
|   - Audio trimming with start/end time | |
|   - Proxy support for downloads | |
|   - Verbose logging output | |
|   - Automatic inverse text normalization (ITN) | |
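The URL source and proxy options listed above could be handled roughly as follows. This is a minimal sketch using only the standard library, not the code in `app.py`: the names `is_direct_audio_url` and `build_opener`, the extension list, and the set of rejected hosts are all hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse
import urllib.request

# Hypothetical values for illustration; app.py may use different ones.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".m4a", ".ogg")
UNSUPPORTED_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_direct_audio_url(url: str) -> bool:
    """Return True if the URL looks like a direct link to an audio file."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host in UNSUPPORTED_HOSTS:
        return False
    # Compare the path only, so query strings do not hide the extension.
    return parsed.path.lower().endswith(AUDIO_EXTENSIONS)


def build_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes HTTP(S) traffic through `proxy` if given."""
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()
```

A caller would first check `is_direct_audio_url(url)` and then fetch the file with `build_opener(proxy).open(url)`.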
| ## Model Setup | |
| ### For Hugging Face Spaces Deployment | |
| The app is configured to work with: | |
| 1. **Local Model**: `"SenseVoiceSmall"` - Model files stored in a `SenseVoiceSmall` directory next to `app.py` | |
| 2. **HF Model**: `"FunAudioLLM/SenseVoiceSmall"` - Auto-downloaded from Hugging Face | |
| ### For Local Development | |
| - Update `MODEL_PATH_LIST` in `app.py` to use your custom models | |
| - `MODEL_PATH_LIST` accepts both local paths and Hugging Face repository names | |
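As an illustration of mixing local paths and repository names in one list, the sketch below shows one way to tell the two apart. The list contents mirror the two models named above, but the helper `classify_model_entry` is hypothetical and the actual logic in `app.py` may differ.

```python
import os

# Hypothetical contents; the real list lives in app.py.
MODEL_PATH_LIST = [
    "SenseVoiceSmall",              # local directory next to app.py
    "FunAudioLLM/SenseVoiceSmall",  # Hugging Face repository name
]


def classify_model_entry(entry: str, base_dir: str = ".") -> str:
    """Return "local" if the entry is an existing directory, otherwise "hub"."""
    candidate = entry if os.path.isabs(entry) else os.path.join(base_dir, entry)
    return "local" if os.path.isdir(candidate) else "hub"
```

Checking the file system first means a local copy takes precedence whenever a directory with the same name exists.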
| ## How to Use | |
| 1. **Upload Audio**: Click "Upload or Record Audio" to select your audio file | |
| 2. **Select Model**: Choose from available models in the dropdown | |
| 3. **Configure Options**: | |
|    - Set start/end time for audio trimming | |
|    - Enable verbose output for debugging | |
| 4. **Transcribe**: Click "Transcribe" to start the process | |
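The start/end trimming option in step 3 amounts to slicing the audio by sample index. The sketch below assumes mono audio held as a sequence of samples; `trim_samples` is a hypothetical helper, not a function taken from `app.py`.

```python
from typing import Optional, Sequence


def trim_samples(samples: Sequence[float], sample_rate: int,
                 start_s: float = 0.0,
                 end_s: Optional[float] = None) -> Sequence[float]:
    """Return the samples between start_s and end_s (both in seconds)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    total = len(samples)
    # Convert seconds to sample indices and clamp them to the clip length.
    start = max(0, int(round(start_s * sample_rate)))
    end = total if end_s is None else min(total, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("start time must be before end time")
    return samples[start:end]
```

Because it relies only on `len` and slicing, the same function also works on a one-dimensional NumPy array.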
| ## Git LFS Setup for Large Models | |
| Since this project uses large model files, Git LFS is recommended: | |
| ```bash | |
| # Initialize Git LFS | |
| git lfs install | |
| # Track large model files | |
| git lfs track "*.bin" | |
| git lfs track "*.safetensors" | |
| # Add and commit | |
| git add .gitattributes | |
| git add . | |
| git commit -m "Add SenseVoice model with LFS tracking" | |
| ``` | |
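Before committing, it can help to check which files are large enough to need an LFS tracking pattern. The sketch below walks a directory and lists files above a size threshold. The 10 MB default is an assumption, so check the actual limit of your Git host, and `find_large_files` is a hypothetical helper.

```python
import os
from typing import List, Tuple

# Assumed threshold; adjust it to the limit of your Git host.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def find_large_files(root: str,
                     threshold: int = DEFAULT_THRESHOLD_BYTES) -> List[Tuple[str, int]]:
    """List (relative path, size in bytes) for files under root above threshold."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > threshold:
                large.append((os.path.relpath(path, root), size))
    return sorted(large)
```

Any file it reports whose extension is not covered by the `git lfs track` patterns above would need an additional pattern.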
| ## Deployment Notes | |
| ### Hugging Face Spaces | |
| - Use `git push huggingface main` to deploy (this assumes a Git remote named `huggingface` that points to your Space) | |
| - Models are cached automatically at runtime | |
| - The first load may be slower because the model has to be downloaded | |
| ### Model Repository Structure | |
| ``` | |
| your-repo/ | |
| ├── app.py | |
| ├── README.md | |
| ├── requirements.txt | |
| └── SenseVoiceSmall/ # Model directory | |
|     ├── config.json | |
|     ├── model.bin | |
|     └── other model files... | |
| ``` | |
| ## Output | |
| The application provides: | |
| - **Transcription Text**: Full processed transcription with ITN | |
| - **Metrics**: Processing time and file information | |
| - **Download**: Text file with transcription results | |
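The three outputs listed above could be produced by a thin wrapper around the transcription call. The sketch below is an assumption about how such a wrapper might look, not the code in `app.py`: `transcribe_with_metrics`, the metrics wording, and the output file name are hypothetical, and `transcribe` stands in for whatever function runs the model.

```python
import os
import time
from typing import Callable, Tuple


def transcribe_with_metrics(transcribe: Callable[[str], str], audio_path: str,
                            out_dir: str) -> Tuple[str, str, str]:
    """Run `transcribe`, time it, and save the text to a downloadable file."""
    started = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - started

    # Short summary of the input file and the processing time.
    size_kb = os.path.getsize(audio_path) / 1024
    metrics = (f"File: {os.path.basename(audio_path)} ({size_kb:.1f} KB)\n"
               f"Processing time: {elapsed:.2f} s")

    # Write the transcription to a UTF-8 text file in out_dir.
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    out_path = os.path.join(out_dir, f"{stem}_transcription.txt")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text, metrics, out_path
```

The returned tuple maps onto a text box, a metrics box, and a file download component in the interface.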
| ## Supported Languages | |
| - 🇨🇳 Chinese (Mandarin) | |
| - 🇺🇸 English | |
| - 🇯🇵 Japanese | |
| - 🇰🇷 Korean | |
| - 🇭🇰 Cantonese | |
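In code, the language list is easiest to handle as a lookup from code to display name. The sketch below uses the codes from the Features section. The `"auto"` fallback and the helper `normalize_language` are assumptions made for illustration.

```python
# Codes come from the Features list; "auto" is an assumed fallback for this sketch.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese (Mandarin)",
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
    "yue": "Cantonese",
}


def normalize_language(code: str, default: str = "auto") -> str:
    """Return a supported lowercase code, or `default` if the code is unknown."""
    cleaned = (code or "").strip().lower()
    return cleaned if cleaned in SUPPORTED_LANGUAGES else default
```

Normalizing user input this way keeps unsupported codes from reaching the model call.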
| ## Feedback and Contributions | |
| Feedback and contributions to improve this multilingual transcription tool are welcome. | |