
---
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

Multilingual Audio Transcription with SenseVoice

This application transcribes audio with the SenseVoice Small model, which supports multilingual transcription of Chinese, English, Japanese, Korean, and Cantonese.

Features

Multilingual Support: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
Multiple Audio Sources:
- Uploaded audio files
- Direct URLs to audio files (YouTube links are not supported because they require cookies)
Model Options:
- Local SenseVoice model
- Hugging Face model: FunAudioLLM/SenseVoiceSmall
Advanced Features:
- Audio trimming with start/end time
- Proxy support for downloads
- Verbose logging output
- Automatic inverse text normalization (ITN)
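The URL source and proxy options above can be sketched with the standard library alone. This is a minimal sketch, not the code in app.py: the helper names (`is_supported_url`, `build_opener`, `download_audio`) and the exact list of rejected YouTube hosts are assumptions made for illustration.

```python
import shutil
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host list: the README only says YouTube is unsupported.
UNSUPPORTED_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_supported_url(url: str) -> bool:
    """Accept plain http(s) links and reject YouTube links."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return (parsed.hostname or "").lower() not in UNSUPPORTED_HOSTS


def build_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Return an opener that sends http/https traffic through `proxy` if one is given."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def download_audio(url: str, dest: str, proxy: Optional[str] = None) -> str:
    """Stream `url` into the local file `dest` (optionally via a proxy) and return `dest`."""
    if not is_supported_url(url):
        raise ValueError(f"Unsupported audio URL: {url}")
    opener = build_opener(proxy)
    with opener.open(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Rejecting unsupported links before opening any connection gives the user an immediate, clear error instead of a failed download.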

Model Setup

For Hugging Face Spaces Deployment

The app is configured to work with:

Local Model: "SenseVoiceSmall" - Model files stored in a SenseVoiceSmall/ directory alongside app.py
HF Model: "FunAudioLLM/SenseVoiceSmall" - Auto-downloaded from Hugging Face

For Local Development

Update MODEL_PATH_LIST in app.py to point to your own models
Entries can be local paths or Hugging Face repository names
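Because MODEL_PATH_LIST mixes local paths and Hugging Face repository names, the app has to tell the two apart. The sketch below shows one plausible rule: an existing directory counts as local, and an "owner/name" string counts as a Hub repository. The list contents and the `resolve_model_source` helper are hypothetical; the real logic in app.py may differ.

```python
import os
from typing import List, Tuple

# Hypothetical example; the actual MODEL_PATH_LIST is defined in app.py.
MODEL_PATH_LIST: List[str] = [
    "SenseVoiceSmall",              # local directory next to app.py
    "FunAudioLLM/SenseVoiceSmall",  # Hugging Face repository name
]


def resolve_model_source(entry: str, base_dir: str = ".") -> Tuple[str, str]:
    """Classify an entry as ("local", absolute_path) or ("hub", repo_id).

    An entry is local if it names an existing directory under base_dir;
    otherwise it must look like an "owner/name" Hugging Face repository ID.
    """
    candidate = os.path.join(base_dir, entry)
    if os.path.isdir(candidate):
        return "local", os.path.abspath(candidate)
    parts = entry.split("/")
    if len(parts) == 2 and all(parts):
        return "hub", entry
    raise ValueError(f"{entry!r} is neither a local directory nor an owner/name repo ID")
```

Checking the local directory first means a copy of the model checked into the Space takes precedence over downloading it.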

How to Use

Upload Audio: Click "Upload or Record Audio" to select or record an audio file
Select Model: Choose from available models in the dropdown
Configure Options:
- Set start/end time for audio trimming
- Enable verbose output for debugging
Transcribe: Click "Transcribe" to start the process
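Trimming with a start and end time comes down to converting seconds into sample indices and slicing the waveform. The sketch below assumes the audio is already decoded into a sequence of samples at a known rate (16 kHz in the usage example). The `trim_samples` helper is hypothetical and is not taken from app.py.

```python
from typing import List, Optional, Sequence


def trim_samples(samples: Sequence[float], sample_rate: int,
                 start_s: float = 0.0, end_s: Optional[float] = None) -> List[float]:
    """Return the samples between start_s and end_s, given in seconds.

    end_s=None, or a value past the end of the clip, keeps audio to the end.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if start_s < 0 or (end_s is not None and end_s < start_s):
        raise ValueError("need 0 <= start_s <= end_s")
    start = int(round(start_s * sample_rate))  # seconds -> sample index
    if end_s is None:
        stop = len(samples)
    else:
        stop = min(len(samples), int(round(end_s * sample_rate)))
    return list(samples[start:stop])
```

For example, with a 3-second clip at 16 kHz, `trim_samples(samples, 16000, 1.0, 2.0)` keeps the 16,000 samples of the middle second.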

Git LFS Setup for Large Models

Since this project uses large model files, Git LFS is recommended:

```bash
# Initialize Git LFS
git lfs install

# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"

# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"
```
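Before committing, it can help to confirm that no large file would escape the LFS patterns. The helper below is a hypothetical pre-commit check, not part of this project. It reuses the two patterns tracked above and matches them against file names, as `.gitattributes` does for patterns without a slash. The 10 MB threshold is an assumption based on the Hugging Face Hub's limit for files pushed without LFS, so verify it against the current Hub documentation.

```python
import fnmatch
import os
from typing import Iterable, List

# Patterns mirroring the `git lfs track` commands above.
LFS_PATTERNS = ["*.bin", "*.safetensors"]
# Assumed limit for plain (non-LFS) files on the Hugging Face Hub.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def untracked_large_files(root: str, patterns: Iterable[str] = LFS_PATTERNS,
                          limit: int = SIZE_LIMIT_BYTES) -> List[str]:
    """List files under root larger than `limit` bytes that match no LFS pattern."""
    patterns = list(patterns)
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            too_big = os.path.getsize(path) > limit
            if too_big and not any(fnmatch.fnmatch(name, p) for p in patterns):
                offenders.append(os.path.relpath(path, root))
    return sorted(offenders)
```

Running `untracked_large_files(".")` from the repository root should return an empty list. Any path it reports needs another `git lfs track` pattern.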

Deployment Notes

Hugging Face Spaces

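Deploy with git push huggingface main (this assumes a Git remote named huggingface that points to the Space repository)
Downloaded models are cached automatically at runtime
The first load may be slower because the model has to be downloaded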
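Caching the download on disk (which the Hugging Face download tooling does on its own) is separate from keeping the loaded model in memory between requests. One common way to avoid reloading the model on every transcription is to memoize the loader, as sketched below. `_load_model_uncached` is a hypothetical stand-in for the app's real loader, and the call counter exists only to show that the cache works.

```python
from functools import lru_cache

LOAD_CALLS = {"count": 0}  # counts real loads, only to demonstrate caching


def _load_model_uncached(source: str) -> dict:
    """Hypothetical stand-in for the app's real (slow) model loader."""
    LOAD_CALLS["count"] += 1
    return {"source": source}


@lru_cache(maxsize=4)
def get_model(source: str) -> dict:
    """Load each model source at most once per process, then reuse it."""
    return _load_model_uncached(source)
```

With this pattern only the first request for a given model pays the loading cost. Later calls with the same `source` return the same object.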

Model Repository Structure

```text
your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/  # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...
```

Output

The application provides:

Transcription Text: Full processed transcription with ITN
Metrics: Processing time and file information
Download: Text file with transcription results
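The three outputs fit together naturally: time the transcription, report basic file information, and write the text to a file the user can download. This is a minimal sketch. `transcribe` is a hypothetical placeholder for the app's actual SenseVoice call, and the metrics string format is invented. A Gradio `File` output can serve the returned path for download.

```python
import os
import tempfile
import time
from typing import Callable, Tuple


def transcribe_and_save(audio_path: str,
                        transcribe: Callable[[str], str]) -> Tuple[str, str, str]:
    """Run `transcribe`, time it, and save the text to a .txt file.

    Returns (text, metrics, txt_path).
    """
    start = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    size_kb = os.path.getsize(audio_path) / 1024
    metrics = (f"File: {os.path.basename(audio_path)} ({size_kb:.1f} KB) | "
               f"Processing time: {elapsed:.2f} s")
    # Write UTF-8 so Chinese, Japanese, Korean, and Cantonese text is preserved.
    fd, txt_path = tempfile.mkstemp(suffix=".txt", prefix="transcription_")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    return text, metrics, txt_path
```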

Supported Languages

🇨🇳 Chinese (Mandarin)
🇺🇸 English
🇯🇵 Japanese
🇰🇷 Korean
🇭🇰 Cantonese

Feedback and Contributions

Feedback and contributions to improve this multilingual transcription tool are welcome.