metadata
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false

Multilingual Audio Transcription with SenseVoice

This application transcribes audio with the SenseVoice Small model. It supports five languages: Chinese, English, Japanese, Korean, and Cantonese.

Features

  • Multilingual Support: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
  • Multiple Audio Sources:
    • Uploaded audio files
    • Direct URLs to audio files (YouTube links are not supported because they require cookies)
  • Model Options:
    • Local SenseVoice model
    • Hugging Face model: FunAudioLLM/SenseVoiceSmall
  • Advanced Features:
    • Audio trimming with start/end time
    • Proxy support for downloads
    • Verbose logging output
    • Automatic inverse text normalization (ITN)
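
As an illustration of the URL source combined with proxy support, here is a minimal sketch that uses only the Python standard library. The function names `make_opener` and `download_audio`, the timeout value, and the proxy address are assumptions made for this example; app.py may use a different HTTP client.

```python
import shutil
import urllib.request
from typing import Optional


def make_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a URL opener that routes HTTP(S) traffic through `proxy` if given."""
    if proxy:
        # An explicit ProxyHandler replaces the default, environment-based one.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()


def download_audio(url: str, dest: str, proxy: Optional[str] = None,
                   timeout: float = 30.0) -> str:
    """Stream the file at `url` to `dest` and return the destination path."""
    opener = make_opener(proxy)
    with opener.open(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A call such as `download_audio("https://example.com/sample.wav", "sample.wav", proxy="http://127.0.0.1:8080")` would fetch the file through the given proxy; both the URL and the proxy address are placeholders.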

Model Setup

For Hugging Face Spaces Deployment

The app is configured to work with:

  1. Local Model: "SenseVoiceSmall" - model files stored in a SenseVoiceSmall directory next to app.py
  2. HF Model: "FunAudioLLM/SenseVoiceSmall" - downloaded automatically from Hugging Face

For Local Development

  • Update MODEL_PATH_LIST in app.py to use your custom models
  • Supports local paths and Hugging Face repository names
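
A minimal sketch of what such a list might look like follows. The entries mirror the two models named above, but the exact contents of MODEL_PATH_LIST in app.py are not shown here, and the helper `classify_model_path` is a hypothetical addition for illustration only.

```python
from pathlib import Path

# Assumed contents; check app.py for the real list.
MODEL_PATH_LIST = [
    "SenseVoiceSmall",              # local directory next to app.py
    "FunAudioLLM/SenseVoiceSmall",  # Hugging Face repository name
]


def classify_model_path(entry: str, base_dir: str = ".") -> str:
    """Return "local" if `entry` is an existing directory under `base_dir`, else "hub"."""
    return "local" if (Path(base_dir) / entry).is_dir() else "hub"
```

Treating any entry that is not an existing directory as a Hugging Face repository name keeps the list format simple: adding a custom model is a one-line change either way.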

How to Use

  1. Upload Audio: Click "Upload or Record Audio" to select your audio file
  2. Select Model: Choose from available models in the dropdown
  3. Configure Options:
    • Set start/end time for audio trimming
    • Enable verbose output for debugging
  4. Transcribe: Click "Transcribe" to start the process
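
The start/end trimming in step 3 amounts to converting times in seconds into sample indices. The sketch below shows that arithmetic on a plain sequence of samples; the function name `trim_samples` and the convention that `end_s=None` means "until the end" are assumptions, not necessarily what app.py does.

```python
from typing import Optional, Sequence


def trim_samples(samples: Sequence, sample_rate: int,
                 start_s: float = 0.0, end_s: Optional[float] = None) -> Sequence:
    """Return the samples between `start_s` and `end_s` (both in seconds)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    total = len(samples)
    # Convert seconds to sample indices and clamp them to the audio length.
    start = max(0, int(round(start_s * sample_rate)))
    end = total if end_s is None else min(total, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("start time must be before end time and inside the audio")
    return samples[start:end]
```

For example, with a sample rate of 16000 Hz, `trim_samples(samples, 16000, 1.5, 4.0)` keeps the samples at indices 24000 up to, but not including, 64000.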

Git LFS Setup for Large Models

Since this project uses large model files, Git LFS is recommended:

# Initialize Git LFS
git lfs install

# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"

# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"

Deployment Notes

Hugging Face Spaces

  • Deploy with git push huggingface main (this assumes a Git remote named huggingface that points at your Space)
  • Models are cached automatically at runtime
  • The first load may be slower because the model has to be downloaded

Model Repository Structure

your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/  # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...

Output

The application provides:

  • Transcription Text: Full processed transcription with ITN
  • Metrics: Processing time and file information
  • Download: Text file with transcription results
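
The three outputs above could be produced along these lines. This is a sketch under stated assumptions: `transcribe_with_metrics` and the metric key names are hypothetical, and the `transcribe` argument stands in for whatever function in app.py actually runs the SenseVoice model.

```python
import os
import tempfile
import time
from typing import Callable, Dict, Tuple


def transcribe_with_metrics(audio_path: str,
                            transcribe: Callable[[str], str]) -> Tuple[str, Dict, str]:
    """Run `transcribe`, time it, and save the text to a downloadable file."""
    started = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - started

    metrics = {
        "file_name": os.path.basename(audio_path),
        # Size is None when the path does not exist on disk.
        "file_size_bytes": os.path.getsize(audio_path) if os.path.exists(audio_path) else None,
        "processing_seconds": round(elapsed, 3),
    }

    # Write the transcription to a temporary .txt file for the download output.
    fd, out_path = tempfile.mkstemp(prefix="transcription_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text, metrics, out_path
```

Returning the text, the metrics, and a file path as three separate values maps naturally onto three separate output components in a Gradio interface.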

Supported Languages

  • 🇨🇳 Chinese (Mandarin)
  • 🇺🇸 English
  • 🇯🇵 Japanese
  • 🇰🇷 Korean
  • 🇭🇰 Cantonese

Feedback and Contributions

Feedback and contributions to improve this multilingual transcription tool are welcome.