
	---
	title: SenseVoice Audio Transcription
	emoji: 🎙️
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: 6.0.2
	app_file: app.py
	pinned: false
	---

	# Multilingual Audio Transcription with SenseVoice

	This application transcribes audio with the SenseVoice Small model. It supports Chinese, English, Japanese, Korean, and Cantonese.

	## Features

	- Multilingual Support: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
	- Multiple Audio Sources:
	  - Uploaded audio files
	  - Direct URLs to audio files (no YouTube support due to cookie requirements)
	- Model Options:
	  - Local SenseVoice model
	  - Hugging Face model: `FunAudioLLM/SenseVoiceSmall`
	- Advanced Features:
	  - Audio trimming with start/end time
	  - Proxy support for downloads
	  - Verbose logging output
	  - Automatic inverse text normalization (ITN)
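The start/end trimming listed above amounts to converting times in seconds to sample indices and slicing. The following is a minimal sketch of that idea, not the code in `app.py`; the function name `trim_samples` and its parameters are hypothetical, and the audio is assumed to be already decoded into a sequence of samples.

```python
def trim_samples(samples, sample_rate, start_s=0.0, end_s=None):
    """Return the part of `samples` between start_s and end_s (in seconds).

    Hypothetical helper: `samples` is any sliceable sequence of decoded audio
    samples, `sample_rate` is in samples per second.
    """
    if start_s < 0:
        raise ValueError("start_s must be non-negative")
    total = len(samples)
    # Convert seconds to sample indices and clip to the audio length.
    start = min(int(round(start_s * sample_rate)), total)
    end = total if end_s is None else min(int(round(end_s * sample_rate)), total)
    if end < start:
        raise ValueError("end_s must not be earlier than start_s")
    return samples[start:end]
```

Leaving `end_s` as `None` keeps everything from `start_s` to the end of the audio, and an end time past the end of the file is clipped rather than treated as an error.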

	## Model Setup

	### For Hugging Face Spaces Deployment
	The app is configured to work with:
	1. Local Model: `"SenseVoiceSmall"` - Model files in the same directory
	2. HF Model: `"FunAudioLLM/SenseVoiceSmall"` - Auto-downloaded from Hugging Face

	### For Local Development
	- Update `MODEL_PATH_LIST` in app.py to use your custom models
	- Supports local paths and Hugging Face repository names
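As an illustration of how a list such as `MODEL_PATH_LIST` can mix local directories and Hugging Face repository names, the sketch below labels each entry by checking whether it exists as a directory. The list contents and the helper `classify_model_entry` are assumptions made for the example; the actual logic in `app.py` may differ.

```python
from pathlib import Path

# Hypothetical contents; the README only states that MODEL_PATH_LIST is in app.py.
MODEL_PATH_LIST = ["SenseVoiceSmall", "FunAudioLLM/SenseVoiceSmall"]


def classify_model_entry(entry, base_dir="."):
    """Label an entry 'local' if it is an existing directory, else 'hub'.

    Returns a (kind, location) tuple. An entry that is not a directory under
    base_dir is assumed to be a Hugging Face repository name.
    """
    candidate = Path(base_dir) / entry
    if candidate.is_dir():
        return ("local", str(candidate))
    return ("hub", entry)
```

With this approach, a missing local model directory silently falls through to the Hub name, which may or may not be what you want; an explicit error is a reasonable alternative.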

	## How to Use

	1. Upload Audio: Click "Upload or Record Audio" to select your audio file
	2. Select Model: Choose from available models in the dropdown
	3. Configure Options:
	   - Set start/end time for audio trimming
	   - Enable verbose output for debugging
	4. Transcribe: Click "Transcribe" to start the process

	## Git LFS Setup for Large Models

	Since this project uses large model files, Git LFS is recommended:

	```bash
	# Initialize Git LFS
	git lfs install

	# Track large model files
	git lfs track "*.bin"
	git lfs track "*.safetensors"

	# Add and commit
	git add .gitattributes
	git add .
	git commit -m "Add SenseVoice model with LFS tracking"
	```
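Before committing, it can help to check which files the tracked patterns actually cover, since a model file with a different extension would be committed as a regular Git object. The sketch below reads `.gitattributes` text and lists the files that no LFS pattern matches. It is a simplification: it compares file basenames with `fnmatch` rather than implementing Git's full pattern rules, and the helper names are hypothetical.

```python
from fnmatch import fnmatch


def lfs_patterns(gitattributes_text):
    """Collect the path patterns that carry a filter=lfs attribute."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # Skip blank lines and comments; keep only lines that enable the LFS filter.
        if len(parts) >= 2 and not parts[0].startswith("#") and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def not_covered_by_lfs(filenames, patterns):
    """Return the filenames whose basename matches none of the patterns."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name.rsplit("/", 1)[-1], pattern) for pattern in patterns)
    ]
```

Small files such as `config.json` are expected to show up as not covered; the check is only a concern for large weights files.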

	## Deployment Notes

	### Hugging Face Spaces
	- Use `git push huggingface main` to deploy (this assumes a Git remote named `huggingface` that points at the Space)
	- Models are cached automatically at runtime
	- First load may be slower due to model download

	### Model Repository Structure
	```
	your-repo/
	├── app.py
	├── README.md
	├── requirements.txt
	└── SenseVoiceSmall/        # Model directory
	    ├── config.json
	    ├── model.bin
	    └── other model files...
	```

	## Output

	The application provides:
	- Transcription Text: Full processed transcription with ITN
	- Metrics: Processing time and file information
	- Download: Text file with transcription results
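The three outputs above can be produced by wrapping the transcription call with a timer and writing the result to a text file. This is a rough sketch under stated assumptions, not the app's implementation: `run_and_save` is a hypothetical helper, `transcribe` stands for any callable that takes an audio path and returns a string, and the metric names are made up for the example.

```python
import time
from pathlib import Path


def run_and_save(transcribe, audio_path, out_dir):
    """Time a transcription callable and write its text to a .txt file.

    Returns (text, metrics, out_path). The output file is named after the
    audio file, with its extension replaced by .txt.
    """
    started = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - started

    out_path = Path(out_dir) / (Path(audio_path).stem + ".txt")
    out_path.write_text(text, encoding="utf-8")

    metrics = {
        "file": Path(audio_path).name,
        "seconds": round(elapsed, 3),
        "characters": len(text),
    }
    return text, metrics, out_path
```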

	## Supported Languages

	- 🇨🇳 Chinese (Mandarin)
	- 🇺🇸 English
	- 🇯🇵 Japanese
	- 🇰🇷 Korean
	- 🇭🇰 Cantonese

	## Feedback and Contributions

	Feedback and contributions to improve this multilingual transcription tool are welcome.