Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings • Paper 2503.10446 • Published Mar 13
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction • Paper 2506.05899 • Published Jun 6