Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing

Recent Activity

liked a dataset 1 day ago

SparkAudio/voxbox

upvoted a paper about 1 month ago

VIDEOP2R: Video Understanding from Perception to Reasoning

upvoted a paper 2 months ago

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

Organizations

None yet

upvoted a paper about 1 month ago

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112

upvoted a paper 2 months ago

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

Paper • 2510.24693 • Published Oct 28, 2025 • 18

upvoted 2 papers 3 months ago

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Paper • 2510.00515 • Published Oct 1, 2025 • 39

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Paper • 2509.22220 • Published Sep 26, 2025 • 65

upvoted 2 papers 4 months ago

Beyond Transcription: Mechanistic Interpretability in ASR

Paper • 2508.15882 • Published Aug 21, 2025 • 86

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 138

upvoted a paper 5 months ago

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Paper • 2508.11598 • Published Aug 15, 2025 • 17

upvoted a paper 6 months ago

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Paper • 2506.17450 • Published Jun 20, 2025 • 64

upvoted a paper 8 months ago

Kimi-Audio Technical Report

Paper • 2504.18425 • Published Apr 25, 2025 • 20

upvoted 4 papers 10 months ago

upvoted 3 papers 11 months ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18, 2025 • 86

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Paper • 2502.05176 • Published Feb 7, 2025 • 39

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6, 2025 • 27

upvoted an article 11 months ago

Article

Recipe: Preparing Multilingual Speech Datasets for TTS Training

Nov 4, 2024

upvoted a paper 12 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10, 2025 • 52

upvoted 2 papers about 1 year ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 99

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 46
