Candle & PyTorch model checkpoints released as part of the MoshiRAG release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi-rag
Kyutai
non-profit
Verified
Papers
Understanding Data Temporality Impact on Large Language Models Pre-training
One View Is Enough! Monocular Training for In-the-Wild Novel View Generation
Temporal pretraining checkpoints and KairosQA evaluation dataset
Pretrained ARC-Encoders and a fine-tuning dataset: context compression for unmodified LLMs.
https://kyutai.org/next/stt
MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs
MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi
Streaming speech translation without the need for word-level alignments
- Hibiki Zero Samples: demo samples of the speech translation model Hibiki-Zero.
- Simultaneous Speech-to-Speech Translation Without Aligned Data (Paper 2602.11072)
- kyutai/Audio-NTREX-4L
- kyutai/hibiki-zero-3b-pytorch-bf16 (Audio-to-Audio)
CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion on long-context streaming inputs
- CASA Gallery: video gallery for CASA: Cross-Attention over Self-Attention
- CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion (Paper 2512.19535)
- kyutai/CASA-Helium1-VL-2B (Image-Text-to-Text, 3B)
- kyutai/CASA-Qwen2_5-VL-3B (Image-Text-to-Text, 4B)
https://kyutai.org/next/tts
Helium 1: a modular and multilingual LLM
Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki.