Kyutai

non-profit

Verified

https://kyutai.org/

kyutai_labs

kyutai-labs

Recent Activity

HippolyteP submitted a paper about 12 hours ago

Understanding Data Temporality Impact on Large Language Models Pre-training

HippolyteP updated a collection 1 day ago

Kairos

rfbr published a dataset 1 day ago

kyutai/KairosQA

Papers

Understanding Data Temporality Impact on Large Language Models Pre-training

One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

kyutai's collections 11

MoshiRAG Release

Candle & PyTorch model checkpoints released as part of the MoshiRAG release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi-rag

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

Paper • 2604.12928 • Published Apr 14
kyutai/moshika-rag-pytorch-bf16

Audio-to-Audio • 8B • Updated Apr 17 • 742 • 6
kyutai/moshika-rag-candle-bf16

Audio-to-Audio • 8B • Updated Apr 17 • 454 • 8

Kairos

Temporal pretraining checkpoints and KairosQA evaluation dataset

kyutai/KairosQA

Viewer • Updated 1 day ago • 7.17k • 25
Understanding Data Temporality Impact on Large Language Models Pre-training

Paper • 2605.22769 • Published 7 days ago • 2
kyutai/Sequential_Helium_6B

Text Generation • 6B • Updated 1 day ago • 282

ARC-Encoders

Pretrained ARC-Encoders and a fine-tuning dataset: context compression for unmodified LLMs.

ARC-Encoder: learning compressed text representations for large language models

Paper • 2510.20535 • Published Oct 23, 2025 • 8
kyutai/ARC8_Encoder_Llama

Feature Extraction • Updated Nov 5, 2025 • 11 • 2
kyutai/ARC_finetuning

Preview • Updated Oct 24, 2025 • 31
kyutai/ARC8_Encoder_multi

Feature Extraction • Updated Nov 5, 2025 • 12 • 6

Speech-To-Text

https://kyutai.org/next/stt

kyutai/stt-2.6b-en

Automatic Speech Recognition • 3B • Updated Jun 26, 2025 • 122
kyutai/stt-1b-en_fr

Automatic Speech Recognition • 1.0B • Updated Nov 18, 2025 • 126
kyutai/stt-1b-en_fr-mlx

Automatic Speech Recognition • Updated Jun 19, 2025 • 5
kyutai/stt-2.6b-en-mlx

Automatic Speech Recognition • Updated Jun 19, 2025 • 8

MoshiVis v0.1

MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs

Vision-Speech Models: Teaching Speech Models to Converse about Images

Paper • 2503.15633 • Published Mar 19, 2025 • 2
kyutai/Babillage

Viewer • Updated Mar 21, 2025 • 465k • 636 • 13
kyutai/moshika-vis-pytorch-bf16

Updated Jun 18, 2025 • 58
kyutai/moshika-vis-candle-bf16

Updated Mar 18, 2025 • 1

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi

Moshi: a speech-text foundation model for real-time dialogue

Paper • 2410.00037 • Published Sep 17, 2024 • 17
kyutai/moshiko-pytorch-bf16

Updated Sep 18, 2024 • 181k • 242
kyutai/moshika-pytorch-bf16

Updated Sep 18, 2024 • 4.58k • 60
kyutai/mimi

Feature Extraction • 96.2M • Updated Jul 2, 2025 • 2.4M • 302

Hibiki-Zero

Streaming speech translation without the need for word-level alignments

Hibiki Zero Samples

Running • 12

Demo samples of the speech translation model Hibiki-Zero.
Simultaneous Speech-to-Speech Translation Without Aligned Data

Paper • 2602.11072 • Published Feb 11 • 1
kyutai/Audio-NTREX-4L

Viewer • Updated Feb 12 • 3.6k • 734 • 3
kyutai/hibiki-zero-3b-pytorch-bf16

Audio-to-Audio • Updated Feb 12 • 2.05k • 53

CASA

CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion on long-context streaming inputs

CASA Gallery

Running • Agents • 3

Video Gallery for CASA: Cross-Attention over Self-Attention
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published Dec 22, 2025 • 12
kyutai/CASA-Helium1-VL-2B

Image-Text-to-Text • 3B • Updated Mar 9 • 28 • 8
kyutai/CASA-Qwen2_5-VL-3B

Image-Text-to-Text • 4B • Updated Dec 23, 2025 • 127 • 2

Text-To-Speech

https://kyutai.org/next/tts

kyutai/pocket-tts

Updated 23 days ago • 5.04k • 636
kyutai/pocket-tts-without-voice-cloning

Updated 23 days ago • 7.27k • 24
kyutai/tts-1.6b-en_fr

Text-to-Speech • Updated Sep 11, 2025 • 172k • 378
kyutai/tts-voices

Updated Mar 9 • 154

Helium 1

Helium 1: a modular and multilingual LLM

kyutai/helium-1-2b

Text Generation • 2B • Updated Apr 30, 2025 • 13.1k • 54
kyutai/helium-1-2b-books

Text Generation • 2B • Updated Apr 30, 2025 • 7 • 1
kyutai/helium-1-2b-hum

Text Generation • 2B • Updated Apr 30, 2025 • 13
kyutai/helium-1-2b-life

Text Generation • 2B • Updated Apr 30, 2025 • 10 • 1

Hibiki fr-en

Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki.

Hibiki Samples

Running • 53

Translate speech in real-time with high fidelity
High-Fidelity Simultaneous Speech-To-Speech Translation

Paper • 2502.03382 • Published Feb 5, 2025 • 8
kyutai/hibiki-1b-mlx-bf16

Translation • Updated Feb 6, 2025 • 128 • 30
kyutai/hibiki-2b-mlx-bf16

Translation • Updated Feb 6, 2025 • 18 • 22

Team members 17
