Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds • Paper • 2511.08892 • Published Nov 12 • 201
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents • Paper • 2510.23691 • Published Oct 27 • 52
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 • Paper • 2510.19600 • Published Oct 22 • 68
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders • Paper • 2510.19779 • Published Oct 22 • 60
Unified Reinforcement and Imitation Learning for Vision-Language Models • Paper • 2510.19307 • Published Oct 22 • 29
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset • Paper • 2510.15742 • Published Oct 17 • 50
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values • Paper • 2510.20187 • Published Oct 23 • 18
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives • Paper • 2510.20822 • Published Oct 23 • 40
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall • Paper • 2510.19304 • Published Oct 22 • 23
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • Paper • 2510.20579 • Published Oct 23 • 55
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI • Paper • 2510.05684 • Published Oct 7 • 141
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis • Paper • 2405.09814 • Published May 16, 2024
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents • Paper • 2303.14613 • Published Mar 26, 2023
Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents • Paper • 2510.04637 • Published Oct 6
fixie-ai/ultravox-v0_5-llama-3_2-1b • Audio-Text-to-Text • 0.7B • Updated about 1 month ago • 381k • 67