VL-JEPA: Joint Embedding Predictive Architecture for Vision-language • Paper • 2512.10942 • Published 20 days ago • 17
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement • Paper • 2512.21185 • Published 7 days ago • 14
SpotEdit: Selective Region Editing in Diffusion Transformers • Paper • 2512.22323 • Published 5 days ago • 32
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone • Paper • 2512.22615 • Published 4 days ago • 34
DiRL: An Efficient Post-Training Framework for Diffusion Language Models • Paper • 2512.22234 • Published 9 days ago • 16
Yume-1.5: A Text-Controlled Interactive World Generation Model • Paper • 2512.22096 • Published 5 days ago • 53
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 2 days ago • 81
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding • Paper • 2512.21643 • Published 6 days ago • 10
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion • Paper • 2512.17504 • Published 12 days ago • 92
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding • Paper • 2512.17220 • Published 13 days ago • 87
InnoGym: Benchmarking the Innovation Potential of AI Agents • Paper • 2512.01822 • Published about 1 month ago • 35
From Word to World: Can Large Language Models be Implicit Text-based World Models? • Paper • 2512.18832 • Published 10 days ago • 11
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios • Paper • 2512.18470 • Published 11 days ago • 9
Spatia: Video Generation with Updatable Spatial Memory • Paper • 2512.15716 • Published 14 days ago • 28
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning • Paper • 2512.20605 • Published 8 days ago • 57