OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 10 days ago • 48
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 11 days ago • 38
Experience Transfer for Multimodal LLM Agents in Minecraft Game Paper • 2604.05533 • Published 12 days ago • 15
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published Jan 22 • 9