HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 1 day ago • 16
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published 6 days ago • 14
InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization Paper • 2603.13375 • Published 24 days ago • 3
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2603.18118 • Published 15 days ago • 12
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction Paper • 2603.19231 • Published 14 days ago • 36
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Paper • 2603.19227 • Published 14 days ago • 42
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation Paper • 2603.16669 • Published 16 days ago • 70
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published 17 days ago • 152
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published 26 days ago • 84
NEO-unify: Building Native Multimodal Unified Models End to End Article • 28 days ago • 106