From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company Paper • 2604.22446 • Published 8 days ago • 114
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published 23 days ago • 51
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 19 days ago • 141
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 23 days ago • 244
AURA: Always-On Understanding and Real-Time Assistance via Video Streams Paper • 2604.04184 • Published 27 days ago • 50
Welcome Gemma 4: Frontier multimodal intelligence on device Article • 30 days ago • 884
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 131
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video Paper • 2603.21618 • Published Mar 23 • 15
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models Paper • 2603.15618 • Published Mar 16 • 21
Make it SING: Analyzing Semantic Invariants in Classifiers Paper • 2603.14610 • Published Mar 15 • 16
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published Mar 3 • 146
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory Paper • 2603.03269 • Published Mar 3 • 63
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published Mar 8 • 86
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding Paper • 2603.04254 • Published Mar 4 • 1