InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published 6 days ago • 12
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 6 days ago • 198
Pause or Fabricate? Training Language Models for Grounded Reasoning Paper • 2604.19656 • Published 15 days ago • 10
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding Paper • 2604.14113 • Published 21 days ago • 10
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published 21 days ago • 63
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 23 days ago • 143
AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents Paper • 2603.27490 • Published Mar 29 • 18
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published 27 days ago • 47
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 29 days ago • 119
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published Apr 2 • 100
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills Paper • 2603.25158 • Published Mar 26 • 52
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 54
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning Paper • 2603.15611 • Published Mar 16 • 10
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory Paper • 2603.03269 • Published Mar 3 • 63