MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 9 days ago • 47
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published 15 days ago • 27
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 17 days ago • 50
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 38
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 72
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published Oct 27, 2025 • 55
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 28
Training Language Models to Generate Quality Code with Program Analysis Feedback Paper • 2505.22704 • Published May 28, 2025 • 14
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
The Role of Summarization in Generative Agents: A Preliminary Perspective Paper • 2305.01253 • Published May 2, 2023 • 1
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31, 2025 • 30