WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 16 days ago
Xetrieval: Mechanistically Explaining Dense Retrieval Paper • 2605.29507 • Published 27 days ago
Rethinking Memory as Continuously Evolving Connectivity Paper • 2605.28773 • Published 28 days ago
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 28 days ago
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13
Reinforcing Multimodal Reasoning Against Visual Degradation Paper • 2605.09262 • Published May 10
PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments Paper • 2605.02240 • Published May 4
Soft Anisotropic Diagrams for Differentiable Image Representation Paper • 2604.21984 • Published Apr 27