Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published 5 days ago • 39
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 75
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks Paper • 2509.01396 • Published Sep 1, 2025 • 57