Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025 • 46
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13, 2025 • 73
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5, 2025 • 133
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12, 2025 • 73
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11, 2025 • 101
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 273
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16, 2025 • 75
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23, 2025 • 81
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23, 2025 • 88
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26, 2025 • 92