2Mamba2Furious: Linear in Complexity, Competitive in Accuracy Paper • 2602.17363 • Published Feb 19 • 8
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Dec 17, 2025 • 47
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published Nov 17, 2025 • 27
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8, 2025 • 209
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings Paper • 2508.00632 • Published Aug 1, 2025 • 4
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 41
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24, 2025 • 6
Article Automated Discovery of High-Performance GPU Kernels with OpenEvolve Jun 27, 2025 • 25
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8, 2025 • 45