Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 7 days ago • 12
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published 29 days ago • 34
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization Paper • 2511.15915 • Published 6 days ago • 2
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published 18 days ago • 231
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published 18 days ago • 363
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 19 days ago • 481
Kronos: A Foundation Model for the Language of Financial Markets Paper • 2508.02739 • Published Aug 2, 2025 • 24
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19 • 66