Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • Paper 2510.25992 • Published Oct 29, 2025 • 47
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards • Paper 2511.17473 • Published Nov 21, 2025 • 2
Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning • Paper 2512.06533 • Published Dec 6, 2025 • 7
Infinity-Parser • Collection • Reinforcement Learning Document Parser and High-Quality Synthetic Dataset • 4 items • Updated Oct 27, 2025 • 1
mradermacher/Nemotron-Cascade-8B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill-GGUF 8B • Updated Dec 18, 2025 • 373 • 1
NeMo Gym • Collection • Collection of RL verifiable data for NeMo Gym • 13 items • Updated 2 days ago • 36