RL_Papers in general - a Nazzaroth2 Collection

updated Jun 25, 2025

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper • 2504.08672 • Published Apr 11, 2025 • 55

A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Paper • 2504.12322 • Published Apr 11, 2025 • 28

Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published Apr 21, 2025 • 88

TTRL: Test-Time Reinforcement Learning
Paper • 2504.16084 • Published Apr 22, 2025 • 120

Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published May 6, 2025 • 188

Reasoning Models Better Express Their Confidence
Paper • 2505.14489 • Published May 20, 2025 • 20

VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Paper • 2505.17941 • Published May 23, 2025 • 25

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published May 30, 2025 • 143

Reinforcement Pre-Training
Paper • 2506.08007 • Published Jun 9, 2025 • 263

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Paper • 2506.16141 • Published Jun 19, 2025 • 27