Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels • Paper • 2509.16596 • Published Sep 20, 2025 • 14
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination • Paper • 2507.10532 • Published Jul 14, 2025 • 89
Pre-Trained Policy Discriminators are General Reward Models • Paper • 2507.05197 • Published Jul 7, 2025 • 39
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric • Paper • 2502.17184 • Published Feb 24, 2025 • 1