Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26 • 57
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment Paper • 2510.07743 • Published Oct 9 • 8