Closing the Train-Test Gap in World Models for Gradient-Based Planning Paper • 2512.09929 • Published Dec 10, 2025
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming Paper • 2501.18837 • Published Jan 31, 2025