LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Paper • 2403.07974 • Published Mar 12, 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects Paper • 2502.12404 • Published Feb 18, 2025
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer Paper • 2502.21228 • Published Feb 28, 2025