Global PIQA • Collection • A physical commonsense reasoning benchmark for 100+ languages, written in collaboration with 300+ researchers from 65 countries. • 4 items • Updated 5 days ago
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data • Paper • 2601.18026 • Published Jan 25
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs • Paper • 2605.09063 • Published May 9
Multilingual Leaderboards • Collection • Leaderboards for languages other than English • 20 items • Updated May 7